clean_and_combine {MissingHandle}R Documentation

Fill Missing Dates and Combine Data into a Data Frame

Description

Many times, you will not find data for all dates. After first January, 2011 you may have next data on 20th January, 2011 and so on. Also available dates may have zero values. Try to gather all such kinds of data in different excel sheets of a single excel file. Every sheet will contain two columns (1st one is dates and second one is the data). Load every sheet to separate elements of a list. Using this you can fill the gaps for all the sheets and mark all the corresponding values as zeros. Here I am talking about daily data. Finally, it will combine all the filled results into one data frame (first column is date and other columns will be corresponding values of your sheets) and give one csv file. Number of columns in the data frame will be number of sheets plus one.

Usage

clean_and_combine(
  my_list,
  starting_date = as.Date("2011-01-01"),
  ending_date = as.Date("2022-12-31"),
  date_format = "%d-%m-%y"
)

Arguments

my_list

List of elements containing two columns each. First column is data which may have missing dates and second column is corresponding time series values.

starting_date

From which date data is needed

ending_date

Upto which date data is needed

date_format

Specify the date format of your data

Value

References

Examples

# # real data
# # reading excel file into list ####
# file_path <- "excel_file.xlsx"
#
# # get sheet names
# sheet_names <- openxlsx::getSheetNames(file_path)
#
# # create an empty list to store the cleaned data frames
# my_list <- list()
#
# # loop through each sheet and apply the cleaning code
# for (sheet_name in sheet_names) {
#
#   column_types <- c('date', 'numeric')
#
#   date_format <- "%d-%m-%y"
#
#   # Read in the sheet as a data frame
#   data <- readxl::read_excel(file_path, sheet = sheet_name, col_types = column_types)
#
#   # add the cleaned data frame to the list
#   my_list[[sheet_name]] <- as.data.frame(data)
# }

# creating example ####

# 1st element ####
# Create a sequence of dates from "2011-01-01" to "2015-12-31"
dates <- seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by="day")

# Generate random prices for each date
price_1 <- runif(length(dates), min=0, max=100)

# Combine the dates and prices into a data frame
df <- data.frame(Dates = dates, Price_a = price_1)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates2 <- seq(as.Date("2011-05-01"), as.Date("2011-12-31"), by="day")

# Generate random prices for each date
price_2 <- runif(length(dates2), min=0, max=100)

# Combine the dates and prices into a data frame
df2 <- data.frame(Dates = dates2, Price_a = price_2)

# Merge the two data frames row-wise
df <- rbind(df, df2)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates3 <- seq(as.Date("2012-02-01"), as.Date("2012-12-31"), by="day")

# Generate random prices for each date
price_3 <- runif(length(dates3), min=0, max=100)

# Combine the dates and prices into a data frame
df3 <- data.frame(Dates = dates3, Price_a = price_3)

# Merge the two data frames row-wise
df <- rbind(df, df3)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates4 <- seq(as.Date("2013-04-01"), as.Date("2022-12-31"), by="day")

# Generate random prices for each date
price_4 <- runif(length(dates4), min=0, max=100)

# Combine the dates and prices into a data frame
df4 <- data.frame(Dates = dates4, Price_a = price_4)

# Merge the two data frames row-wise
df <- rbind(df, df4)

# Specify column data types
df <- data.frame(Dates = as.Date(df$Dates),
                price_a = round(as.numeric(df$Price_a)))
# 2nd element ####
# Create a sequence of dates from "2011-01-01" to "2015-12-31"
dates <- seq(as.Date("2011-01-01"), as.Date("2011-05-31"), by="day")

# Generate random prices for each date
price_1 <- runif(length(dates), min=0, max=100)

# Combine the dates and prices into a data frame
df_second <- data.frame(Dates = dates, Price_b = price_1)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates2 <- seq(as.Date("2011-06-01"), as.Date("2011-10-31"), by="day")

# Generate random prices for each date
price_2 <- runif(length(dates2), min=0, max=100)

# Combine the dates and prices into a data frame
df_second2 <- data.frame(Dates = dates2, Price_b = price_2)

# Merge the two data frames row-wise
df_second <- rbind(df_second, df_second2)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates3 <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by="day")

# Generate random prices for each date
price_3 <- runif(length(dates3), min=0, max=100)

# Combine the dates and prices into a data frame
df_second3 <- data.frame(Dates = dates3, Price_b = price_3)

# Merge the two data frames row-wise
df_second <- rbind(df_second, df_second3)

# Create a sequence of dates from "2016-02-01" to "2022-12-31"
dates4 <- seq(as.Date("2013-03-01"), as.Date("2022-12-31"), by="day")

# Generate random prices for each date
price_4 <- runif(length(dates4), min=0, max=100)

# Combine the dates and prices into a data frame
df_second4 <- data.frame(Dates = dates4, Price_b = price_4)

# Merge the two data frames row-wise
df_second <- rbind(df_second, df_second4)

# Specify column data types
df_second <- data.frame(Dates = as.Date(df_second$Dates),
                       price_b = round(as.numeric(df_second$Price_b)))
# my_list ####
# Create a list
my_list <- list()

# Add the data frame to the list
my_list$df <- df

my_list$df_second <- df_second

# getting output ####
my_combined_data <- clean_and_combine(my_list = my_list)
print(head(my_combined_data))

[Package MissingHandle version 0.1.1 Index]