process_MF {dateutils}    R Documentation
Process mixed frequency
Description
Process mixed-frequency data for nowcasting applications. The function identifies which observations are missing in the contemporaneous data and replicates this pattern of missing observations in the historical data before aggregation. The model can therefore use all available information while predictions are still generated by uniform-frequency models, so the approach can be applied to a wide range of econometric and machine learning methods.
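The masking step described above can be sketched in base R. This is a minimal illustration only, assuming a single monthly series aggregated to quarterly frequency by averaging. The function name replicate_ragged_edge is hypothetical and this is not how process_MF is implemented internally.

```r
# Hypothetical sketch of the "replicate the missing pattern" idea;
# NOT the dateutils implementation.
# x: monthly values whose most recent entries may be NA (not yet published)
# dates: the corresponding month dates
replicate_ragged_edge <- function(x, dates) {
  m <- as.integer(format(dates, "%m"))
  quarter <- paste0(format(dates, "%Y"), "Q", (m - 1) %/% 3 + 1)
  month_in_q <- (m - 1) %% 3 + 1
  # Which months of the latest (contemporaneous) quarter are observed?
  latest <- quarter == quarter[length(quarter)]
  observed <- month_in_q[latest & !is.na(x)]
  # Mask the same months in every historical quarter
  x[!(month_in_q %in% observed)] <- NA
  # Aggregate to quarterly frequency using only the retained months
  tapply(x, quarter, mean, na.rm = TRUE)
}

# Synthetic toy data: nine months, September not yet published
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 9)
x <- c(1:8, NA)
q <- replicate_ragged_edge(x, dates)
```

Because only the first two months of the latest quarter are available, the third month of every earlier quarter is dropped too, so each historical quarterly value is built from the same information set as the one being nowcast.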
Usage
process_MF(
LHS,
RHS,
LHS_lags = 1,
RHS_lags = 1,
as_of = NULL,
frq = c("auto", "week", "month", "quarter", "year"),
date_name = "ref_date",
id_name = "series_name",
value_name = "value",
pub_date_name = "pub_date",
return_dt = TRUE
)
Arguments
LHS
Left-hand side data in long format. May include multiple LHS variables, but all LHS variables MUST have the same frequency.
RHS
Right-hand side data in long format, at any frequency.
LHS_lags
Number of lags of the LHS variables to include in the output.
RHS_lags
Number of lags of the RHS variables to include in the output (may be 0, indicating contemporaneous values only).
as_of
Date "as of" which to backtest the model; requires a publication date column (named by 'pub_date_name', default 'pub_date') in the data.
frq
Frequency of the LHS data: one of 'auto', 'week', 'month', 'quarter', 'year'. With the default 'auto', the function attempts to identify the frequency automatically.
date_name
Name of the date column in the data.
id_name
Name of the ID column in the data.
value_name
Name of the value column in the data.
pub_date_name
Name of the publication date column in the data.
return_dt
Logical; should the function return a 'data.table'? If FALSE, the function returns matrix data.
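Backtesting "as of" a date means using only data that had been published by then. A minimal base-R sketch of that vintage filter follows, assuming long-format data with the default column names; filter_as_of is a hypothetical helper, and the assumption that as_of works by keeping rows with a publication date on or before the cutoff is ours, not confirmed by dateutils.

```r
# Hypothetical helper: keep only rows published on or before 'as_of'.
# Assumed semantics, not dateutils' implementation; works on a data.frame.
filter_as_of <- function(data, as_of, pub_date_name = "pub_date") {
  keep <- data[[pub_date_name]] <= as.Date(as_of)
  data[keep, , drop = FALSE]
}

# Synthetic toy data using the default column names
toy <- data.frame(
  ref_date = as.Date(c("2020-01-31", "2020-02-29", "2020-03-31")),
  series_name = "ip",
  value = c(1, 2, 3),
  pub_date = as.Date(c("2020-02-15", "2020-03-15", "2020-04-15"))
)
vintage <- filter_as_of(toy, "2020-03-20")  # March value not yet published
```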
Details
Right-hand side data always include observations contemporaneous with the LHS data. Use 'RHS_lags' to add lags of the RHS data to the output, and 'LHS_lags' to add lags of the LHS data. By default the function returns data in long format, designed to be used with the 'dateutils' function 'process()'. Specifying 'return_dt = FALSE' instead returns the LHS variables in the matrix 'Y', the RHS variables in the matrix 'X', and the corresponding dates (by index) in the date vector 'dates'.
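With 'return_dt = FALSE', the matrices can be handed to any uniform-frequency model. The sketch below uses small synthetic stand-ins for 'Y' and 'X' rather than real process_MF output. It assumes, hypothetically, that rows of 'Y' and 'X' are aligned by index and that the period being nowcast is the final row with its LHS value still missing; the column names 'gdp0' and 'ip0' are illustrative only.

```r
# Synthetic stand-ins for the Y and X matrices (illustration only)
Y <- matrix(c(3, 9, 15, NA), ncol = 1, dimnames = list(NULL, "gdp0"))
X <- matrix(c(1.5, 4.5, 7.5, 10.5), ncol = 1, dimnames = list(NULL, "ip0"))

df <- data.frame(Y, X)
# Estimate on periods where the LHS is observed ...
fit <- lm(gdp0 ~ ip0, data = df[!is.na(df$gdp0), ])
# ... and predict the period whose LHS value is still missing (the nowcast)
nowcast <- predict(fit, newdata = df[is.na(df$gdp0), ])
```

Any other regression or machine learning method could replace lm() here, which is the point of reducing the problem to uniform-frequency matrices.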
Value
A data.table in long format (unless 'return_dt = FALSE'; see Details). Variable names ending in '0' denote contemporaneous values, those ending in '1' values at one lag, '2' at two lags, and so on.
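The suffix convention can be illustrated with a small base-R helper that builds contemporaneous and lagged columns for one already-aggregated series. add_lags and the series name "ip" are hypothetical, and the exact column names produced by process_MF may differ.

```r
# Hypothetical helper: column k holds x lagged k periods, named <name>k
add_lags <- function(x, name, n_lags) {
  x <- as.numeric(x)
  stopifnot(n_lags >= 0, n_lags < length(x))
  vals <- vapply(0:n_lags,
                 function(k) c(rep(NA_real_, k), x[seq_len(length(x) - k)]),
                 numeric(length(x)))
  out <- matrix(vals, nrow = length(x))  # keep matrix shape in all cases
  colnames(out) <- paste0(name, 0:n_lags)
  out
}

m <- add_lags(c(1.5, 4.5, 7.5), "ip", 2)  # columns ip0, ip1, ip2
```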
Examples
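library(dateutils)   # provides process_MF() and the example data 'fred'
library(data.table)  # 'fred' is a data.table; attach for the [ ] filtering syntax
LHS <- fred[series_name == "gdp constant prices"]  # LHS: real GDP
RHS <- fred[series_name != "gdp constant prices"]  # RHS: all other series
dt <- process_MF(LHS, RHS)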