| create_lagged_df {forecastML} | R Documentation |
Create model training and forecasting datasets with lagged, grouped, dynamic, and static features
Description
Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.
Usage
create_lagged_df(
  data,
  type = c("train", "forecast"),
  method = c("direct", "multi_output"),
  outcome_col = 1,
  horizons,
  lookback = NULL,
  lookback_control = NULL,
  dates = NULL,
  frequency = NULL,
  dynamic_features = NULL,
  groups = NULL,
  static_features = NULL,
  predict_future = NULL,
  use_future = FALSE,
  keep_rows = FALSE
)
Arguments
data |
A data.frame with the (a) target to be forecasted and (b) features/predictors. An optional date column can be given in the
|
type |
The type of dataset to return: (a) model training or (b) forecast prediction. The default is |
method |
The type of modeling dataset to create. |
outcome_col |
The column index, an integer, of the target to be forecasted. If |
horizons |
A numeric vector of one or more forecast horizons, h, measured in dataset rows.
If |
lookback |
A numeric vector giving the lags, in dataset rows, for creating the lagged features. All non-grouping,
non-static, and non-dynamic features in the input dataset, |
lookback_control |
A list of numeric vectors, specifying potentially unique lags for each feature. The length
of the list should equal |
dates |
A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length
of |
frequency |
Date/time frequency. Required if |
dynamic_features |
A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year).
If |
groups |
A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but
are not lagged. Note that combining feature lags with grouped time series will result in |
static_features |
For grouped time series only. A character vector of column names that identify features that do not change through time.
These columns are not lagged. If |
predict_future |
When |
use_future |
Boolean. If |
keep_rows |
Boolean. For non-grouped time series, keep the |
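To make the row-based meaning of horizons and lookback concrete, the following base R sketch builds lagged columns by hand for a single series. It only illustrates the idea and is not forecastML's implementation: the helper make_lags() and the "_lag_" column-name pattern are assumptions made for this example. It also assumes that, for a direct forecast h rows ahead, only lags of at least h rows are usable, because more recent values would not yet be observed at forecast time.

```r
# Hypothetical helper (not part of forecastML): lag a numeric vector `x`
# by each value in `lags`, measured in rows.
make_lags <- function(x, lags, name = "x") {
  n <- length(x)
  out <- lapply(lags, function(k) {
    if (k >= n) rep(NA_real_, n) else c(rep(NA_real_, k), x[seq_len(n - k)])
  })
  names(out) <- paste0(name, "_lag_", lags)
  as.data.frame(out)
}

y <- c(10, 12, 15, 11, 9, 14, 18, 20)
horizon <- 3
lookback <- 1:4

# Assumption: for a direct h-step-ahead model, keep only lags of at least h rows.
usable_lags <- lookback[lookback >= horizon]

train <- cbind(y = y, make_lags(y, usable_lags, name = "y"))

# Drop the leading rows whose lagged values are NA.
train <- train[stats::complete.cases(train), , drop = FALSE]
print(train)
```

Each remaining row pairs an outcome value with feature values observed at least 3 rows earlier, which is the shape of a horizon-specific training dataset.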
Value
An S3 object of class 'lagged_df' or 'grouped_lagged_df': A list of data.frames with new columns for the lagged/non-lagged features.
For method = "direct", the length of the returned list is equal to the number of forecast horizons and is in the order of
horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with
my_lagged_df$horizon_h where 'h' gives the forecast horizon.
For method = "multi_output", the length of the returned list is 1. Horizon-specific datasets can be accessed with
my_lagged_df$horizon_1_3_5 where "1_3_5" represents the forecast horizons passed in horizons.
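The naming convention for the returned list can be sketched in base R. The helper horizon_names() is hypothetical; it only reproduces the element names stated above and does not call forecastML.

```r
# Hypothetical helper reproducing the list element names described above.
horizon_names <- function(horizons, method = c("direct", "multi_output")) {
  method <- match.arg(method)
  if (method == "direct") {
    # One list element per forecast horizon, in the order supplied.
    paste0("horizon_", horizons)
  } else {
    # A single list element whose name joins all horizons with underscores.
    paste0("horizon_", paste(horizons, collapse = "_"))
  }
}

direct_names <- horizon_names(c(1, 12), method = "direct")
multi_names <- horizon_names(c(1, 3, 5), method = "multi_output")
print(direct_names)
print(multi_names)
```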
The contents of the returned data.frames are as follows:
- type = 'train', non-grouped:
A data.frame with the outcome and lagged/dynamic features.
- type = 'train', grouped:
A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.
- type = 'forecast', non-grouped:
(1) An 'index' column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based training dataset would start with an index of 101). (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged features identical to the 'train', non-grouped dataset.
- type = 'forecast', grouped:
(1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged, static, and dynamic features identical to the 'train', grouped dataset.
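The 'index' and 'horizon' columns of a forecast dataset can be illustrated with base R alone. This is a sketch of the rules stated above, not forecastML's code; the variable names, the example dates, and the monthly frequency value are assumptions chosen for the illustration.

```r
horizons <- c(1, 3)

# Non-date-based case: with a 100-row training dataset, forecasts start at row 101.
n_train <- 100
index_rows <- n_train + seq_len(max(horizons))

# Date-based case: the first forecast date is max(dates) + 1 * frequency.
dates <- seq(as.Date("2020-01-01"), by = "1 month", length.out = 24)
frequency <- "1 month"
index_dates <- seq(max(dates), by = frequency, length.out = max(horizons) + 1)[-1]

# One row per forecast period from 1:max(horizons).
forecast_index <- data.frame(index = index_dates, horizon = seq_len(max(horizons)))
print(index_rows)
print(forecast_index)
```

For grouped data, the same date arithmetic would be applied within each group.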
Attributes
- names: The horizon-specific datasets that can be accessed with my_lagged_df$horizon_h.
- type: Training ('train') or forecasting ('forecast') dataset(s).
- method: 'direct' or 'multi_output'.
- horizons: Forecast horizons measured in dataset rows.
- outcome_col: The column index of the target being forecasted.
- outcome_cols: If method = "multi_output", the column indices of the multiple outputs in the transformed dataset.
- outcome_name: The name of the target being forecasted.
- outcome_names: If method = "multi_output", the column names of the multiple outputs in the transformed dataset. The names take the form "outcome_name_h" where 'h' is a horizon passed in horizons.
- predictor_names: The predictor or feature names from the input dataset.
- row_indices: The row.names() of the output dataset. For non-grouped datasets, the first lookback + 1 rows are removed from the beginning of the dataset to remove NA values in the lagged features.
- date_indices: If dates are given, the vector of dates.
- frequency: If dates are given, the date/time frequency.
- data_start: min(row_indices) or min(date_indices).
- data_stop: max(row_indices) or max(date_indices).
- groups: If groups are given, a vector of group names.
- class: grouped_lagged_df, lagged_df, list
Methods and related functions
The output of create_lagged_df() is passed into
and has the following generic S3 methods
Examples
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15
data <- data_seatbelts
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback = lookback)
head(data_train[[length(horizons)]])
# Example 1 - Forecasting dataset
# The last 'nrow(data_seatbelts) - horizon' rows are automatically used from data_seatbelts.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
                                  horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
horizons <- 3
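# One vector of lags per feature, passed through lookback_control rather than lookback.
lookback_control <- list(c(3, 6, 9, 12), c(4:12), c(6:15), c(8))
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
                               horizons = horizons, lookback_control = lookback_control)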
head(data_train[[length(horizons)]])