create_lagged_df {forecastML} | R Documentation |
Create model training and forecasting datasets with lagged, grouped, dynamic, and static features
Description
Create a list of datasets with lagged, grouped, dynamic, and static features to (a) train forecasting models for specified forecast horizons and (b) forecast into the future with a trained ML model.
Usage
create_lagged_df(
data,
type = c("train", "forecast"),
method = c("direct", "multi_output"),
outcome_col = 1,
horizons,
lookback = NULL,
lookback_control = NULL,
dates = NULL,
frequency = NULL,
dynamic_features = NULL,
groups = NULL,
static_features = NULL,
predict_future = NULL,
use_future = FALSE,
keep_rows = FALSE
)
Arguments
data |
A data.frame with the (a) target to be forecasted and (b) features/predictors. An optional date column can be given in the
|
type |
The type of dataset to return: (a) model training or (b) forecast prediction. The default is |
method |
The type of modeling dataset to create. |
outcome_col |
The column index (an integer) of the target to be forecasted. If |
horizons |
A numeric vector of one or more forecast horizons, h, measured in dataset rows.
If |
lookback |
A numeric vector giving the lags, in dataset rows, for creating the lagged features. All non-grouping,
non-static, and non-dynamic features in the input dataset, |
lookback_control |
A list of numeric vectors, specifying potentially unique lags for each feature. The length
of the list should equal |
dates |
A vector or 1-column data.frame of dates/times with class 'Date' or 'POSIXt'. The length
of |
frequency |
Date/time frequency. Required if |
dynamic_features |
A character vector of column names that identify features that change through time but which are not lagged (e.g., weekday or year).
If |
groups |
A character vector of column names that identify the groups/hierarchies when multiple time series are present. These columns are used as model features but
are not lagged. Note that combining feature lags with grouped time series will result in |
static_features |
For grouped time series only. A character vector of column names that identify features that do not change through time.
These columns are not lagged. If |
predict_future |
When |
use_future |
Boolean. If |
keep_rows |
Boolean. For non-grouped time series, keep the |
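As a rough illustration of what lags "in dataset rows" mean for a single forecast horizon, the base R sketch below builds lagged columns by hand. It is a conceptual sketch only and not forecastML's implementation: lag_vec() is a hypothetical helper, the toy series y is made up, and the rule that only lags at least as long as the horizon are kept is an assumption stated for the example.

```r
# Hypothetical helper (not part of forecastML): shift x down by k rows,
# padding the first k positions with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

y <- c(10, 20, 30, 40, 50, 60)  # toy outcome series, one value per dataset row
horizon <- 2                    # forecast 2 rows ahead
lookback <- 1:3                 # candidate lags, in dataset rows

# Assumption for this sketch: a lag shorter than the horizon would not be
# observed yet at forecast time, so only lags >= horizon are kept.
usable_lags <- lookback[lookback >= horizon]

# One lagged column per usable lag.
lagged <- sapply(usable_lags, function(k) lag_vec(y, k))
colnames(lagged) <- paste0("y_lag_", usable_lags)

# Rows with NA in any lagged feature are dropped from the training data.
train <- na.omit(data.frame(y = y, lagged))
print(train)
```

Here each remaining row pairs an outcome value with the values observed 2 and 3 rows earlier, which is the shape of dataset a horizon-specific model would be trained on.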
Value
An S3 object of class 'lagged_df' or 'grouped_lagged_df': A list of data.frames with new columns for the lagged/non-lagged features.
For method = "direct", the length of the returned list is equal to the number of forecast horizons and is in the order of horizons supplied to the horizons argument. Horizon-specific datasets can be accessed with my_lagged_df$horizon_h where 'h' gives the forecast horizon.
For method = "multi_output", the length of the returned list is 1. Horizon-specific datasets can be accessed with my_lagged_df$horizon_1_3_5 where "1_3_5" represents the forecast horizons passed in horizons.
The contents of the returned data.frames are as follows:
- type = 'train', non-grouped:
A data.frame with the outcome and lagged/dynamic features.
- type = 'train', grouped:
A data.frame with the outcome and unlagged grouping columns followed by lagged, dynamic, and static features.
- type = 'forecast', non-grouped:
(1) An 'index' column giving the row index or date of the forecast periods (e.g., a 100 row non-date-based training dataset would start with an index of 101). (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged features identical to the 'train', non-grouped dataset.
- type = 'forecast', grouped:
(1) An 'index' column giving the date of the forecast periods. The first forecast date for each group is the maximum date from the dates argument + 1 * frequency, where frequency is the user-supplied date/time frequency. (2) A 'horizon' column that indicates the forecast period from 1:max(horizons). (3) Lagged, static, and dynamic features identical to the 'train', grouped dataset.
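The 'index' and 'horizon' columns described for the forecast datasets can be mimicked in base R. The sketch below is illustrative only and does not call forecastML: the 100-row training size comes from the example in the text, while the horizon of 3, the monthly dates, and all variable names are assumptions made for the illustration.

```r
# Non-date-based case: a 100-row training dataset, forecasting 3 rows ahead.
n_train <- 100
h <- 3
forecast_index <- data.frame(
  index   = n_train + seq_len(h),  # first forecast row is 101
  horizon = seq_len(h)             # forecast period 1:h
)
print(forecast_index)

# Date-based case: the first forecast date is max(dates) + 1 * frequency.
dates <- seq(as.Date("2020-01-01"), by = "1 month", length.out = 24)
frequency <- "1 month"
first_forecast_date <- seq(max(dates), by = frequency, length.out = 2)[2]
forecast_dates <- seq(first_forecast_date, by = frequency, length.out = h)
print(forecast_dates)
```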
Attributes
- names: The horizon-specific datasets that can be accessed with my_lagged_df$horizon_h.
- type: Training, train, or forecasting, forecast, dataset(s).
- method: direct or multi_output.
- horizons: Forecast horizons measured in dataset rows.
- outcome_col: The column index of the target being forecasted.
- outcome_cols: If method = multi_output, the column indices of the multiple outputs in the transformed dataset.
- outcome_name: The name of the target being forecasted.
- outcome_names: If method = multi_output, the column names of the multiple outputs in the transformed dataset. The names take the form "outcome_name_h" where 'h' is a horizon passed in horizons.
- predictor_names: The predictor or feature names from the input dataset.
- row_indices: The row.names() of the output dataset. For non-grouped datasets, the first lookback + 1 rows are removed from the beginning of the dataset to remove NA values in the lagged features.
- date_indices: If dates are given, the vector of dates.
- frequency: If dates are given, the date/time frequency.
- data_start: min(row_indices) or min(date_indices).
- data_stop: max(row_indices) or max(date_indices).
- groups: If groups are given, a vector of group names.
- class: grouped_lagged_df, lagged_df, list
Methods and related functions
The output of create_lagged_df()
is passed into
and has the following generic S3 methods
Examples
# Sampled Seatbelts data from the R package datasets.
data("data_seatbelts", package = "forecastML")
#------------------------------------------------------------------------------
# Example 1 - Training data for 2 horizon-specific models w/ common lags per predictor.
horizons <- c(1, 12)
lookback <- 1:15
data <- data_seatbelts
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
horizons = horizons, lookback = lookback)
head(data_train[[length(horizons)]])
# Example 1 - Forecasting dataset
# The last 'nrow(data_seatbelts) - horizon' rows are automatically used from data_seatbelts.
data_forecast <- create_lagged_df(data_seatbelts, type = "forecast", outcome_col = 1,
horizons = horizons, lookback = lookback)
head(data_forecast[[length(horizons)]])
#------------------------------------------------------------------------------
# Example 2 - Training data for one 3-month horizon model w/ unique lags per predictor.
# One vector of lags per column of data_seatbelts, passed via 'lookback_control'.
horizons <- 3
lookback_control <- list(c(3, 6, 9, 12), 4:12, 6:15, 8)
data_train <- create_lagged_df(data_seatbelts, type = "train", outcome_col = 1,
horizons = horizons, lookback_control = lookback_control)
head(data_train[[length(horizons)]])