R: Format Raw Data for Forecast Combination

foreccomb {ForecastComb}

R Documentation

Description

Structures cross-sectional input data (individual model forecasts) for forecast combination and stores it as an S3 object of class foreccomb, which serves as input to the forecast combination techniques. Optionally imputes missing values and resolves problems due to perfect collinearity among the forecasts.

Usage

foreccomb(observed_vector, prediction_matrix, newobs = NULL,
  newpreds = NULL, byrow = FALSE, na.impute = TRUE, criterion = "RMSE")

Arguments

`observed_vector`	A vector or univariate time series; contains ‘actual values’ for training set.
`prediction_matrix`	A matrix or multivariate time series; contains individual model forecasts for training set.
`newobs`	A vector or univariate time series; contains ‘actual values’ if a test set is used (optional).
`newpreds`	A matrix or multivariate time series; contains individual model forecasts if a test set is used (optional). Does not require `newobs`: a forecaster who only wants to train the forecast combination method on a training set and apply it to future individual model forecasts needs to supply only `newpreds`.
`byrow`	logical. The default (`FALSE`) assumes that each column of the forecast matrices (`prediction_matrix` and – if specified – `newpreds`) contains forecasts from one forecast model; if each row of the matrices contains forecasts from one forecast model, set to `TRUE`.
`na.impute`	logical. The default (`TRUE`) behavior is to impute missing values via the cross-validated spline approach of the `mtsdi` package. If set to `FALSE`, forecasts with missing values will be removed. Missing values in the observed data are never imputed.
`criterion`	One of `"RMSE"` (default), `"MAE"`, or `"MAPE"`. Used only if `prediction_matrix` is not full rank: the algorithm identifies the models that cause perfect collinearity and removes the one with the worst individual accuracy according to the chosen criterion.
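The three accuracy criteria can be sketched in base R as follows. This uses the standard textbook definitions; the helper name `accuracy_measure` is hypothetical, and the package's own implementation (for example, whether MAPE is scaled to percent) is an assumption here, not something this page confirms.

```r
# Hypothetical helper: textbook definitions of the three criteria.
# Assumption: MAPE is expressed in percent and `actual` contains no zeros.
accuracy_measure <- function(actual, forecast,
                             criterion = c("RMSE", "MAE", "MAPE")) {
  criterion <- match.arg(criterion)
  err <- actual - forecast
  switch(criterion,
    RMSE = sqrt(mean(err^2)),            # root mean squared error
    MAE  = mean(abs(err)),               # mean absolute error
    MAPE = 100 * mean(abs(err / actual)) # mean absolute percentage error
  )
}

actual   <- c(2, 4, 5)
forecast <- c(1, 4, 7)
rmse_val <- accuracy_measure(actual, forecast, "RMSE")
mae_val  <- accuracy_measure(actual, forecast, "MAE")
mape_val <- accuracy_measure(actual, forecast, "MAPE")
```

Lower values indicate higher accuracy under all three measures, so "worst individual accuracy" corresponds to the largest value.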

Details

The function imports the column names of the prediction matrix (or the row names, if `byrow = TRUE`) as model names; if no names are specified, generic model names are created.

The missing value imputation algorithm is a modified version of the EM algorithm for imputation that is applicable to time series data: it accounts for the correlation between the forecasting models and for the time structure of each series. At each iteration, a smoothing spline is fitted to each of the time series. The degrees of freedom of each spline are chosen by cross-validation.
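One ingredient of the procedure above, a smoothing spline whose smoothing level is chosen by cross-validation, can be illustrated with base R's `smooth.spline()`. This is only a single-series sketch on simulated data: it is not the EM algorithm of the `mtsdi` package, and it ignores the correlation between forecasting models that the actual method exploits.

```r
set.seed(1)

# Simulated forecast series with two missing values (illustrative data only)
n     <- 80
t_idx <- seq_len(n)
y     <- sin(t_idx / 8) + rnorm(n, sd = 0.1)
y_na  <- y
y_na[c(10, 40)] <- NA

# Fit a smoothing spline to the observed points; cv = TRUE selects the
# smoothing parameter by ordinary leave-one-out cross-validation
ok  <- !is.na(y_na)
fit <- smooth.spline(t_idx[ok], y_na[ok], cv = TRUE)

# Fill the gaps with the spline's fitted values at the missing time points
y_filled      <- y_na
y_filled[!ok] <- predict(fit, t_idx[!ok])$y
```

Observed values are left untouched; only the two gaps are replaced by spline predictions.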

Forecast combination relies on the absence of perfect collinearity. The test for this condition checks whether `prediction_matrix` is full rank. In the presence of perfect collinearity, the iterative algorithm identifies the subset of forecasting models that cause the linear dependence and removes the one among them with the lowest accuracy (according to the selected criterion; the default is RMSE). This procedure is repeated until the revised prediction matrix is full rank.
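The iterative procedure described above might look roughly like the following base R sketch. The function name `drop_collinear` is hypothetical, and the way dependent columns are detected (a column is involved in a linear dependence if deleting it leaves the matrix rank unchanged) is an assumption for illustration, not necessarily how the package does it. Only RMSE is implemented here.

```r
# Hypothetical sketch: repeatedly drop the least accurate model among
# those involved in a linear dependence until the matrix is full rank.
drop_collinear <- function(obs, preds) {
  rank_of <- function(m) qr(m)$rank
  rmse    <- function(j) sqrt(mean((obs - preds[, j])^2))

  while (rank_of(preds) < ncol(preds)) {
    r <- rank_of(preds)
    # A column takes part in a dependence if removing it keeps the rank
    involved <- which(vapply(
      seq_len(ncol(preds)),
      function(j) rank_of(preds[, -j, drop = FALSE]) == r,
      logical(1)
    ))
    # Among those, remove the model with the largest RMSE
    worst <- involved[which.max(vapply(involved, rmse, numeric(1)))]
    preds <- preds[, -worst, drop = FALSE]
  }
  preds
}

set.seed(42)
obs   <- rnorm(80)
preds <- matrix(rnorm(800, 1), 80, 10,
                dimnames = list(NULL, paste0("m", 1:10)))
preds[, 2] <- 0.8 * preds[, 1] + 0.4 * preds[, 8]  # induce collinearity

reduced <- drop_collinear(obs, preds)
removed <- setdiff(colnames(preds), colnames(reduced))
```

In this simulated case exactly one of the three linearly dependent models (`m1`, `m2`, `m8`) is removed, after which the remaining nine columns are full rank.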

Value

Returns an object of class foreccomb.

Author(s)

Christoph E. Weiss, Gernot R. Roetzer

References

Junger, W. L., Ponce de Leon, A., and Santos, N. (2003). Missing Data Imputation in Multivariate Time Series via EM Algorithm. Cadernos do IME, 15, 8–21.

Dempster, A., Laird, N., and Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.

Examples

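## Simulated data: 100 observations and forecasts from 10 individual models
obs <- rnorm(100)
preds <- matrix(rnorm(1000, 1), 100, 10)
## Split into a training set (first 80 rows) and a test set (last 20 rows)
train_o <- obs[1:80]
train_p <- preds[1:80, ]
test_o <- obs[81:100]
test_p <- preds[81:100, ]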

## Example with a training set only:
foreccomb(train_o, train_p)

## Example with a training set and future individual forecasts:
foreccomb(train_o, train_p, newpreds = test_p)

## Example with a training set and a full test set:
foreccomb(train_o, train_p, test_o, test_p)

## Example with forecast models being stored in rows:
preds_row <- matrix(rnorm(1000, 1), 10, 100)
train_p_row <- preds_row[, 1:80]
foreccomb(train_o, train_p_row, byrow = TRUE)

## Example with NA imputation:
train_p_na <- train_p
train_p_na[2, 3] <- NA
foreccomb(train_o, train_p_na, na.impute = TRUE)

## Example with perfect collinearity:
train_p[, 2] <- 0.8 * train_p[, 1] + 0.4 * train_p[, 8]
foreccomb(train_o, train_p, criterion = "RMSE")

Package ForecastComb version 1.3.1