shortstacking {ddml} | R Documentation |
Predictions using Short-Stacking.
Description
Predictions using short-stacking.
Usage
shortstacking(
y,
X,
Z = NULL,
learners,
sample_folds = 2,
ensemble_type = "average",
custom_ensemble_weights = NULL,
compute_insample_predictions = FALSE,
subsamples = NULL,
silent = FALSE,
progress = NULL,
auxilliary_X = NULL,
shortstack_y = y
)
Arguments
y |
The outcome variable. |
X |
A (sparse) matrix of predictive variables. |
Z |
Optional additional (sparse) matrix of predictive variables. |
learners |
May take one of two forms, depending on whether a single
learner or stacking with multiple learners is used for estimation of the
predictor.
If a single learner is used,
If stacking with multiple learners is used,
Omission of the |
sample_folds |
Number of cross-fitting folds. |
ensemble_type |
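Ensemble method used to combine base learners into the final estimate of the conditional expectation functions. Possible values include "average" (equal weights), "nnls1" (MSPE-minimizing weights in the unit simplex), and "singlebest"; see Examples.
Multiple ensemble types may be passed as a vector of strings. |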
custom_ensemble_weights |
A numerical matrix with user-specified
ensemble weights. Each column corresponds to a custom ensemble
specification, each row corresponds to a base learner in |
compute_insample_predictions |
Logical indicator. Set to TRUE if in-sample predictions should also be computed. |
subsamples |
List of vectors with sample indices for cross-fitting. |
silent |
Boolean to silence estimation updates. |
progress |
String to print before learner and cv fold progress. |
auxilliary_X |
An optional list of matrices of length
|
shortstack_y |
Optional vector of the outcome variable to form
short-stacking predictions for. Base learners are always trained on
|
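As a rough illustration of the shapes described for subsamples and custom_ensemble_weights, the base R sketch below builds both objects by hand. The sample size, the weight values, and the column labels are hypothetical, and the sketch assumes each vector in subsamples holds the observation indices of one cross-fitting fold.

```r
set.seed(42)
n <- 10            # hypothetical number of observations
sample_folds <- 2  # number of cross-fitting folds

# Randomly partition the indices 1..n into `sample_folds` vectors
# (assumption: one vector of sample indices per fold).
subsamples <- unname(split(sample(n),
                           rep(seq_len(sample_folds), length.out = n)))

# Two base learners (rows) and two custom ensemble specifications (columns).
# The column labels are hypothetical.
custom_ensemble_weights <- cbind(
  mostly_first  = c(0.8, 0.2),
  mostly_second = c(0.3, 0.7)
)

lengths(subsamples)           # number of observations in each fold
dim(custom_ensemble_weights)  # base learners by custom specifications
```

Objects of this form could then be supplied through the subsamples and custom_ensemble_weights arguments shown under Usage, together with a learners list of matching length.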
Value
shortstacking returns a list containing the following components:
oos_fitted
A matrix of out-of-sample predictions, each column corresponding to an ensemble type (in chronological order).
weights
An array, providing the weight assigned to each base learner (in chronological order) by the ensemble procedures.
is_fitted
When compute_insample_predictions = TRUE, a list of matrices with in-sample predictions by sample fold.
auxilliary_fitted
When auxilliary_X is not NULL, a list of matrices with additional predictions.
oos_fitted_bylearner
A matrix of out-of-sample predictions, each column corresponding to a base learner (in chronological order).
is_fitted_bylearner
When compute_insample_predictions = TRUE, a list of matrices with in-sample predictions by sample fold.
auxilliary_fitted_bylearner
When auxilliary_X is not NULL, a list of matrices with additional predictions for each learner.
Note that unlike crosspred, shortstacking always computes out-of-sample predictions for each base learner (at no additional computational cost).
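The idea behind short-stacking can be sketched in base R. This is a conceptual illustration only, not the ddml implementation: the simulated data, the two toy learners (learner_mean and learner_ols), and all variable names are assumptions. Each base learner is cross-fitted once, and the ensemble weights are then computed from the full set of out-of-sample predictions, which is why per-learner predictions come at no extra cost.

```r
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 2), n, 2)
y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(n)

# Hypothetical base learners: fit on training data, predict for test data.
learner_mean <- function(y_tr, x_tr, x_te) rep(mean(y_tr), nrow(x_te))
learner_ols <- function(y_tr, x_tr, x_te) {
  beta <- lm.fit(cbind(1, x_tr), y_tr)$coefficients
  drop(cbind(1, x_te) %*% beta)
}
learners <- list(mean = learner_mean, ols = learner_ols)

# Two cross-fitting folds: each vector holds the held-out indices of a fold.
folds <- split(sample(n), rep(1:2, length.out = n))

# Cross-fitted out-of-sample predictions, one column per base learner.
oos_bylearner <- matrix(NA_real_, n, length(learners),
                        dimnames = list(NULL, names(learners)))
for (idx in folds) {
  for (j in seq_along(learners)) {
    oos_bylearner[idx, j] <- learners[[j]](y[-idx],
                                           x[-idx, , drop = FALSE],
                                           x[idx, , drop = FALSE])
  }
}

# Ensemble weights computed from the out-of-sample predictions themselves.
mspe <- colMeans((y - oos_bylearner)^2)
weights <- cbind(
  average    = rep(1 / length(learners), length(learners)),
  singlebest = as.numeric(seq_along(learners) == which.min(mspe))
)

# Ensemble predictions, one column per ensemble type.
oos_fitted <- oos_bylearner %*% weights
dim(oos_fitted)
```

Here oos_bylearner and oos_fitted play the roles of the oos_fitted_bylearner and oos_fitted components described above, with rows indexed by observation.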
References
Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). "ddml: Double/debiased machine learning in Stata." https://arxiv.org/abs/2301.09397
Wolpert D H (1992). "Stacked generalization." Neural Networks, 5(2), 241-259.
See Also
Other utilities: crosspred(), crossval()
Examples
# Construct variables from the included Angrist & Evans (1998) data
y = AE98[, "worked"]
X = AE98[, c("morekids", "age","agefst","black","hisp","othrace","educ")]
# Compute predictions using shortstacking with base learners ols and lasso.
# Three ensemble types are computed simultaneously: equally weighted
# (ensemble_type = "average"), MSPE-minimizing with weights in the unit
# simplex (ensemble_type = "nnls1"), and the single best base learner
# (ensemble_type = "singlebest"). Predictions for each learner are also
# calculated.
shortstack_res <- shortstacking(y, X,
                                learners = list(list(fun = ols),
                                                list(fun = mdl_glmnet)),
                                ensemble_type = c("average",
                                                  "nnls1",
                                                  "singlebest"),
                                sample_folds = 2,
                                silent = TRUE)
dim(shortstack_res$oos_fitted) # = length(y) by length(ensemble_type)
dim(shortstack_res$oos_fitted_bylearner) # = length(y) by length(learners)