stackingWeights {MuMIn}    R Documentation
Stacking model weights
Description
Compute model weights based on a cross-validation-like procedure.
Usage
stackingWeights(object, ..., data, R, p = 0.5)
Arguments
object, ...: two or more fitted model objects (e.g. glm fits), or a list of such models.
data: a data frame containing the variables in the model, used for fitting and prediction.
R: the number of replicates.
p: the proportion of the data to be used as the training set.
Details
Each model in a set is fitted to the training data: a subset of p * N observations in data. From these models, predictions are produced for the remaining part of data (the test, or hold-out, data). These hold-out predictions are then fitted to the hold-out observations by optimising the weights with which the models are combined. This process is repeated R times, yielding a distribution of weights for each model (which Smyth & Wolpert (1998) referred to as an ‘empirical Bayesian estimate of posterior model probability’). A mean or median of the model weights is taken for each model and re-scaled to sum to one.
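For illustration, a single replicate of this procedure could be sketched in plain R as below. This is only a conceptual sketch, not the package's internal implementation; the helper name oneStackingReplicate, its response argument, and the unconstrained least-squares weight fit are assumptions made for illustration.

oneStackingReplicate <- function(models, data, response, p = 0.5) {
    n <- nrow(data)
    train <- sample.int(n, floor(p * n))
    test <- setdiff(seq_len(n), train)
    # refit each candidate model to the training subset
    fits <- lapply(models, function(m)
        glm(formula(m), family = family(m), data = data[train, , drop = FALSE]))
    # hold-out predictions, one column per model
    preds <- sapply(fits, predict, newdata = data[test, , drop = FALSE],
                    type = "response")
    # fit the hold-out predictions to the hold-out observations;
    # an unconstrained least-squares fit without intercept stands in here
    # for the weight optimiser
    y.holdout <- data[[response]][test]
    w <- coef(lm(y.holdout ~ preds - 1))
    w / sum(w)  # re-scale to sum to one
}

# repeated R times and summarised by mean or median, for example:
#   W <- replicate(10, oneStackingReplicate(models, dat, "y"))
#   rowMeans(W); apply(W, 1, median)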
Value
A matrix with two rows ("mean" and "median"), containing model weights calculated using the mean and the median of the weights across the R replicates.
Note
This approach requires a sample size of at least twice the number of models.
Author(s)
Carsten Dormann, Kamil Bartoń
References
Wolpert, D. H. 1992 Stacked generalization. Neural Networks 5, 241–259.
Smyth, P. and Wolpert, D. 1998 An Evaluation of Linearly Combining Density Estimators via Stacking. Technical Report No. 98–25. Information and Computer Science Department, University of California, Irvine, CA.
Dormann, C. et al. 2018 Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs 88, 485–504.
See Also
Other model weights: BGWeights(), bootWeights(), cos2Weights(), jackknifeWeights()
Examples
# simulated Cement dataset to increase the sample size for the training data
fm0 <- glm(y ~ X1 + X2 + X3 + X4, data = Cement, na.action = na.fail)
dat <- as.data.frame(apply(Cement[, -1], 2, sample, 50, replace = TRUE))
dat$y <- rnorm(nrow(dat), predict(fm0), sigma(fm0))

# global model fitted to the training data:
fm <- glm(y ~ X1 + X2 + X3 + X4, data = dat, na.action = na.fail)

# generate a list of *some* subsets of the global model
models <- lapply(dredge(fm, evaluate = FALSE, fixed = "X1", m.lim = c(1, 3)), eval)

# stacking weights from R = 10 training/hold-out splits
wts <- stackingWeights(models, data = dat, R = 10)

# model-averaged predictions using the mean-based weights
ma <- model.avg(models)
Weights(ma) <- wts["mean", ]
predict(ma)
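
# The second row of the returned matrix holds the median-based weights;
# they can be substituted in the same way (a usage sketch continuing the
# example above):
Weights(ma) <- wts["median", ]
predict(ma)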