R: Build multiple lm models and average them

lmave {ecm}

R Documentation

Build multiple lm models and average them

Description

Builds k lm models on k partitions of the data and averages their coefficients to create one model. Each partition excludes nrow(data)/k observations. See the links in the References section for further details on this methodology.

Usage

lmave(formula, data, k, method = "boot", seed = 5, weights = NULL, ...)

Arguments

`formula`	The formula to be passed to lm
`data`	The data to be used
`k`	The number of models or data partitions desired
`method`	Whether to split data by folds ("fold"), nested folds ("nestedfold"), or bootstrapping ("boot")
`seed`	Seed for reproducibility (only needed if method is "boot")
`weights`	Optional vector of weights to be passed to the fitting process
`...`	Additional arguments to be passed to the 'lm' function

Details

In some cases, especially in time series modeling (see the ecmave function), it may be preferable to build multiple models on subsets of the data and average them rather than building one model on the entire dataset. The lmave function splits the data into k partitions, each of size (k-1)/k*nrow(data), builds one model per partition, and then averages the coefficients of these k models to get a final model. This is similar to averaging multiple regression tree models in algorithms like random forest.
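
The fold-based averaging described above can be sketched in a few lines of base R. This is an illustration only, not the source code of lmave: the helper name fold_average_lm is hypothetical, the folds are assumed to be contiguous blocks of rows, and the built-in mtcars data stands in for a real modeling dataset. The package's actual partitioning and its "nestedfold" and "boot" methods may work differently.

```r
# Minimal sketch of k-fold coefficient averaging (hypothetical helper,
# not the implementation used by ecm::lmave).
fold_average_lm <- function(formula, data, k) {
  n <- nrow(data)

  # Assumption: assign rows to k roughly equal, contiguous folds
  fold_id <- cut(seq_len(n), breaks = k, labels = FALSE)

  # Fit one lm per fold, each time leaving that fold out, so every
  # model sees about (k-1)/k * n observations
  fits <- lapply(seq_len(k), function(i) {
    lm(formula, data = data[fold_id != i, , drop = FALSE])
  })

  # Average the k coefficient vectors element-wise
  # (assumes the model has at least two coefficients)
  avg_coef <- rowMeans(sapply(fits, coef))

  # Reuse the first fit as a container and overwrite its coefficients.
  # Its stored residuals and fitted values still belong to the first fit.
  averaged <- fits[[1]]
  averaged$coefficients <- avg_coef
  averaged
}

# Example with a built-in dataset
fit <- fold_average_lm(mpg ~ wt + hp, data = mtcars, k = 5)
coef(fit)
```

Because the result keeps the structure of an lm object, functions such as coef and predict can be used on it, but summary statistics derived from the stored residuals would describe only the first fold's model in this sketch.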

Unlike the 'ecm' function, this function only works with the 'lm' linear fitter.

Value

an lm object

References

Jung, Y. & Hu, J. (2016). "A K-fold Averaging Cross-validation Procedure". https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5019184/

Cochrane, C. (2018). "Time Series Nested Cross-Validation". https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9

Examples

##Not run

#Build linear models to predict Wilshire 5000 index based on corporate profits, 
#Federal Reserve funds rate, and unemployment rate
data(Wilshire)

#Build one model on the entire dataset
modelall <- lm(Wilshire5000 ~ ., data = Wilshire[-1])

#Build a five-fold averaged linear model on the entire dataset
modelave <- lmave('Wilshire5000 ~ .', data = Wilshire[-1], k = 5)

[Package ecm version 7.2.0 Index]