lmave {ecm}    R Documentation
Build multiple lm models and average them
Description
Builds k lm models on k partitions of the data and averages their coefficients to create one model. Each partition excludes nrow(data)/k observations. See the links in the References section for further details on this methodology.
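As a quick check of the partition arithmetic, the sketch below uses the built-in mtcars data and assumes contiguous, fold-style partitions. This is an assumption for illustration; the "boot" method samples rows differently. With n = 32 rows and k = 4, each partition leaves out n/k = 8 rows and keeps (k-1)/k*n = 24.

```r
# Sketch (assumption): contiguous fold-style partitions, which may differ
# from how lmave assigns rows internally.
n <- nrow(mtcars)   # 32 rows in the built-in mtcars data
k <- 4

# Assign each row to one of k equal-width, contiguous folds
fold_id <- cut(seq_len(n), breaks = k, labels = FALSE)

# Partition i keeps every row except those in fold i
excluded <- as.vector(table(fold_id))  # rows left out of each partition
kept <- n - excluded                   # rows used to fit each model

print(excluded)  # 8 8 8 8, i.e. n/k
print(kept)      # 24 24 24 24, i.e. (k-1)/k * n
```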
Usage
lmave(formula, data, k, method = "boot", seed = 5, weights = NULL, ...)
Arguments
formula: The formula to be passed to lm
data: The data to be used
k: The number of models or data partitions desired
method: Whether to split the data by folds ("fold"), nested folds ("nestedfold"), or bootstrapping ("boot")
seed: Seed for reproducibility (only needed if method is "boot")
weights: Optional vector of weights to be passed to the fitting process
...: Additional arguments to be passed to the 'lm' function
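The three method options can be illustrated with a hypothetical helper, partition_indices(), that returns the training-row indices for each of the k models. This is a sketch under stated assumptions, not the package's code. "fold" is taken to drop one contiguous block per model. "nestedfold" is taken to be the forward-chaining (expanding-window) scheme described in the Cochrane reference. "boot" is taken to draw (k-1)/k*n rows with replacement after set.seed(seed).

```r
# Hypothetical helper (not part of the ecm package): returns a list of k
# integer vectors, each holding the training-row indices for one model.
partition_indices <- function(n, k, method = c("fold", "nestedfold", "boot"),
                              seed = 5) {
  method <- match.arg(method)
  if (method == "fold") {
    # k contiguous blocks; model i is fit on every block except block i
    fold_id <- cut(seq_len(n), breaks = k, labels = FALSE)
    lapply(seq_len(k), function(i) which(fold_id != i))
  } else if (method == "nestedfold") {
    # Assumed forward chaining: k + 1 contiguous blocks; model i is fit on
    # blocks 1..i, so each training window only grows forward in time
    block_id <- cut(seq_len(n), breaks = k + 1, labels = FALSE)
    lapply(seq_len(k), function(i) which(block_id <= i))
  } else {
    # Assumed bootstrap: rows drawn with replacement, sized to match the
    # (k-1)/k * n partitions; seeded so repeated calls give the same draws
    set.seed(seed)
    size <- floor((k - 1) / k * n)
    lapply(seq_len(k), function(i) sample.int(n, size, replace = TRUE))
  }
}

idx <- partition_indices(nrow(mtcars), k = 4, method = "fold")
sapply(idx, length)   # 24 24 24 24
```

If weights are supplied, they would be subset with the same indices as the data, for example weights = w[idx[[1]]] for the first model (w being a hypothetical weight vector).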
Details
In some cases, especially in some time series modeling (see the ecmave function), it may be preferable to build multiple models on subsets of the data and average them rather than build one model on the entire dataset. The lmave function splits the data into k partitions, each of size (k-1)/k*nrow(data), builds k models, and then averages the coefficients of these models to get a final model. This is similar in spirit to bagging, in which algorithms such as random forest average the predictions of many regression trees fit on different samples of the data.
Unlike the 'ecm' function, this function only works with the 'lm' linear fitter.
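The averaging procedure described in Details can be sketched in a few lines of base R. This illustration uses the built-in mtcars data and assumes contiguous fold-style partitions; the function name fold_average_coef is hypothetical and not part of the ecm package.

```r
# Hypothetical sketch: fit one lm per partition and average the coefficients
fold_average_coef <- function(formula, data, k) {
  n <- nrow(data)
  fold_id <- cut(seq_len(n), breaks = k, labels = FALSE)
  # Each model drops one fold, so it is fit on about (k-1)/k * n rows
  fits <- lapply(seq_len(k), function(i) {
    lm(formula, data = data[fold_id != i, ])
  })
  # sapply() returns a (coefficients x k) matrix; average across models
  rowMeans(sapply(fits, coef))
}

ave_coef <- fold_average_coef(mpg ~ wt + hp, data = mtcars, k = 4)
full_coef <- coef(lm(mpg ~ wt + hp, data = mtcars))
round(cbind(full = full_coef, averaged = ave_coef), 4)
```

Because a linear model's predictions are linear in its coefficients, predicting with the averaged coefficients gives the same point predictions as averaging the k models' predictions.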
Value
An lm object whose coefficients are the averages of the coefficients of the k models
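One way an lm object can carry averaged coefficients is to start from an ordinary fit and replace its coefficients component. This is an assumed construction for illustration only; the package may build its return value differently. The sketch checks that predict() then uses the averaged values.

```r
# Assumed construction: start from an ordinary lm fit and swap in
# averaged coefficients (illustrative; not necessarily how lmave does it)
fits <- list(
  lm(mpg ~ wt, data = mtcars[1:16, ]),
  lm(mpg ~ wt, data = mtcars[17:32, ])
)
ave_coef <- rowMeans(sapply(fits, coef))

model <- lm(mpg ~ wt, data = mtcars)   # template lm object
model$coefficients <- ave_coef         # predictions now use averaged values

newdata <- data.frame(wt = c(2.5, 3.5))
pred <- predict(model, newdata)

# predict() uses the replaced coefficients: intercept + slope * wt
manual <- ave_coef[["(Intercept)"]] + ave_coef[["wt"]] * newdata$wt
all.equal(unname(pred), manual)   # TRUE
```

Because only the coefficients are replaced in this sketch, the residuals and QR decomposition still describe the template fit, so standard errors reported by summary() would not correspond to the averaged model.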
References
Jung, Y. & Hu, J. (2016). "A K-fold Averaging Cross-validation Procedure". https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5019184/
Cochrane, C. (2018). "Time Series Nested Cross-Validation". https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
See Also
lm
Examples
##Not run
#Build linear models to predict the Wilshire 5000 index based on corporate profits,
#the Federal Reserve funds rate, and the unemployment rate
library(ecm)
data(Wilshire)
#Build one model on the entire dataset (Wilshire[-1] drops the first column,
#which is not used as a predictor)
modelall <- lm(Wilshire5000 ~ ., data = Wilshire[-1])
#Build an averaged linear model from five partitions of the data
#(method defaults to "boot"; set method = "fold" for five-fold partitions)
modelave <- lmave('Wilshire5000 ~ .', data = Wilshire[-1], k = 5)