buildModelSeries {pedometrics} | R Documentation |
Build a series of linear models using automated variable selection
Description
Build a series of linear models with stats::lm() using one or more automated variable
selection methods implemented in the functions stepVIF() and MASS::stepAIC().
Usage
buildModelSeries(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)

buildMS(
  formula,
  data,
  vif = FALSE,
  vif.threshold = 10,
  vif.verbose = FALSE,
  aic = FALSE,
  aic.direction = "both",
  aic.trace = FALSE,
  aic.steps = 5000,
  ...
)
Arguments
formula |
A list containing one or several model formulas (a symbolic description of the model to be fitted). |
data |
Data frame containing the variables in the model formulas. |
vif |
Logical for performing backward variable selection using the Variance-Inflation Factor
(VIF). Defaults to vif = FALSE. |
vif.threshold |
Numeric value setting the maximum acceptable VIF value. Defaults to
vif.threshold = 10. |
vif.verbose |
Logical for printing iteration results of backward variable selection using
the VIF. Defaults to vif.verbose = FALSE. |
aic |
Logical for performing variable selection using Akaike's Information Criterion (AIC).
Defaults to aic = FALSE. |
aic.direction |
Character string setting the direction of variable selection when using AIC,
with options "both", "backward", and "forward". Defaults to aic.direction = "both". |
aic.trace |
Logical for printing iteration results of variable selection using the AIC.
Defaults to aic.trace = FALSE. |
aic.steps |
Integer value setting the maximum number of steps to be considered for variable
selection using the AIC. Defaults to aic.steps = 5000. |
... |
Further arguments passed to |
Details
buildModelSeries() was devised to deal with a list of linear model formulas. The
main objective is to bring together several functions commonly used when building linear models,
such as automated variable selection. In the current implementation, variable selection can be
done using stepVIF(), MASS::stepAIC(), or both.
stepVIF() is a backward variable selection procedure, while MASS::stepAIC()
supports backward, forward, and bidirectional variable selection. For more information about
these functions, see their respective help pages.
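As a rough illustration of the two procedures run in sequence (VIF screening first, then AIC selection), the sketch below uses only stats and MASS. The helpers vif_values() and drop_by_vif() are hypothetical stand-ins written for this example and are not the code of stepVIF(); restricting the cpus data to numeric predictors is likewise an assumption made to keep the VIF computation simple.

```r
library(MASS)  # provides stepAIC() and the cpus data set

# Numeric predictors only (an assumption made for this sketch).
cpus.num <- cpus[, c("syct", "mmin", "mmax", "cach", "chmin", "chmax")]
cpus.num$logperf <- log10(cpus$perf)

# Hypothetical helper: VIF of each predictor, computed as 1 / (1 - R^2)
# from regressing that predictor on all the other predictors.
vif_values <- function(predictors, data) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: backward elimination that drops the predictor with
# the largest VIF until every remaining VIF is at or below the threshold.
drop_by_vif <- function(predictors, data, threshold = 10) {
  while (length(predictors) > 1) {
    v <- vif_values(predictors, data)
    if (max(v) <= threshold) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

candidates <- setdiff(names(cpus.num), "logperf")
kept <- drop_by_vif(candidates, cpus.num, threshold = 10)

# AIC-based selection applied to the VIF-screened model.
fit.vif <- lm(reformulate(kept, response = "logperf"), data = cpus.num)
fit.aic <- stepAIC(fit.vif, direction = "both", trace = FALSE, steps = 5000)
```

The argument values passed to stepAIC() mirror the defaults of aic.direction, aic.trace, and aic.steps listed under Usage.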
An important feature of buildModelSeries() is that it records the initial number
of candidate predictor variables and observations offered to the model, and adds this information
as an attribute to the final selected model. This feature was included because variable selection
procedures produce biased (overly optimistic) linear models, and the effective number of degrees of
freedom is close to the number of candidate predictor variables initially offered to the model
(Harrell, 2001). With the initial number of candidate predictor variables and observations
offered to the model, one can calculate penalized or adjusted measures of model performance. For
models built using buildModelSeries(), this can be done using statsModelSeries().
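To make the idea of a penalized measure concrete, the sketch below computes the adjusted R-squared of a selected model twice: once with the number of predictors it retained, and once with the number of candidates initially offered. It is only an illustration under stated assumptions: the attribute names p.initial and n.initial are hypothetical, the mtcars models are stand-ins for a selection result, and statsModelSeries() may compute its measures differently.

```r
# Adjusted R^2 for n observations and p predictors (intercept excluded).
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Model with all candidate predictors, and a stand-in for the selected model.
full  <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec, data = mtcars)
final <- lm(mpg ~ cyl + wt, data = mtcars)

# Hypothetical attribute names; buildModelSeries() may store these differently.
attr(final, "p.initial") <- length(attr(terms(full), "term.labels"))
attr(final, "n.initial") <- nrow(mtcars)

r2 <- summary(final)$r.squared
n  <- attr(final, "n.initial")

# Penalty based on the predictors retained after selection.
adj.final <- adjusted_r2(r2, n, p = length(coef(final)) - 1)
# Penalty based on the candidates initially offered (more conservative).
adj.initial <- adjusted_r2(r2, n, p = attr(final, "p.initial"))
```

Because the initial candidate count is at least as large as the retained count, the second value is never larger than the first.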
Some important details should be clear when using buildModelSeries():
this function was originally devised to deal with a list of formulas, but can also be used with a single formula;
in the current implementation, stepVIF() runs before MASS::stepAIC();
function arguments imported from MASS::stepAIC() and stepVIF() were named as in the original functions, and received a prefix (aic or vif) to help the user identify which function is affected by a given argument without having to check the documentation.
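One way to accept either a single formula or a list of formulas is to normalize the input to a list before fitting. The sketch below shows that pattern with plain stats::lm(); as_formula_list() and fit_all() are hypothetical helpers written for this example and do not reproduce the internals of buildModelSeries().

```r
# Hypothetical helper: wrap a single formula in a list, leave lists untouched.
as_formula_list <- function(x) {
  if (inherits(x, "formula")) list(x) else x
}

# Hypothetical helper: fit one linear model per formula and return a list.
fit_all <- function(x, data) {
  lapply(as_formula_list(x), function(f) lm(f, data = data))
}

single  <- mpg ~ wt + hp
several <- list(mpg ~ wt + hp, mpg ~ wt + qsec + drat)

fits.single  <- fit_all(single, mtcars)   # list with one fitted model
fits.several <- fit_all(several, mtcars)  # list with two fitted models
```

In both cases the result is a list of fitted models, matching the shape described under Value.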
Value
A list containing the fitted linear models.
TODO
Add option to set the order in which MASS::stepAIC() and stepVIF() are run.
Dependencies
The MASS package, provider of support functions and datasets for Venables and Ripley's Modern
Applied Statistics with S, is required for buildModelSeries() to work. The
development version of the MASS package is available at
https://www.stats.ox.ac.uk/pub/MASS4/, while its old versions are available in the CRAN archive
at https://cran.r-project.org/src/contrib/Archive/MASS/.
Author(s)
Alessandro Samuel-Rosa alessandrosamuelrosa@gmail.com
References
Harrell, F. E. (2001) Regression modelling strategies: with applications to linear models, logistic regression, and survival analysis. First edition. New York: Springer.
Venables, W. N. and Ripley, B. D. (2002) Modern applied statistics with S. Fourth edition. New York: Springer.
Samuel-Rosa, A., Heuvelink, G. B. M., de Mattos Vasques, G. and dos Anjos, L. H. C. (2015) Do more detailed environmental covariates deliver more accurate soil maps? Geoderma, 243–244, 214–227. doi: 10.1016/j.geoderma.2014.12.017.
See Also
Examples
if (interactive()) {
  # based on the second example of MASS::stepAIC()
  library("MASS")
  cpus1 <- cpus
  # convert the numeric predictors (columns 2 to 7) into quantile-based factors
  for (v in names(cpus)[2:7]) {
    cpus1[[v]] <- cut(cpus[[v]], unique(stats::quantile(cpus[[v]])),
                      include.lowest = TRUE)
  }
  cpus0 <- cpus1[, 2:8] # excludes names, authors' predictions
  # random subsample of 100 of the 209 observations
  cpus.samp <- sample(1:209, 100)
  # list of candidate model formulas
  cpus.form <- list(
    formula(log10(perf) ~ syct + mmin + mmax + cach + chmin + chmax + perf),
    formula(log10(perf) ~ syct + mmin + cach + chmin + chmax),
    formula(log10(perf) ~ mmax + cach + chmin + chmax + perf))
  data <- cpus0[cpus.samp, ]
  # run stepVIF() and then MASS::stepAIC() on each formula
  cpus.ms <- buildModelSeries(cpus.form, data, vif = TRUE, aic = TRUE)
}