| cv.function {cv} | R Documentation |
Cross-Validate a Model-Selection Procedure
Description
The cv() "function" method
is a general function to cross-validate a model-selection procedure,
such as the following:
selectStepAIC() is a procedure that applies the stepAIC()
model-selection function in the MASS package; selectTrans() is a procedure
for selecting predictor and response transformations in regression, which
uses the powerTransform() function in the
car package; selectTransStepAIC() combines predictor and response
transformations with predictor selection; and selectModelList()
uses cross-validation to select a model from a list of models created by
models() and employs (recursive) cross-validation to assess the predictive
accuracy of this procedure.
Usage
## S3 method for class 'function'
cv(
model,
data,
criterion = mse,
k = 10L,
reps = 1L,
seed = NULL,
working.model = NULL,
y.expression = NULL,
confint = n >= 400L,
level = 0.95,
details = k <= 10L,
save.model = FALSE,
ncores = 1L,
...
)
selectStepAIC(
data,
indices,
model,
criterion = mse,
AIC = TRUE,
details = TRUE,
save.model = FALSE,
...
)
selectTrans(
data,
indices,
details = TRUE,
save.model = FALSE,
model,
criterion = mse,
predictors,
response,
family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
rounded = TRUE,
...
)
selectTransStepAIC(
data,
indices,
details = TRUE,
save.model = FALSE,
model,
criterion = mse,
predictors,
response,
family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
rounded = TRUE,
AIC = TRUE,
...
)
selectModelList(
data,
indices,
model,
criterion = mse,
k = 10L,
k.recurse = k,
details = k <= 10L,
save.model = FALSE,
seed = FALSE,
quietly = TRUE,
...
)
compareFolds(object, digits = 3, ...)
## S3 method for class 'cvSelect'
coef(object, average, NAs = 0, ...)
Arguments
model |
a regression model object fit to data, or for the
|
data |
full data frame for model selection. |
criterion |
a CV criterion ("cost" or lack-of-fit) function. |
k |
perform k-fold cross-validation (default is 10); |
reps |
number of times to replicate k-fold CV (default is |
seed |
for R's random number generator; not used for n-fold cross-validation.
If not explicitly set, a seed is randomly generated and saved to make the results
reproducible. In some cases, for internal use only, |
working.model |
a regression model object fit to data, typically
to begin a model-selection process; for use with |
y.expression |
normally the response variable is found from the
|
confint |
if |
level |
confidence level (default |
details |
if |
save.model |
save the model that's selected using the full data set
(default, |
ncores |
number of cores to use for parallel computations
(default is |
... |
for |
indices |
indices of cases in data defining the current fold. |
AIC |
if |
predictors |
character vector of names of the predictors in the model to transform; if missing, no predictors will be transformed. |
response |
name of the response variable; if missing, the response won't be transformed. |
family |
transformation family for the predictors, one of
|
family.y |
transformation family for the response,
with |
rounded |
if |
k.recurse |
the number of folds for recursive CV; defaults
to the value of |
quietly |
if |
object |
an object of class |
digits |
significant digits for printing coefficients
(default |
average |
if supplied, a function, such as |
NAs |
values to substitute for |
Details
The model-selection function supplied as the procedure (for cvSelect())
or model (for cv()) argument
should accept the following arguments:
- data: set to the data argument to cvSelect() or cv().
- indices: the indices of the rows of data defining the current fold; if missing, the model-selection procedure is applied to the full data.
- other arguments, to be passed via ... from cvSelect() or cv().
procedure() or model() should return a list with the following
named elements: fit.i, the vector of predicted values for the cases in
the current fold computed from the model omitting these cases;
crit.all.i, the CV criterion computed for all of the cases using
the model omitting the current fold; and (optionally) coefficients,
parameter estimates from the model computed omitting the current fold.
When the indices argument is missing, procedure() returns the cross-validation criterion for all of the cases based on
the model fit to all of the cases.
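The contract described above can be sketched in base R without loading the cv package. The following is a minimal illustration under stated assumptions, not the package's own code: selectByAIC() is a hypothetical procedure that picks between two fixed candidate formulas by AIC, the mse() defined here is a local stand-in for the package's criterion function, and the mtcars data and formulas are chosen only for the example. The manual loop at the end shows roughly how the returned fit.i values could be assembled into a cross-validated criterion.

```r
# Local stand-in for a CV criterion (assumption; not the cv package's mse()).
mse <- function(y, yhat) mean((y - yhat)^2)

# Hypothetical model-selection procedure following the documented contract.
selectByAIC <- function(data, indices, criterion = mse, ...) {
  candidates <- list(mpg ~ wt, mpg ~ wt + hp)
  # If 'indices' is missing, select and evaluate on the full data.
  train <- if (missing(indices)) data else data[-indices, , drop = FALSE]
  fits <- lapply(candidates, function(f) lm(f, data = train))
  best <- fits[[which.min(vapply(fits, AIC, numeric(1)))]]
  yhat.all <- predict(best, newdata = data)
  crit.all <- criterion(data$mpg, yhat.all)
  if (missing(indices)) return(crit.all)
  list(
    fit.i = yhat.all[indices],   # predictions for the cases in the current fold
    crit.all.i = crit.all,       # criterion for all cases, model omits the fold
    coefficients = coef(best)    # optional element
  )
}

# Manual 4-fold loop: each fold is held out in turn and the procedure is rerun.
set.seed(123)
n <- nrow(mtcars)
folds <- split(sample(n), rep(1:4, length.out = n))
yhat.cv <- numeric(n)
for (idx in folds) {
  res <- selectByAIC(mtcars, idx)
  yhat.cv[idx] <- res$fit.i
}
cv.crit <- mse(mtcars$mpg, yhat.cv)  # criterion from held-out predictions
full.crit <- selectByAIC(mtcars)     # criterion for selection on the full data
```

Because the selection step is repeated inside every fold, cv.crit reflects the whole procedure, including the choice between candidate models, and not just the final selected model.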
For examples of model-selection functions for the procedure
argument, see the code for selectStepAIC(),
selectTrans(), and selectTransStepAIC().
For additional information, see the "Cross-validating model selection"
vignette (vignette("cv-select", package="cv"))
and the "Extending the cv package" vignette
(vignette("cv-extend", package="cv")).
Value
An object of class "cvSelect",
inheriting from class "cv", with the CV criterion
("CV crit"), the bias-adjusted CV criterion ("adj CV crit"),
the criterion for the model applied to the full data ("full crit"),
the confidence interval and level for the bias-adjusted CV criterion ("confint"),
the number of folds ("k"), the seed for R's random-number
generator ("seed"), and (optionally) a list of coefficients
(or, in the case of selectTrans(), estimated transformation
parameters, and in the case of selectTransStepAIC(), both regression coefficients
and transformation parameters) for the selected models
for each fold ("coefficients").
If reps > 1, then an object of class c("cvSelectList", "cvList") is returned,
which is literally a list of c("cvSelect", "cv") objects.
Functions
- cv(`function`): cv() method for applying a model-selection (or specification) procedure.
- selectStepAIC(): select a regression model using the stepAIC() function in the MASS package.
- selectTrans(): select transformations of the predictors and response using powerTransform() in the car package.
- selectTransStepAIC(): select transformations of the predictors and response, and then select predictors.
- selectModelList(): select a model using (recursive) CV.
- compareFolds(): print the coefficients from the selected models for the several folds.
- coef(cvSelect): extract the coefficients from the selected models for the several folds and possibly average them.
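
The idea of averaging coefficients across folds can be pictured with a small base-R sketch. This illustrates the concept under assumptions and is not the package's implementation: averageFoldCoefs() is a hypothetical helper, the per-fold coefficient vectors are made up, and the sketch assumes that a coefficient absent from a fold's selected model is treated as missing and replaced by the NAs value before the averaging function is applied.

```r
# Hypothetical helper illustrating the idea behind coef(object, average, NAs).
averageFoldCoefs <- function(coef.list, average = mean, NAs = 0) {
  all.names <- unique(unlist(lapply(coef.list, names)))
  # One row per fold, one column per coefficient; absent terms start as NA.
  mat <- t(vapply(coef.list, function(b) {
    out <- setNames(rep(NA_real_, length(all.names)), all.names)
    out[names(b)] <- b
    out
  }, numeric(length(all.names))))
  mat[is.na(mat)] <- NAs   # substitute the chosen value for missing terms
  apply(mat, 2, average)   # average each coefficient over the folds
}

# Made-up coefficients from three folds; the second fold's model omits 'hp'.
fold.coefs <- list(
  c("(Intercept)" = 37, wt = -4, hp = -0.03),
  c("(Intercept)" = 38, wt = -5),
  c("(Intercept)" = 36, wt = -3, hp = -0.03)
)
avg <- averageFoldCoefs(fold.coefs, average = median)
```

The substituted value depends on what is being averaged: 0 is natural for a regression coefficient dropped from a model, while 1 (no transformation) is the natural choice for a power-transformation parameter.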
See Also
stepAIC, bcPower,
powerTransform, cv.
Examples
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ . - name - origin, data=Auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto,
AIC=FALSE, k=5, reps=3) # via BIC
data("Prestige", package="carData")
m.pres <- lm(prestige ~ income + education + women,
data=Prestige)
cvt <- cv(selectTrans, data=Prestige, working.model=m.pres, seed=123,
predictors=c("income", "education", "women"),
response="prestige", family="yjPower")
cvt
compareFolds(cvt)
coef(cvt, average=median, NAs=1) # NAs not really needed here
cv(m.pres, seed=123)
Auto$year <- as.factor(Auto$year)
Auto$origin <- factor(Auto$origin,
labels=c("America", "Europe", "Japan"))
rownames(Auto) <- make.names(Auto$name, unique=TRUE)
Auto$name <- NULL
m.auto <- lm(mpg ~ . , data=Auto)
cvs <- cv(selectTransStepAIC, data=Auto, seed=76692, working.model=m.auto,
criterion=medAbsErr,
predictors=c("cylinders", "displacement", "horsepower",
"weight", "acceleration"),
response="mpg", AIC=FALSE)
cvs
compareFolds(cvs)
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
cv(selectModelList, data=Duncan, seed=5962,
working.model=models(m1, m2, m3)) # recursive CV