cv.function {cv} | R Documentation
Cross-Validate a Model-Selection Procedure
Description
The cv() "function" method is a general function for cross-validating a model-selection procedure, such as the following:
- selectStepAIC() is a procedure that applies the stepAIC() model-selection function in the MASS package.
- selectTrans() is a procedure for selecting predictor and response transformations in regression, which uses the powerTransform() function in the car package.
- selectTransStepAIC() combines predictor and response transformations with predictor selection.
- selectModelList() uses cross-validation to select a model from a list of models created by models(), and employs (recursive) cross-validation to assess the predictive accuracy of this procedure.
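The essential point of cross-validating a selection procedure is that the selection is repeated on the training cases of every fold, so the held-out cases play no part in choosing the model. The following base-R sketch illustrates this idea. It is not the cv package's implementation: it uses stats::step() in place of MASS::stepAIC(), and the function name cv_selection_sketch(), the random fold assignment, and the mean-squared-error criterion are assumptions made for illustration.

```r
# Hypothetical sketch: k-fold CV of a stepwise-selection procedure in which
# the selection is re-run on the training cases of every fold
cv_selection_sketch <- function(formula, data, k = 5, seed = 123) {
  set.seed(seed)
  n <- nrow(data)
  folds <- sample(rep_len(seq_len(k), n))      # random fold labels 1..k
  y <- model.response(model.frame(formula, data))
  yhat <- numeric(n)
  for (j in seq_len(k)) {
    test <- which(folds == j)
    # fit the working model and select predictors WITHOUT fold j
    start <- lm(formula, data = data[-test, ])
    chosen <- step(start, trace = 0)           # backward elimination by AIC
    yhat[test] <- predict(chosen, newdata = data[test, ])
  }
  mean((y - yhat)^2)                           # CV estimate of the MSE
}

cv_selection_sketch(mpg ~ ., data = mtcars, k = 5)
```

Selecting once on all of the data and then cross-validating only the chosen model would let the held-out cases influence the selection, which tends to make the CV estimate optimistic.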
Usage
## S3 method for class 'function'
cv(
model,
data,
criterion = mse,
k = 10L,
reps = 1L,
seed = NULL,
working.model = NULL,
y.expression = NULL,
confint = n >= 400L,
level = 0.95,
details = k <= 10L,
save.model = FALSE,
ncores = 1L,
...
)
selectStepAIC(
data,
indices,
model,
criterion = mse,
AIC = TRUE,
details = TRUE,
save.model = FALSE,
...
)
selectTrans(
data,
indices,
details = TRUE,
save.model = FALSE,
model,
criterion = mse,
predictors,
response,
family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
rounded = TRUE,
...
)
selectTransStepAIC(
data,
indices,
details = TRUE,
save.model = FALSE,
model,
criterion = mse,
predictors,
response,
family = c("bcPower", "bcnPower", "yjPower", "basicPower"),
family.y = c("bcPower", "bcnPower", "yjPower", "basicPower"),
rounded = TRUE,
AIC = TRUE,
...
)
selectModelList(
data,
indices,
model,
criterion = mse,
k = 10L,
k.recurse = k,
details = k <= 10L,
save.model = FALSE,
seed = FALSE,
quietly = TRUE,
...
)
compareFolds(object, digits = 3, ...)
## S3 method for class 'cvSelect'
coef(object, average, NAs = 0, ...)
Arguments
model |
a regression model object fit to data, or, for the cv() "function" method, a model-selection procedure function (see Details). |
data |
full data frame for model selection. |
criterion |
a CV criterion ("cost" or lack-of-fit) function. |
k |
perform k-fold cross-validation (default is 10). |
reps |
number of times to replicate k-fold CV (default is 1). |
seed |
for R's random-number generator; not used for n-fold cross-validation. If not explicitly set, a seed is randomly generated and saved to make the results reproducible. In some cases, for internal use only, seed is set to FALSE to suppress automatically setting the seed. |
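working.model |
a regression model object fit to data, typically to begin a model-selection process; for use with selectModelList(), a list of competing models created by models(). |
y.expression |
normally the response variable is found from the model or working.model argument; otherwise, the response can be specified by an expression to be evaluated within the data set supplied as the data argument. |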
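confint |
if TRUE (the default when the number of cases is 400 or greater), compute a confidence interval for the bias-adjusted CV criterion. |
level |
confidence level (default 0.95). |
details |
if TRUE (the default when k is 10 or smaller), save detailed information for each fold, such as the value of the CV criterion and the coefficients of the model selected with that fold omitted. |
save.model |
save the model that's selected using the full data set (default, FALSE). |
ncores |
number of cores to use for parallel computations (default is 1, i.e., computations are not done in parallel). |
... |
for the cv() "function" method, arguments to be passed to the model-selection procedure (see Details). |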
indices |
indices of cases in data defining the current fold. |
AIC |
if TRUE (the default), use the AIC as the model-selection criterion; if FALSE, use the BIC. |
predictors |
character vector of names of the predictors in the model to transform; if missing, no predictors will be transformed. |
response |
name of the response variable; if missing, the response won't be transformed. |
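family |
transformation family for the predictors, one of "bcPower", "bcnPower", "yjPower", or "basicPower", with "bcPower" as the default. |
family.y |
transformation family for the response, with "bcPower" as the default. |
rounded |
if TRUE (the default), use rounded versions of the estimated transformation parameters. |
k.recurse |
the number of folds for recursive CV; defaults to the value of k. |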
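quietly |
if TRUE (the default), suppress simple messages, for example, about the value to which the random-number seed is set. |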
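object |
an object of class "cvSelect". |
digits |
significant digits for printing coefficients (default 3). |
average |
if supplied, a function, such as mean or median, to use in averaging estimates across folds; if missing, the estimates for each fold are returned. |
NAs |
values to substitute for NAs in calculating averaged estimates (default 0). |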
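Because the selected model can differ from fold to fold, the coefficient vectors for the folds need not contain the same terms, which is what the average and NAs arguments address. The following minimal sketch shows one way such averaging could work, assuming that a term missing from a fold's model is filled with the NAs value before averaging. The helper average_fold_coefs() is hypothetical and is not the code of the coef() method for "cvSelect" objects.

```r
# Hypothetical helper: combine named coefficient vectors from several folds,
# filling terms absent from a fold with `NAs`, then average each term
average_fold_coefs <- function(coef_list, average = mean, NAs = 0) {
  all_terms <- unique(unlist(lapply(coef_list, names)))  # union of terms over folds
  filled <- lapply(coef_list, function(b) {
    out <- setNames(rep(NAs, length(all_terms)), all_terms)  # start from fill value
    out[names(b)] <- b                                       # overwrite selected terms
    out
  })
  mat <- do.call(cbind, filled)      # rows = terms, columns = folds
  rownames(mat) <- all_terms
  apply(mat, 1, average)             # average each term across folds
}

folds <- list(
  c("(Intercept)" = 1.0, x1 = 2.0),
  c("(Intercept)" = 3.0, x1 = 4.0, x2 = 6.0)
)
average_fold_coefs(folds)                  # x2 counted as 0 in the first fold
average_fold_coefs(folds, median, NAs = 1)
```

A fill value of 0 treats an omitted predictor as having a zero coefficient; for power-transformation parameters a fill value of 1 (no transformation) may be the more natural choice.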
Details
The model-selection function supplied as the procedure (for cvSelect()) or model (for cv()) argument should accept the following arguments:
- data: set to the data argument to cvSelect() or cv().
- indices: the indices of the rows of data defining the current fold; if missing, the model-selection procedure is applied to the full data.
- other arguments to be passed via ... from cvSelect() or cv().
procedure() or model() should return a list with the following named elements: fit.i, the vector of predicted values for the cases in the current fold computed from the model omitting these cases; crit.all.i, the CV criterion computed for all of the cases using the model omitting the current fold; and (optionally) coefficients, parameter estimates from the model computed omitting the current fold.
When the indices argument is missing, procedure() returns the cross-validation criterion for all of the cases based on the model fit to all of the cases.
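As an illustration of this interface, here is a minimal sketch of a procedure that accepts data and indices and returns fit.i, crit.all.i, and coefficients, or the full-data criterion when indices is missing. The names select_step_sketch() and my_mse() are hypothetical; the sketch uses stats::step() rather than MASS::stepAIC() and assumes a criterion of the form criterion(y, yhat). It is not the code of selectStepAIC().

```r
# Hypothetical criterion of the assumed (y, yhat) form
my_mse <- function(y, yhat) mean((y - yhat)^2)

# Hypothetical procedure following the data/indices interface described above
select_step_sketch <- function(data, indices, model, criterion = my_mse, ...) {
  y <- model.response(model.frame(formula(model), data))
  if (missing(indices)) {
    # no fold: select using all cases and return the criterion for all cases
    chosen <- step(update(model, data = data), trace = 0, ...)
    return(criterion(y, fitted(chosen)))
  }
  # select a model with the current fold omitted
  train <- data[-indices, , drop = FALSE]
  chosen <- step(update(model, data = train), trace = 0, ...)
  list(
    fit.i = predict(chosen, newdata = data[indices, , drop = FALSE]),
    crit.all.i = criterion(y, predict(chosen, newdata = data)),
    coefficients = coef(chosen)
  )
}

m <- lm(mpg ~ ., data = mtcars)
out <- select_step_sketch(mtcars, indices = 1:6, model = m)
names(out)                             # "fit.i" "crit.all.i" "coefficients"
select_step_sketch(mtcars, model = m)  # criterion for all cases
```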
For examples of model-selection functions for the procedure argument, see the code for selectStepAIC(), selectTrans(), and selectTransStepAIC().
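In the same spirit, a transformation-selection procedure must estimate its transformation from the training cases alone. The sketch below is a simplified, hypothetical stand-in for selectTrans(), not its powerTransform()-based code: it chooses a Box-Cox power for a positive response by maximizing the profile log-likelihood with optimize(), optionally rounds it to the nearest 0.5 (a crude stand-in for the rounded argument), and back-transforms predictions to the original scale before computing the criterion. The names select_boxcox_sketch(), bc(), bc_inv(), bc_loglik(), and my_mse(), and the search interval from -2 to 2, are assumptions.

```r
# Hypothetical criterion of the assumed (y, yhat) form
my_mse <- function(y, yhat) mean((y - yhat)^2)

# Box-Cox power transformation (log when lambda is 0) and its inverse
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
bc_inv <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else pmax(lambda * z + 1, 0)^(1 / lambda)
}

# Profile log-likelihood of lambda (up to a constant) for a linear model
bc_loglik <- function(lambda, y, X) {
  rss <- sum(lm.fit(X, bc(y, lambda))$residuals^2)
  -length(y) / 2 * log(rss / length(y)) + (lambda - 1) * sum(log(y))
}

# Hypothetical procedure following the data/indices interface
select_boxcox_sketch <- function(data, indices, formula, criterion = my_mse,
                                 rounded = TRUE) {
  y <- model.response(model.frame(formula, data))
  train <- if (missing(indices)) data else data[-indices, , drop = FALSE]
  ytr <- model.response(model.frame(formula, train))
  Xtr <- model.matrix(formula, train)
  # estimate lambda on the training cases only
  lambda <- optimize(bc_loglik, c(-2, 2), y = ytr, X = Xtr,
                     maximum = TRUE)$maximum
  if (rounded) lambda <- round(2 * lambda) / 2   # crude rounding to nearest 0.5
  beta <- lm.fit(Xtr, bc(ytr, lambda))$coefficients
  # naive back-transformation of predictions to the original scale
  yhat <- bc_inv(drop(model.matrix(formula, data) %*% beta), lambda)
  if (missing(indices)) return(criterion(y, yhat))
  list(fit.i = yhat[indices],
       crit.all.i = criterion(y, yhat),
       coefficients = c(lambda = lambda, beta))
}

out <- select_boxcox_sketch(mtcars, indices = 1:6, formula = mpg ~ wt + hp)
out$coefficients
```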
For additional information, see the "Cross-validating model selection" vignette (vignette("cv-select", package="cv")) and the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).
Value
An object of class "cvSelect", inheriting from class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), the seed for R's random-number generator ("seed"), and (optionally) a list of coefficients for the selected models for each fold ("coefficients"). In the case of selectTrans(), these are estimated transformation parameters, and in the case of selectTransStepAIC(), both regression coefficients and transformation parameters.
If reps > 1, then an object of class c("cvSelectList", "cvList") is returned, which is literally a list of c("cvSelect", "cv") objects.
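How the components "CV crit", "full crit", and "adj CV crit" can relate is sketched below. The sketch assumes the usual bias adjustment for k-fold CV, in which the adjusted criterion is the CV criterion plus the full-data criterion minus the fold-size-weighted mean of the crit.all.i values. That formula, the driver cv_assemble_sketch(), and the fixed-formula procedure fit_procedure() (which performs no selection and is tied to mtcars) are assumptions for illustration, not taken from this page.

```r
# Hypothetical mean-squared-error criterion
my_mse <- function(y, yhat) mean((y - yhat)^2)

# Hypothetical procedure with no selection step (a fixed lm() formula),
# following the data/indices interface described under Details
fit_procedure <- function(data, indices, criterion = my_mse) {
  if (missing(indices)) {
    m <- lm(mpg ~ wt + hp, data = data)
    return(criterion(data$mpg, fitted(m)))
  }
  m <- lm(mpg ~ wt + hp, data = data[-indices, ])
  list(fit.i = predict(m, newdata = data[indices, ]),
       crit.all.i = criterion(data$mpg, predict(m, newdata = data)))
}

# Hypothetical driver assembling the three criteria from per-fold results
cv_assemble_sketch <- function(procedure, data, y, k = 10, seed = 123,
                               criterion = my_mse) {
  set.seed(seed)
  n <- nrow(data)
  folds <- sample(rep_len(seq_len(k), n))
  n.j <- tabulate(folds, nbins = k)          # fold sizes
  yhat <- numeric(n)
  crit.all <- numeric(k)
  for (j in seq_len(k)) {
    idx <- which(folds == j)
    res <- procedure(data, idx, criterion = criterion)
    yhat[idx] <- res$fit.i                   # out-of-fold predictions
    crit.all[j] <- res$crit.all.i
  }
  cv.crit <- criterion(y, yhat)
  full.crit <- procedure(data, criterion = criterion)  # indices missing
  c("CV crit" = cv.crit,
    "adj CV crit" = cv.crit + full.crit - weighted.mean(crit.all, n.j),
    "full crit" = full.crit)
}

cv_assemble_sketch(fit_procedure, mtcars, y = mtcars$mpg, k = 5)
```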
Functions
- cv(`function`): cv() method for applying a model-selection (or specification) procedure.
- selectStepAIC(): select a regression model using the stepAIC() function in the MASS package.
- selectTrans(): select transformations of the predictors and response using powerTransform() in the car package.
- selectTransStepAIC(): select transformations of the predictors and response, and then select predictors.
- selectModelList(): select a model using (recursive) CV.
- compareFolds(): print the coefficients from the selected models for the several folds.
- coef(cvSelect): extract the coefficients from the selected models for the several folds and possibly average them.
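Selecting among candidate models by CV, as selectModelList() does, can be sketched in base R: compute a k-fold CV mean squared error for each candidate on a common set of folds and keep the candidate with the smallest value. The helpers cv_mse_formula() and select_by_cv_sketch() and the candidate formulas are hypothetical. Assessing the predictive accuracy of this selection would require wrapping the whole selection in an outer CV loop, which is the recursive CV referred to above.

```r
# Hypothetical k-fold CV mean squared error for one lm() formula
cv_mse_formula <- function(formula, data, folds) {
  y <- model.response(model.frame(formula, data))
  yhat <- numeric(nrow(data))
  for (j in unique(folds)) {
    test <- folds == j
    fit <- lm(formula, data = data[!test, ])
    yhat[test] <- predict(fit, newdata = data[test, ])
  }
  mean((y - yhat)^2)
}

# Hypothetical selection step: pick the candidate with the smallest CV MSE
select_by_cv_sketch <- function(formulas, data, k = 5, seed = 5962) {
  set.seed(seed)
  folds <- sample(rep_len(seq_len(k), nrow(data)))  # same folds for all candidates
  crit <- vapply(formulas, cv_mse_formula, numeric(1), data = data, folds = folds)
  list(best = which.min(crit), crit = crit)
}

candidates <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ wt + hp,
  m3 = mpg ~ wt * hp
)
select_by_cv_sketch(candidates, mtcars)
```

Using the same folds for every candidate keeps the comparison from being affected by differences in how the cases happen to be split.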
See Also
stepAIC, bcPower, powerTransform, cv.
Examples
# cross-validate stepwise predictor selection by AIC
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ . - name - origin, data=Auto)
cv(selectStepAIC, Auto, seed=123, working.model=m.auto)
# select by BIC instead, with 5-fold CV replicated 3 times
cv(selectStepAIC, Auto, seed=123, working.model=m.auto,
   AIC=FALSE, k=5, reps=3)
# cross-validate selection of Yeo-Johnson transformations
data("Prestige", package="carData")
m.pres <- lm(prestige ~ income + education + women,
             data=Prestige)
cvt <- cv(selectTrans, data=Prestige, working.model=m.pres, seed=123,
          predictors=c("income", "education", "women"),
          response="prestige", family="yjPower")
cvt
compareFolds(cvt)
coef(cvt, average=median, NAs=1) # NAs not really needed here
cv(m.pres, seed=123) # for comparison: CV of the untransformed working model
# recode variables, then select transformations followed by predictors (BIC),
# using median absolute error as the CV criterion
Auto$year <- as.factor(Auto$year)
Auto$origin <- factor(Auto$origin,
                      labels=c("America", "Europe", "Japan"))
rownames(Auto) <- make.names(Auto$name, unique=TRUE)
Auto$name <- NULL
m.auto <- lm(mpg ~ . , data=Auto)
cvs <- cv(selectTransStepAIC, data=Auto, seed=76692, working.model=m.auto,
          criterion=medAbsErr,
          predictors=c("cylinders", "displacement", "horsepower",
                       "weight", "acceleration"),
          response="mpg", AIC=FALSE)
cvs
compareFolds(cvs)
# select among three competing models by CV
data("Duncan", package="carData")
m1 <- lm(prestige ~ income + education, data=Duncan)
m2 <- lm(prestige ~ income + education + type, data=Duncan)
m3 <- lm(prestige ~ (income + education)*type, data=Duncan)
cv(selectModelList, data=Duncan, seed=5962,
   working.model=models(m1, m2, m3)) # recursive CV