cv {cv}    R Documentation
Cross-Validate Regression Models
Description
cv() is a parallelized generic k-fold (including n-fold, i.e., leave-one-out) cross-validation function, with a default method, specific methods for linear and generalized-linear models that can be much more computationally efficient, and a method for robust linear models. There are also cv() methods for mixed-effects models, for model-selection procedures, and for several models fit to the same data, which are documented separately.
Usage
cv(model, data, criterion, k, reps = 1L, seed, ...)
## Default S3 method:
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  criterion.name = deparse(substitute(criterion)),
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  ncores = 1L,
  type = "response",
  start = FALSE,
  model.function,
  ...
)

## S3 method for class 'lm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("auto", "hatvalues", "Woodbury", "naive"),
  ncores = 1L,
  ...
)

## S3 method for class 'glm'
cv(
  model,
  data = insight::get_data(model),
  criterion = mse,
  k = 10L,
  reps = 1L,
  seed = NULL,
  details = k <= 10L,
  confint = n >= 400L,
  level = 0.95,
  method = c("exact", "hatvalues", "Woodbury"),
  ncores = 1L,
  start = FALSE,
  ...
)

## S3 method for class 'rlm'
cv(model, data, criterion, k, reps = 1L, seed, ...)

## S3 method for class 'cv'
print(x, digits = getOption("digits"), ...)

## S3 method for class 'cvList'
print(x, ...)

## S3 method for class 'cv'
as.data.frame(
  x,
  row.names = NULL,
  optional = TRUE,
  rows = c("cv", "folds"),
  columns = c("criteria", "coefficients"),
  ...
)

## S3 method for class 'cvList'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)

## S3 method for class 'cvDataFrame'
print(x, digits = getOption("digits") - 2L, ...)

## S3 method for class 'cvDataFrame'
summary(
  object,
  formula,
  subset = NULL,
  fun = mean,
  include = c("cv", "folds", "all"),
  ...
)
Arguments
model: a regression model object (see Details).

data: data frame to which the model was fit (not usually necessary).

criterion: cross-validation criterion ("cost" or lack-of-fit) function of the form f(y, yhat), where y is the observed response and yhat the vector of predicted values; the default is mse (mean-squared error). See the illustrative sketch following this list.

k: perform k-fold cross-validation (default is 10); k may also be specified as "loo" or "n" for n-fold (leave-one-out) cross-validation.

reps: number of times to replicate k-fold CV (default is 1).

seed: for R's random number generator; optional, if not supplied a random seed will be selected and saved; not needed for n-fold cross-validation.

...: to match the generic function; arguments passed to specific methods.

criterion.name: a character string giving the name of the CV criterion function recorded in the returned "cv" object; the default is taken from the criterion argument.

details: if TRUE (the default when k is 10 or smaller), include a "details" component in the returned object, with the CV criterion and regression coefficients computed for each fold; see Value.

confint: if TRUE (the default when the number of cases is 400 or more), compute a confidence interval for the bias-adjusted CV criterion.

level: confidence level (default 0.95).

ncores: number of cores to use for parallel computations (default is 1, i.e., computations are not done in parallel).

type: for the default method, value to be passed to the type argument of predict(); the default is "response".

start: if TRUE (the default is FALSE), use the coefficients of the model fit to the full data as starting values when the model is refit to each fold.

model.function: a regression function, typically supplied for a new cv() method that calls the default method.

method: computational method to apply to a linear (i.e., "lm") model or to a generalized linear (i.e., "glm") model; see Details.

x: a "cv", "cvList", or "cvDataFrame" object to be printed or coerced to a data frame.

digits: significant digits for printing, default taken from the "digits" option.

row.names: optional row names for the result, defaults to NULL.

optional: to match the as.data.frame() generic function.

rows: the rows of the resulting data frame to retain: "cv" for rows pertaining to the overall CV results, "folds" for rows pertaining to the individual folds; the default retains both.

columns: the columns of the resulting data frame to retain: "criteria" for the CV criteria, "coefficients" for the regression coefficients; the default retains both.

object: an object inheriting from class "cvDataFrame" to be summarized.

formula: of the form criterion ~ classifying variables (e.g., mse ~ rep + fold), specifying the CV criterion to summarize and the variables over which to aggregate.

subset: a subsetting expression; the default (NULL) is not to subset the "cvDataFrame" object.

fun: summary function to apply, defaulting to mean.

include: which rows of the "cvDataFrame" to include in the summary: "cv" (the default), "folds", or "all".
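As an illustration of supplying a custom criterion, the following is a minimal sketch of a user-defined cost function of the form f(y, yhat) described above; the name medAbsErr is chosen here purely for the example (the package's own criteria, such as mse and BayesRule, appear in the Examples below).

# sketch of a user-defined CV criterion: median absolute prediction error
medAbsErr <- function(y, yhat) median(abs(y - yhat))

data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ horsepower, data=Auto)
cv(m.auto, k=10, seed=2025, criterion=medAbsErr)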
Details
The default cv() method uses update() to refit the model to each fold, and should work if there are appropriate update() and predict() methods, and if the default method for GetResponse() works or if a GetResponse() method is supplied.

The "lm" and "glm" methods can use much faster computational algorithms, as selected by the method argument. The linear-model method accommodates weighted linear models. For both classes of models, in the leave-one-out (n-fold) case, fitted values for the folds can be computed from the hat-values via method="hatvalues" without refitting the model; this method is exact for LMs and approximate for GLMs. Again for both classes of models, when more than one case is omitted in each fold, fitted values may be obtained without refitting the model by exploiting the Woodbury matrix identity via method="Woodbury". As with "hatvalues", this method is exact for LMs and approximate for GLMs.

The default for linear models is method="auto", which is equivalent to method="hatvalues" for n-fold cross-validation and method="Woodbury" otherwise; method="naive" refits the model via update() and is generally much slower. The default for generalized linear models is method="exact", which employs update(). This default is conservative, and it is usually safe to use method="hatvalues" for n-fold CV or method="Woodbury" for k-fold CV.

There is also a method for robust linear models fit by rlm() in the MASS package (to avoid inheriting the "lm" method, for which the default "auto" computational method would be inappropriate).

For additional details, see the "Cross-validating regression models" vignette (vignette("cv", package="cv")).

cv() is designed to be extensible to other classes of regression models; see the "Extending the cv package" vignette (vignette("cv-extend", package="cv")).
Value
The cv() methods return an object of class "cv", with the CV criterion ("CV crit"), the bias-adjusted CV criterion ("adj CV crit"), the criterion for the model applied to the full data ("full crit"), the confidence interval and level for the bias-adjusted CV criterion ("confint"), the number of folds ("k"), and the seed for R's random-number generator ("seed"). If details=TRUE, then the returned object will also include a "details" component, which is a list of two elements: "criterion", containing the CV criterion computed for the cases in each fold; and "coefficients", the regression coefficients computed for the model with each fold deleted. Some methods may return a subset of these components and may add additional information. If reps > 1, then an object of class "cvList" is returned, which is literally a list of "cv" objects.
Methods (by class)
- cv(default): "default" method.
- cv(lm): "lm" method.
- cv(glm): "glm" method.
- cv(rlm): "rlm" method (to avoid inheriting the "lm" method).
Methods (by generic)
- print(cv): print() method for "cv" objects.
- as.data.frame(cv): as.data.frame() method for "cv" objects.
Functions
- print(cvList): print() method for "cvList" objects.
- as.data.frame(cvList): as.data.frame() method for "cvList" objects.
- print(cvDataFrame): print() method for "cvDataFrame" objects.
- summary(cvDataFrame): summary() method for "cvDataFrame" objects.
See Also
cv.merMod, cv.function, cv.modList.
Examples
data("Auto", package="ISLR2")
m.auto <- lm(mpg ~ horsepower, data=Auto)
cv(m.auto, k="loo")
(cv.auto <- cv(m.auto, seed=1234))
compareFolds(cv.auto)
(cv.auto.reps <- cv(m.auto, seed=1234, reps=3))
D.auto.reps <- as.data.frame(cv.auto.reps)
head(D.auto.reps)
summary(D.auto.reps, mse ~ rep + fold, include="folds")
summary(D.auto.reps, mse ~ rep + fold, include = "folds",
subset = fold <= 5) # first 5 folds
summary(D.auto.reps, mse ~ rep, include="folds")
summary(D.auto.reps, mse ~ rep, fun=sd, include="folds")
data("Mroz", package="carData")
m.mroz <- glm(lfp ~ ., data=Mroz, family=binomial)
cv(m.mroz, criterion=BayesRule, seed=123)
data("Duncan", package="carData")
m.lm <- lm(prestige ~ income + education, data=Duncan)
m.rlm <- MASS::rlm(prestige ~ income + education,
data=Duncan)
cv(m.lm, k="loo", method="Woodbury")
cv(m.rlm, k="loo")