repCV {cvTools} | R Documentation |
Cross-validation for linear models
Description
Estimate the prediction error of a linear model via (repeated) K-fold
cross-validation. Cross-validation functions are available for least
squares fits computed with lm as well as for the following robust
alternatives: MM-type models computed with lmrob and least trimmed
squares fits computed with ltsReg.
Usage
repCV(object, ...)
## S3 method for class 'lm'
repCV(
object,
cost = rmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
seed = NULL,
...
)
## S3 method for class 'lmrob'
repCV(
object,
cost = rtmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
seed = NULL,
...
)
## S3 method for class 'lts'
repCV(
object,
cost = rtmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
fit = c("reweighted", "raw", "both"),
seed = NULL,
...
)
cvLm(
object,
cost = rmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
seed = NULL,
...
)
cvLmrob(
object,
cost = rtmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
seed = NULL,
...
)
cvLts(
object,
cost = rtmspe,
K = 5,
R = 1,
foldType = c("random", "consecutive", "interleaved"),
grouping = NULL,
folds = NULL,
fit = c("reweighted", "raw", "both"),
seed = NULL,
...
)
Arguments
object |
an object returned from a model fitting function. Methods
are implemented for objects of class "lm", "lmrob" and "lts". |
... |
additional arguments to be passed to the prediction loss
function cost. |
cost |
a cost function measuring prediction loss. It should expect
the observed values of the response to be passed as the first argument and
the predicted values as the second argument, and must return either a
non-negative scalar value, or a list with the first component containing
the prediction error and the second component containing the standard
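error. The default is to use the root mean squared prediction error
(rmspe) for the "lm" method and the root trimmed mean squared prediction
error (rtmspe) for the "lmrob" and "lts" methods. |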
K |
an integer giving the number of folds into which the data should
be split (the default is five). Keep in mind that this should be chosen
such that all folds are of approximately equal size. Setting K equal to
the number of observations or groups yields leave-one-out
cross-validation. |
R |
an integer giving the number of replications for repeated
K-fold cross-validation (the default is one). |
foldType |
a character string specifying the type of folds to be
generated. Possible values are "random" (the default), "consecutive" or
"interleaved". |
grouping |
a factor specifying groups of observations. If supplied, the data are split according to the groups rather than individual observations such that all observations within a group belong to the same fold. |
folds |
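an object of class "cvFolds" (as returned by cvFolds) giving the folds
of the data for cross-validation. If supplied, this is preferred over K
and R. |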
seed |
optional initial seed for the random number generator (see
.Random.seed). |
fit |
a character string specifying for which fit to estimate the
prediction error. Possible values are "reweighted" (the default) for the
reweighted fit, "raw" for the raw fit, or "both" for both fits. |
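The contract described for the cost argument (observed values first, predictions second, returning either a non-negative scalar or a list whose first component is the prediction error and whose second is the standard error) can be illustrated with a small user-defined loss. The following is only a sketch in base R; the names medAPE and mapeWithSE are hypothetical and are not part of cvTools.

```r
# Hypothetical cost function (not part of cvTools): median absolute
# prediction error. Observed responses come first, predictions second.
medAPE <- function(y, yHat) {
  median(abs(y - yHat))
}

# Hypothetical variant returning a list: the prediction error is the first
# component and its standard error the second.
mapeWithSE <- function(y, yHat) {
  absErr <- abs(y - yHat)
  list(mape = mean(absErr), se = sd(absErr) / sqrt(length(absErr)))
}

# Toy data to check both functions
y <- c(1, 2, 3, 4)
yHat <- c(1.5, 2, 2, 6)
medAPE(y, yHat)
mapeWithSE(y, yHat)
```

Assuming it satisfies the contract above, such a function could be supplied through the cost argument, for instance repCV(fitLm, cost = medAPE).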
Details
(Repeated) K-fold cross-validation is performed in the following way. The
data are first split into K previously obtained blocks of approximately
equal size. Each of the K data blocks is left out once, the model is fit
to the remaining data, and predictions are computed for the observations
in the left-out block with the predict method of the fitted model. Thus a
prediction is obtained for each observation.
The response variable and the obtained predictions for all observations are
then passed to the prediction loss function cost to estimate the
prediction error. For repeated cross-validation, this process is replicated
and the estimated prediction errors from all replications as well as their
average are included in the returned object.
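The procedure described above can be sketched by hand in base R. This is an illustration of the general technique only, not the code used by repCV or cvFit; the built-in mtcars data and the model mpg ~ wt + hp are assumptions chosen to keep the sketch self-contained.

```r
# Base R sketch of K-fold cross-validation for a least squares fit
set.seed(1234)  # set seed for reproducibility
data <- mtcars  # assumed example data (built into R)
n <- nrow(data)
K <- 5

# randomly assign each observation to one of K blocks of near-equal size
blocks <- sample(rep(seq_len(K), length.out = n))

# leave each block out once, fit on the rest, predict the left-out block
yHat <- numeric(n)
for (k in seq_len(K)) {
  test <- blocks == k
  fit <- lm(mpg ~ wt + hp, data = data[!test, ])
  yHat[test] <- predict(fit, newdata = data[test, ])
}

# pass response and predictions to a loss; here the root mean squared
# prediction error is computed directly
cvError <- sqrt(mean((data$mpg - yHat)^2))
cvError
```

Every observation receives exactly one out-of-sample prediction, so the loss is evaluated once over the full response vector rather than averaged per block.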
Value
An object of class "cv" with the following components:
n |
an integer giving the number of observations or groups. |
K |
an integer giving the number of folds. |
R |
an integer giving the number of replications. |
cv |
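a numeric vector containing the estimated prediction
errors. For the "lm" and "lmrob" methods, this is a single numeric value.
For the "lts" method, this contains one value for each of the requested
fits. In the case of repeated cross-validation, those are average values
over all replications. |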
se |
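a numeric vector containing the estimated standard
errors of the prediction loss. For the "lm" and "lmrob" methods, this is
a single numeric value. For the "lts" method, this contains one value for
each of the requested fits. |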
reps |
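a numeric matrix containing the estimated prediction
errors from all replications. For the "lm" and "lmrob" methods, this is a
matrix with one column. For the "lts" method, this contains one column
for each of the requested fits. However, this is only returned for
repeated cross-validation. |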
seed |
the seed of the random number generator before cross-validation was performed. |
call |
the matched function call. |
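How the per-replication errors relate to the averaged estimate can be sketched in base R. The objects reps and cv below only mimic the component names listed above; they are plain R objects built by hand, not an object of class "cv", and the one-column layout, the mtcars data, the model formula and the helper rmspeOnce are assumptions made for illustration.

```r
# Base R sketch: repeated K-fold cross-validation and its average
set.seed(1234)  # set seed for reproducibility
data <- mtcars  # assumed example data (built into R)
n <- nrow(data)
K <- 5
R <- 10

# hypothetical helper: one replication of K-fold CV, returning the RMSPE
rmspeOnce <- function() {
  blocks <- sample(rep(seq_len(K), length.out = n))
  yHat <- numeric(n)
  for (k in seq_len(K)) {
    test <- blocks == k
    fit <- lm(mpg ~ wt + hp, data = data[!test, ])
    yHat[test] <- predict(fit, newdata = data[test, ])
  }
  sqrt(mean((data$mpg - yHat)^2))
}

# one row per replication, one column for the single fit
reps <- matrix(replicate(R, rmspeOnce()), ncol = 1,
               dimnames = list(NULL, "CV"))

# the reported estimate is the average over all replications
cv <- colMeans(reps)
cv
```

Each replication draws a fresh random split, so the spread of the values in reps gives a rough impression of how much the estimate depends on the particular partition.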
Note
The repCV methods are simple wrapper functions that extract the
data from the fitted model and call cvFit to perform
cross-validation. In addition, cvLm, cvLmrob and cvLts
are aliases for the respective methods.
Author(s)
Andreas Alfons
See Also
cvFit, cvFolds, cost, lm, lmrob, ltsReg
Examples
library("cvTools")
library("robustbase")
data("coleman")
set.seed(1234) # set seed for reproducibility
# set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
# perform cross-validation for an LS regression model
fitLm <- lm(Y ~ ., data = coleman)
repCV(fitLm, cost = rtmspe, folds = folds, trim = 0.1)
# perform cross-validation for an MM regression model
fitLmrob <- lmrob(Y ~ ., data = coleman)
repCV(fitLmrob, cost = rtmspe, folds = folds, trim = 0.1)
# perform cross-validation for an LTS regression model
fitLts <- ltsReg(Y ~ ., data = coleman)
repCV(fitLts, cost = rtmspe, folds = folds, trim = 0.1)
# estimate the prediction error of both the reweighted and the raw fit
repCV(fitLts, cost = rtmspe, folds = folds,
      fit = "both", trim = 0.1)