cvpre {pre} | R Documentation |
Full k-fold cross validation of a prediction rule ensemble (pre)
Description
cvpre performs k-fold cross validation on the dataset used to create
the specified prediction rule ensemble, providing an estimate of predictive
accuracy on future observations.
Usage
cvpre(
  object,
  k = 10,
  penalty.par.val = "lambda.1se",
  pclass = 0.5,
  foldids = NULL,
  verbose = FALSE,
  parallel = FALSE,
  print = TRUE,
  ...
)
Arguments
object |
An object of class |
k |
integer. The number of cross validation folds to be used. |
penalty.par.val |
character or numeric. Value of the penalty parameter
|
pclass |
numeric. Only used for binary classification. Cut-off value for the predicted probabilities that should be used to classify observations to the second class. |
foldids |
numeric vector of |
verbose |
logical. Should progress of the cross validation be printed to the command line? |
parallel |
logical. Should parallel foreach be used? A parallel backend must be registered beforehand, for example with doMC. |
print |
logical. Should accuracy estimates be printed to the command line? |
... |
Further arguments to be passed to |
Details
The random sampling employed by default may yield a test fold that contains
all observations with a given level of a given factor, so that this level is
absent from the corresponding training partition. This results in an error,
because predictions would have to be computed for a factor level that was not
observed in the training data, which is impossible. By manually specifying the
foldids argument, users can make sure all factor levels are represented in
each of the k training partitions.
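
The following is a minimal sketch of one way to build such fold indicators. It assumes that foldids takes one fold number (1 to k) per training observation, in the row order of the training data; that is an assumption here, not something the text above confirms. The helper make_stratified_foldids() is hypothetical and not part of pre. It sorts observations by factor level (in random order within each level) and deals out the fold numbers cyclically, so every level with at least two observations ends up in at least two folds and therefore in every training partition.

```r
# Hypothetical helper (not part of pre): spread each factor level over the folds.
make_stratified_foldids <- function(f, k = 10) {
  # Group observations by level; random order within each level
  ord <- order(f, runif(length(f)))
  foldids <- integer(length(f))
  # Deal fold numbers 1..k out cyclically along the sorted order
  foldids[ord] <- rep_len(seq_len(k), length(f))
  foldids
}

set.seed(42)
dat <- airquality[complete.cases(airquality), ]
dat$Month <- factor(dat$Month)
foldids <- make_stratified_foldids(dat$Month, k = 10)

# Check: does every training partition (all folds but one) contain every level?
all_levels_present <- all(sapply(seq_len(10), function(fold) {
  all(levels(dat$Month) %in% dat$Month[foldids != fold])
}))
all_levels_present

# Assumed usage with cvpre; only run when the pre package is installed
if (requireNamespace("pre", quietly = TRUE)) {
  library(pre)
  ens <- pre(Ozone ~ ., data = dat)
  cv <- cvpre(ens, k = 10, foldids = foldids)
}
```

With several factors in the data, the same idea would have to be applied to their combination (or the check repeated per factor), and a level with a single observation can never appear in both the training and the test partition.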
Value
Calculates cross-validated estimates of predictive accuracy and prints
these to the command line. For survival regression, accuracy is not calculated,
as there is currently no agreed-upon way to best quantify accuracy in survival
regression models. Users can compute their own accuracy estimates using the
(invisibly returned) cross-validated predictions ($cvpreds).
Invisibly, a list of three objects is returned: accuracy (containing accuracy
estimates), cvpreds (containing cross-validated predictions) and
fold_indicators (a vector indicating the cross validation fold each
observation was part of). For (multivariate) continuous outcomes, accuracy is
a list with elements $MSE (mean squared error on test observations) and $MAE
(mean absolute error on test observations). For (binary and multiclass)
classification, accuracy is a list with elements $SEL (mean squared error on
predicted probabilities), $AEL (mean absolute error on predicted
probabilities), $MCR (average misclassification error rate) and $table
(proportion table with (mis)classification rates).
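
As an illustration of computing custom accuracy estimates from the cross-validated predictions, here is a minimal sketch for a single continuous outcome. The helper cv_errors() is hypothetical and not part of pre, and the toy numbers are made up purely to show the arithmetic. The guarded part assumes that $cvpreds is a numeric vector in the same row order as the training data, which the text above does not state explicitly.

```r
# Hypothetical helper (not part of pre): error summaries for a continuous outcome
cv_errors <- function(observed, predicted) {
  err <- observed - predicted
  list(
    MSE  = mean(err^2),        # mean squared error
    MAE  = mean(abs(err)),     # mean absolute error
    RMSE = sqrt(mean(err^2))   # root mean squared error, on the outcome's scale
  )
}

# Toy check with made-up numbers: errors are 0, 0 and -2
toy <- cv_errors(observed = c(1, 2, 3), predicted = c(1, 2, 5))
toy$MSE  # 4/3
toy$MAE  # 2/3

# Assumed usage with cvpre output; only run when the pre package is installed
if (requireNamespace("pre", quietly = TRUE)) {
  library(pre)
  set.seed(42)
  dat <- airquality[complete.cases(airquality), ]
  airq.ens <- pre(Ozone ~ ., data = dat)
  airq.cv <- cvpre(airq.ens, print = FALSE)
  cv_errors(dat$Ozone, airq.cv$cvpreds)
}
```

The same pattern extends to any other measure that can be written as a function of observed outcomes and cross-validated predictions.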
See Also
pre, plot.pre, coef.pre, importance.pre, predict.pre, interact, print.pre
Examples
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airquality[complete.cases(airquality),])
airq.cv <- cvpre(airq.ens)