generic.cv {randomUniformForest}    R Documentation
Generic k-fold cross-validation
Description
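Performs k-fold cross-validation, repeated nTimes times, for any specified algorithm. The standard metric, the test error (or MSE for regression), can be reported together with one other metric among AUC, precision, F-score, L1, geometric mean and geometric mean (precision).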
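As an illustration of the procedure only, here is a minimal base-R sketch of repeated k-fold cross-validation. The helper name repeatedKFoldCV, the random fold assignment and the bookkeeping of one error value per fold are assumptions of this sketch, not the code of generic.cv. It follows the wrapper convention used in the Examples: fitFun(X, Y) returns a fitted model and predictFun(model, newdata) returns predictions.

```r
# Hypothetical helper, NOT the package's implementation: repeated k-fold CV.
# fitFun(X, Y) returns a model; predictFun(model, newdata) returns predictions.
repeatedKFoldCV <- function(X, Y, fitFun, predictFun = predict,
                            nTimes = 1, k = 10, seed = 2014,
                            regression = TRUE) {
  set.seed(seed)
  n <- nrow(X)
  testError <- numeric(0)
  for (r in seq_len(nTimes)) {
    # assign every row at random to one of k folds of (nearly) equal size
    folds <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      testIdx <- which(folds == f)
      model <- fitFun(X[-testIdx, , drop = FALSE], Y[-testIdx])
      pred <- predictFun(model, X[testIdx, , drop = FALSE])
      if (regression) {
        err <- mean((pred - Y[testIdx])^2)  # mean squared error
      } else {
        # misclassification rate
        err <- mean(as.character(pred) != as.character(Y[testIdx]))
      }
      # one error value per fold (the package may aggregate differently)
      testError <- c(testError, err)
    }
  }
  list(testError = testError, avgError = mean(testError),
       stdDev = sd(testError))
}

# Usage: linear regression on the built-in mtcars data
X <- mtcars[, c("wt", "hp")]
Y <- mtcars$mpg
fitLm <- function(X, Y) lm(Y ~ ., data = data.frame(X, Y = Y))
predLm <- function(model, newdata) predict(model, newdata = data.frame(newdata))
res <- repeatedKFoldCV(X, Y, fitLm, predLm, nTimes = 2, k = 5)
```

With nTimes = 2 and k = 5, this sketch returns ten per-fold errors, whose mean and standard deviation play the roles of avgError and stdDev.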
Usage
generic.cv(X, Y,
nTimes = 1,
k = 10,
seed = 2014,
regression = TRUE,
genericAlgo = NULL,
specificPredictFunction = NULL,
metrics = c("none", "AUC", "precision", "F-score", "L1", "geometric mean",
"geometric mean (precision)"))
Arguments
X
a matrix or data frame of observations (the predictors).
Y
a vector of responses for the observations (a factor for classification).
nTimes
number of times the k-fold cross-validation is repeated.
k
number of folds.
seed
the random seed, for reproducibility.
regression
if TRUE, performs regression; set it to FALSE for classification.
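genericAlgo
a wrapper function of the form function(X, Y) that fits the algorithm to assess and returns the fitted model. Options of the underlying algorithm can be set inside the wrapper. The default, NULL, is only a placeholder: a wrapper function must be supplied for the cross-validation to run.
specificPredictFunction
a function of the form function(model, newdata) that returns predictions for newdata. It is needed when the assessed model does not support R's generic 'predict' method, or when that method does not directly return the predicted values or classes (see the gbm and CART examples).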
metrics
one additional metric to compute alongside the standard one, the test error (or MSE for regression). With the default, "none", only the test error is reported.
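To illustrate the wrapper contract suggested by the Examples (genericAlgo as function(X, Y) returning a fitted model, specificPredictFunction as function(model, newdata) returning predictions), here is a sketch with a hypothetical nearest-centroid classifier written in base R. Because the model is a plain list, R's generic 'predict' has no method for it, which is the case where specificPredictFunction is needed. The names genericAlgo.centroid and specificPredictFunction.centroid are illustrative only.

```r
# Hypothetical nearest-centroid classifier, used only to illustrate the
# genericAlgo / specificPredictFunction contract. It returns a plain list,
# so R's generic predict() has no method for it.
genericAlgo.centroid <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.factor(Y)
  # one row per class: the mean of each predictor within that class
  centroids <- t(sapply(levels(Y),
                        function(l) colMeans(X[Y == l, , drop = FALSE])))
  list(centroids = centroids)
}

specificPredictFunction.centroid <- function(model, newdata) {
  newdata <- as.matrix(newdata)
  classes <- rownames(model$centroids)
  # squared Euclidean distance from each observation to each class centroid
  d <- sapply(classes, function(l)
    rowSums(sweep(newdata, 2, model$centroids[l, ])^2))
  d <- matrix(d, nrow = nrow(newdata), dimnames = list(NULL, classes))
  # predicted class = nearest centroid, as a factor with the training levels
  factor(classes[apply(d, 1, which.min)], levels = classes)
}

# Usage on iris (training-set predictions, only to check that the wrappers work)
X <- iris[, 1:4]
Y <- iris$Species
model <- genericAlgo.centroid(X, Y)
pred <- specificPredictFunction.centroid(model, X)
```

These two functions would then be passed as genericAlgo and specificPredictFunction, with regression = FALSE, in the same way as the gbm and CART wrappers in the Examples.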
Value
a list with the following components:
testError
the test error values.
avgError
the mean of the test error.
stdDev
the standard deviation of the test error.
metric
the values of the additional metric chosen in 'metrics'.
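The exact formulas behind each option of 'metrics' are not given on this page. As a point of reference only, the sketch below computes common textbook versions of some of them in base R: precision, F-score and the geometric mean of sensitivity and specificity for a binary problem, a rank-based (Mann-Whitney) AUC from scores, and the L1 (mean absolute) error for regression. These definitions and the helper names are assumptions; generic.cv may define or aggregate them differently, notably for multiclass problems, and "geometric mean (precision)" is not reproduced here.

```r
# Textbook binary metrics (assumed definitions; the package's may differ)
binaryMetrics <- function(pred, obs, positive) {
  pred <- as.character(pred)
  obs <- as.character(obs)
  tp <- sum(pred == positive & obs == positive)
  fp <- sum(pred == positive & obs != positive)
  fn <- sum(pred != positive & obs == positive)
  tn <- sum(pred != positive & obs != positive)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)        # sensitivity
  specificity <- tn / (tn + fp)
  c(precision = precision,
    F.score = 2 * precision * recall / (precision + recall),
    geometric.mean = sqrt(recall * specificity))
}

# AUC from scores via the Mann-Whitney rank formula (ties get average ranks)
aucRank <- function(score, isPositive) {
  r <- rank(score)
  nPos <- sum(isPositive)
  nNeg <- sum(!isPositive)
  (sum(r[isPositive]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# L1 (mean absolute) error for regression
l1Error <- function(pred, obs) mean(abs(pred - obs))

obs  <- c("a", "a", "b", "b", "b")
pred <- c("a", "b", "b", "b", "a")
m <- binaryMetrics(pred, obs, positive = "a")
```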
Author(s)
Saip Ciss saip.ciss@wanadoo.fr
Examples
## not run
# data(iris)
# Y <- iris$Species
# X <- iris[,-which(colnames(iris) == "Species")]
## 10-fold cross-validation for the randomUniformForest algorithm:
## create the wrapper function (setting 'threads = 1' since data are small)
# genericAlgo.ruf <- function(X, Y) randomUniformForest(X, Y,
# OOB = FALSE, importance = FALSE, threads = 1)
## run
# rUF.10cv.iris <- generic.cv(X, as.factor(Y),
# genericAlgo = genericAlgo.ruf, regression = FALSE)
## 10-fold cross-validation for the randomForest algorithm:
## create the wrapper function
# if (!require(randomForest)) { install.packages("randomForest"); library(randomForest) }
# genericAlgo.rf <- function(X, Y) randomForest(X, Y)
## run
# RF.10cv.iris <- generic.cv(X, as.factor(Y),
# genericAlgo = genericAlgo.rf, regression = FALSE)
## 10-fold cross-validation for Gradient Boosting Machines algorithm (gbm package)
## create the wrapper function
# if (!require(gbm)) { install.packages("gbm"); library(gbm) }
# genericAlgo.gbm <- function(X, Y) gbm.fit(X, Y, distribution = "multinomial",
# n.trees = 500, shrinkage = 0.05, interaction.depth = 24, n.minobsinnode = 1)
## create a wrapper for the prediction function of gbm
# nClasses = length(unique(Y))
# specificPredictFunction.gbm <- function(model, newdata)
# {
# modelPrediction = predict(model, newdata, n.trees = 500)
# predictions = matrix(modelPrediction, ncol = nClasses )
# colnames(predictions) = colnames(modelPrediction)
# return(as.factor(apply(predictions, 1, function(Z) names(which.max(Z)))))
# }
## run
# gbm.10cv.iris <- generic.cv(X, Y, genericAlgo = genericAlgo.gbm,
# specificPredictFunction = specificPredictFunction.gbm, regression = FALSE)
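## 10-fold cross-validation for CART algorithm (rpart package):
## create the wrapper function
# if (!require(rpart)) { install.packages("rpart"); library(rpart) }
# genericAlgo.CART <- function(X, Y)
# {
#   ZZ = data.frame(Y, X)
#   if (is.factor(Y)) { modelObject = rpart(Y ~ ., data = ZZ, method = "class") }
#   else { modelObject = rpart(Y ~ ., data = ZZ) }
#   return(modelObject)
# }
## create a wrapper for the prediction function of rpart
## (class labels for classification trees, fitted values for regression trees)
# specificPredictFunction.CART <- function(model, newdata)
#   predict(model, data.frame(newdata),
#     type = if (model$method == "class") "class" else "vector")
## run
# CART.10cv.iris <- generic.cv(X, as.factor(Y), genericAlgo = genericAlgo.CART,
#   specificPredictFunction = specificPredictFunction.CART, regression = FALSE)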