crossValidation {DiceEval}    R Documentation

K-fold Cross Validation

Description

This function computes the predicted value at each point of the design and estimates the quality criteria (Q2, RMSE and MAE) by K-fold cross-validation.

Usage

crossValidation(model, K)

Arguments

model

an output of the modelFit function. This argument is the initial model fitted with all the data.

K

the number of groups into which the data should be split to apply cross-validation

Value

A list with the following components:

Ypred

a vector of predicted values obtained using K-fold cross-validation at the points of the design

Q2

a real number: the estimate of the R2 criterion obtained by cross-validation

folds

a list which indicates the partitioning of the data into the folds

RMSE_CV

RMSE by K-fold cross-validation (see more details below)

MAE_CV

MAE by K-fold cross-validation (see more details below)

In the case of a Kriging model, additional components are returned to assess the robustness of the procedure:

theta

the range parameter theta estimated for each fold,

trend

the trend parameter estimated for each fold,

shape

the shape parameter estimated for each fold, if the covariance structure is of type powexp.

The principle of cross-validation is to split the data into K folds of approximately equal size, A_{1}, ..., A_{K}. For k = 1 to K, a model \hat{Y}^{(-k)} is fitted on the data \cup_{j \neq k} A_{j} (all folds except A_{k}) and validated on the fold A_{k}. Given a quality criterion L (here, L can be the RMSE or the MAE criterion), the "evaluation" of the model consists in computing:

L_{k} = \frac{1}{n/K} \sum_{i \in A_{k}} L \left( y_{i}, \hat{Y}^{(-k)}(x_{i}) \right).

The cross-validation criterion is the mean of the K criteria: L_{CV} = \frac{1}{K} \sum_{k=1}^{K} L_{k}.

The Q2 criterion is defined as Q2 = R2(Y, Ypred), where Y is the vector of response values and Ypred is the vector of values predicted by cross-validation.
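The procedure described above can be sketched in base R for a linear model. This is an illustration only, not the DiceEval implementation: the helper name kfold_cv_sketch and its arguments are hypothetical, the per-fold criteria are averaged over the actual fold size rather than n/K, and R2 is assumed to be the usual coefficient of determination, 1 - SSE/SST.

```r
# Hypothetical helper (not part of DiceEval): K-fold cross-validation of a
# linear model fitted with lm(), following the principle described above.
kfold_cv_sketch <- function(data, formula, K) {
  n <- nrow(data)
  stopifnot(K >= 2, K <= n)
  y <- data[[all.vars(formula)[1]]]      # response = left-hand side of formula

  # Random partition of 1..n into K folds of approximately equal size
  fold_id <- sample(rep(seq_len(K), length.out = n))
  folds <- split(seq_len(n), fold_id)

  ypred  <- numeric(n)
  rmse_k <- numeric(K)
  mae_k  <- numeric(K)
  for (k in seq_len(K)) {
    test <- folds[[k]]
    # Fit on all folds except A_k, then predict on A_k
    fit <- lm(formula, data = data[-test, , drop = FALSE])
    ypred[test] <- predict(fit, newdata = data[test, , drop = FALSE])
    err <- y[test] - ypred[test]
    rmse_k[k] <- sqrt(mean(err^2))       # RMSE on fold k
    mae_k[k]  <- mean(abs(err))          # MAE on fold k
  }

  # Assumption: R2(Y, Ypred) = 1 - SSE / SST
  q2 <- 1 - sum((y - ypred)^2) / sum((y - mean(y))^2)

  list(Ypred = ypred, Q2 = q2, folds = folds,
       RMSE_CV = mean(rmse_k), MAE_CV = mean(mae_k))
}

# Usage on synthetic data (illustrative values only)
set.seed(1)
d <- data.frame(X1 = runif(50), X2 = runif(50))
d$Y <- 1 + 2 * d$X1 - 3 * d$X2 + rnorm(50, sd = 0.1)
out <- kfold_cv_sketch(d, Y ~ X1 + X2, K = 10)
out$Q2
```

Each observation belongs to exactly one fold, so Ypred contains one out-of-sample prediction per design point. Setting K equal to nrow(data) makes every fold a single observation.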

Note

When K is equal to the number of observations, leave-one-out cross-validation is performed.

Author(s)

D. Dupuy

See Also

R2, modelFit, MAE, RMSE, foldsComposition, testCrossValidation

Examples

## Not run: 
rm(list=ls())
# A 2D example
Branin <- function(x1,x2) {
  x1 <- x1*15-5   
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# Linear model on 50 points
n <- 50
X <- matrix(runif(n*2),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
R2(Y,modLm$model$fitted.values)
crossValidation(modLm,K=10)$Q2


# Kriging model: power-exponential covariance structure, no trend, no nugget effect
# on 16 points 
n <- 16
X <- data.frame(x1=runif(n),x2=runif(n))
Y <- Branin(X[,1],X[,2])
mKm <- modelFit(X,Y,type="Kriging",formula=~1, covtype="powexp")
K <- 10
out   <- crossValidation(mKm, K)
par(mfrow=c(2,2))
plot(c(0,1:K),c(mKm$model@covariance@range.val[1],out$theta[,1]),
     xlab='',ylab='Theta1')
plot(c(0,1:K),c(mKm$model@covariance@range.val[2],out$theta[,2]),
     xlab='',ylab='Theta2')
plot(c(0,1:K),c(mKm$model@covariance@shape.val[1],out$shape[,1]),
     xlab='',ylab='p1',ylim=c(0,2))
plot(c(0,1:K),c(mKm$model@covariance@shape.val[2],out$shape[,2]),
     xlab='',ylab='p2',ylim=c(0,2))
par(mfrow=c(1,1))

## End(Not run)

[Package DiceEval version 1.6.1 Index]