crossValidation {DiceEval}

R Documentation

K-fold Cross Validation

Description

This function computes the predicted value at each point of the design by K-fold cross-validation and returns cross-validated estimates of the quality criteria (`Q2`, `RMSE` and `MAE`).

Usage

crossValidation(model, K)

Arguments

`model`	an output of the `modelFit` function. This argument is the initial model fitted with all the data.
`K`	the number of groups into which the data should be split to apply cross-validation

Value

A list with the following components:

`Ypred`	a vector of predicted values obtained using K-fold cross-validation at the points of the design
`Q2`	a real number: the estimate of the `R2` criterion obtained by cross-validation
`folds`	a list which indicates the partitioning of the data into the folds
`RMSE_CV`	`RMSE` by K-fold cross-validation (see more details below)
`MAE_CV`	`MAE` by K-fold cross-validation (see more details below)

In the case of a Kriging model, additional components are returned to assess the robustness of the procedure:

`theta`	the range parameter theta estimated for each fold,
`trend`	the trend parameter estimated for each fold,
`shape`	the shape parameter estimated for each fold, if the covariance structure is of type `powexp`.

The principle of cross-validation is to split the data into K folds of approximately equal size, A_{1}, ..., A_{K}. For k = 1 to K, a model \hat{Y}^{(-k)} is fitted from the data \cup_{j \neq k} A_{j} (all folds except A_{k}) and validated on the fold A_{k}. Given a quality criterion L (here, the RMSE or the MAE), the evaluation of the model on fold k consists in computing:

L_{k} = \frac{1}{n/K} \sum_{i \in A_{k}} L \left( y_{i}, \hat{Y}^{(-k)}(x_{i}) \right).

The cross-validation criterion is the mean of the K fold criteria: L_{CV} = \frac{1}{K} \sum_{k=1}^{K} L_{k}.

The Q2 criterion is defined as Q2 = `R2`(`Y`, `Ypred`), where `Y` is the vector of response values and `Ypred` the vector of values predicted by cross-validation.
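The procedure described above can be sketched in base R without DiceEval. This is only an illustration under stated assumptions, not the package's implementation: the synthetic data, the linear model, and every name in the block (`fold_id`, `rmse_k`, `mae_k`, `r2`) are hypothetical, and `r2` assumes the usual definition of the coefficient of determination, 1 - SSE/SST.

```r
# Illustrative K-fold cross-validation in base R (not DiceEval code).
set.seed(1)

n <- 50
K <- 10
dat <- data.frame(X1 = runif(n), X2 = runif(n))
dat$Y <- 1 + 2 * dat$X1 - 3 * dat$X2 + rnorm(n, sd = 0.1)

# Randomly assign each observation to one of K folds of equal size (n/K = 5).
fold_id <- sample(rep(seq_len(K), length.out = n))
folds <- split(seq_len(n), fold_id)

Ypred  <- numeric(n)   # cross-validated prediction at each design point
rmse_k <- numeric(K)   # RMSE computed on each fold
mae_k  <- numeric(K)   # MAE computed on each fold

for (k in seq_len(K)) {
  test <- folds[[k]]
  fit <- lm(Y ~ X1 + X2, data = dat[-test, ])          # fit without fold k
  Ypred[test] <- predict(fit, newdata = dat[test, ])   # predict on fold k
  err <- dat$Y[test] - Ypred[test]
  rmse_k[k] <- sqrt(mean(err^2))
  mae_k[k]  <- mean(abs(err))
}

# Cross-validation criteria: mean of the K fold criteria.
RMSE_CV <- mean(rmse_k)
MAE_CV  <- mean(mae_k)

# Assumed definition of R2: 1 - SSE/SST.
r2 <- function(Y, Ypred) 1 - sum((Y - Ypred)^2) / sum((Y - mean(Y))^2)
Q2 <- r2(dat$Y, Ypred)
```

Here `Ypred`, `Q2`, `folds`, `RMSE_CV` and `MAE_CV` play the same roles as the components of the same names returned by `crossValidation`, although the values the package returns may be computed differently in detail.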

Note

When K is equal to the number of observations, leave-one-out cross-validation is performed.
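As a worked special case of the formulas above, assume K = n and that each fold A_{k} = \{k\} holds the single observation k. The factor n/K is then 1, and the fold and cross-validation criteria reduce to:

```latex
L_{k} = L\left( y_{k}, \hat{Y}^{(-k)}(x_{k}) \right),
\qquad
L_{CV} = \frac{1}{n} \sum_{k=1}^{n} L\left( y_{k}, \hat{Y}^{(-k)}(x_{k}) \right).
```

Under this reading, n models are fitted, each leaving out exactly one point of the design.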

Author(s)

D. Dupuy

Examples

## Not run: 
library(DiceEval)  # provides modelFit(), R2() and crossValidation()
# A 2D example
Branin <- function(x1,x2) {
  x1 <- x1*15-5   
  x2 <- x2*15
  (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# Linear model on 50 points
n <- 50
X <- matrix(runif(n*2),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
R2(Y,modLm$model$fitted.values)
crossValidation(modLm,K=10)$Q2


# Kriging model: power-exponential covariance structure ("powexp"),
# constant trend (formula = ~1), no nugget effect, on 16 points
n <- 16
X <- data.frame(x1=runif(n),x2=runif(n))
Y <- Branin(X[,1],X[,2])
mKm <- modelFit(X,Y,type="Kriging",formula=~1, covtype="powexp")
K <- 10
out   <- crossValidation(mKm, K)
par(mfrow=c(2,2))
# Position 0 is the estimate from the full data; positions 1..K are the per-fold estimates
plot(c(0,1:K),c(mKm$model@covariance@range.val[1],out$theta[,1]),
     xlab='',ylab='Theta1')
plot(c(0,1:K),c(mKm$model@covariance@range.val[2],out$theta[,2]),
     xlab='',ylab='Theta2')
plot(c(0,1:K),c(mKm$model@covariance@shape.val[1],out$shape[,1]),
     xlab='',ylab='p1',ylim=c(0,2))
plot(c(0,1:K),c(mKm$model@covariance@shape.val[2],out$shape[,2]),
     xlab='',ylab='p2',ylim=c(0,2))
par(mfrow=c(1,1))

## End(Not run)

[Package DiceEval version 1.6.1 Index]