crossValidation {DiceEval} R Documentation

## K-fold Cross Validation

### Description

This function computes the predicted values at each point of the design and estimates the quality criteria (Q2, RMSE and MAE) by K-fold cross-validation.

### Usage

crossValidation(model, K)

### Arguments

| Argument | Description |
|----------|-------------|
| `model` | an output of the `modelFit` function. This argument is the initial model fitted with all the data. |
| `K` | the number of groups into which the data should be split to apply cross-validation |

### Value

A list with the following components:

| Component | Description |
|-----------|-------------|
| `Ypred` | a vector of predicted values obtained using K-fold cross-validation at the points of the design |
| `Q2` | a real number, the estimate of the R2 criterion obtained by cross-validation |
| `folds` | a list which indicates the partitioning of the data into the folds |
| `RMSE_CV` | RMSE by K-fold cross-validation (see more details below) |
| `MAE_CV` | MAE by K-fold cross-validation (see more details below) |

In the case of a Kriging model, additional components are returned to assess the robustness of the procedure:

| Component | Description |
|-----------|-------------|
| `theta` | the range parameter theta estimated for each fold |
| `trend` | the trend parameter estimated for each fold |
| `shape` | the estimated shape parameter, if the covariance structure is of type `powexp` |

The principle of cross-validation is to split the data into K folds of approximately equal size, $A_1, \ldots, A_K$. For k = 1 to K, a model $\hat{Y}^{(-k)}$ is fitted from the data $\cup_{j \neq k} A_j$ (all folds except the k-th) and this model is validated on the fold $A_k$. Given a quality criterion L (here, L can be the RMSE or the MAE criterion), the "evaluation" of the model consists of computing:

$$L_k = \frac{1}{n/K} \sum_{i \in A_k} L\left( y_i, \hat{Y}^{(-k)}(x_i) \right).$$

The cross-validation criterion is the mean of the K fold-wise criteria: $L_{CV} = \frac{1}{K} \sum_{k=1}^{K} L_k$.
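
The fold-wise computation described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by `crossValidation`: the data are synthetic, the model is a plain `lm` fit, the fold RMSE is taken as the square root of the mean squared error on that fold, and each fold criterion is averaged over the actual fold size rather than n/K. The names `fold_id`, `rmse_k` and `mae_k` are hypothetical.

```r
set.seed(1)
n <- 50
K <- 10

# Synthetic data (an assumption made for this sketch)
dat <- data.frame(X1 = runif(n), X2 = runif(n))
dat$Y <- 1 + 2 * dat$X1 - 3 * dat$X2 + rnorm(n, sd = 0.1)

# Randomly assign each observation to one of K folds of near-equal size
fold_id <- sample(rep(seq_len(K), length.out = n))
folds <- split(seq_len(n), fold_id)

Ypred  <- numeric(n)   # cross-validated prediction at each design point
rmse_k <- numeric(K)   # criterion L_k with L = RMSE
mae_k  <- numeric(K)   # criterion L_k with L = MAE

for (k in seq_len(K)) {
  test <- folds[[k]]
  # Fit on all folds except the k-th, then predict on the k-th fold
  fit <- lm(Y ~ X1 + X2, data = dat[-test, ])
  Ypred[test] <- predict(fit, newdata = dat[test, ])
  err <- dat$Y[test] - Ypred[test]
  rmse_k[k] <- sqrt(mean(err^2))
  mae_k[k]  <- mean(abs(err))
}

# Cross-validation criteria: mean of the K fold-wise values
RMSE_CV <- mean(rmse_k)
MAE_CV  <- mean(mae_k)
```

Every observation is predicted exactly once, by a model that never saw it, which is why `Ypred` can be compared directly with the observed response.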

The Q2 criterion is defined as `Q2 = R2(Y, Ypred)`, where `Y` is the vector of response values and `Ypred` the vector of values fitted by cross-validation.
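
As a rough illustration of the Q2 definition, the sketch below assumes that R2 is the usual coefficient of determination, 1 - SSE/SST; the package's own `R2` function should be consulted for its exact definition. The helper `r2_sketch` is hypothetical and is not part of DiceEval.

```r
# Hypothetical helper: coefficient of determination, 1 - SSE / SST
r2_sketch <- function(Y, Ypred) {
  1 - sum((Y - Ypred)^2) / sum((Y - mean(Y))^2)
}

Y <- c(1, 2, 3, 4)

q2_perfect  <- r2_sketch(Y, Y)                     # predictions equal to the data: 1
q2_mean     <- r2_sketch(Y, rep(mean(Y), 4))       # predicting the mean everywhere: 0
q2_reversed <- r2_sketch(Y, rev(Y))                # worse than the mean: negative
```

Under this definition, Q2 close to 1 indicates good predictive quality, and a negative value means the cross-validated predictions do worse than the constant mean predictor.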

### Note

When K is equal to the number of observations, leave-one-out cross-validation is performed.
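
A small base-R sketch of why K = n amounts to leave-one-out: with as many folds as observations, each fold contains a single point. The partitioning shown here is an assumption made for illustration and may differ from the one built by `crossValidation`.

```r
set.seed(1)
n <- 8
K <- n   # as many folds as observations

# Random partition of the indices 1..n into K folds
folds <- split(seq_len(n), sample(rep(seq_len(K), length.out = n)))

fold_sizes <- lengths(folds)   # number of observations held out in each fold
```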

### Author(s)

D. Dupuy

### Examples

## Not run:
rm(list=ls())
# A 2D example
Branin <- function(x1,x2) {
x1 <- x1*15-5
x2 <- x2*15
(x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10
}

# Linear model on 50 points
n <- 50
X <- matrix(runif(n*2),ncol=2,nrow=n)
Y <- Branin(X[,1],X[,2])
modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2))
R2(Y,modLm$model$fitted.values)
crossValidation(modLm,K=10)$Q2

# Kriging model: power-exponential covariance structure ("powexp"),
# constant trend, no nugget effect, on 16 points
n <- 16
X <- data.frame(x1=runif(n),x2=runif(n))
Y <- Branin(X[,1],X[,2])
mKm <- modelFit(X,Y,type="Kriging",formula=~1, covtype="powexp")
K <- 10
out <- crossValidation(mKm, K)

# Index 0 is the estimate from the full data; indices 1..K are the per-fold estimates
par(mfrow=c(2,2))
plot(c(0,1:K),c(mKm$model@covariance@range.val[1],out$theta[,1]),
 xlab='',ylab='Theta1')
plot(c(0,1:K),c(mKm$model@covariance@range.val[2],out$theta[,2]),
 xlab='',ylab='Theta2')
plot(c(0,1:K),c(mKm$model@covariance@shape.val[1],out$shape[,1]),
 xlab='',ylab='p1',ylim=c(0,2))
plot(c(0,1:K),c(mKm$model@covariance@shape.val[2],out$shape[,2]),
 xlab='',ylab='p2',ylim=c(0,2))
par(mfrow=c(1,1))

## End(Not run)

[Package DiceEval version 1.5 Index]