R: k-fold cross-validation for the specified model

crossval {bgmm}

R Documentation

k-fold cross-validation for the specified model

Description

The function crossval() performs k-fold cross-validation.

Usage

crossval(model = NULL, X = NULL, knowns = NULL, class = NULL, 
    k = length(unique(class)), B = NULL, P = NULL, model.structure = getModelStructure(), 
    ..., folds = 2, fun = belief)

Arguments

`model`	an object of the class `mModel`.
`X`	a data.frame with unknown realizations. If not supplied `X` is extracted from the `model` argument.
`knowns`	a data.frame with labeled realizations. If not supplied `knowns` is extracted from the `model` argument.
`class`, `B`, `P`	a vector of classes, beliefs and plausibilities. If not supplied they will be extracted from the `model` argument.
`fun`	function that will be used for modeling, one of `supervised`, `unsupervised`, `belief`, `soft`, `semisupervised`.
`model.structure`, `k`, `...`	arguments that will be passed to the `fun` function.
`folds`	number of folds in k-fold cross-validation. Cannot be greater than the number of labeled samples.

Details

The function crossval() divides the dataset into `folds` subsets of equal size (this is the k of k-fold cross-validation, not the `k` argument, which is passed to `fun`). The ratio of labeled to unlabeled cases is kept as close to constant as possible across subsets (the subsets are generated with stratification). Each subset is then used as a test set against a training set built from all remaining subsets. In total, `folds` new models are estimated, so this procedure is time-consuming.
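The stratified split described above can be sketched in base R. This is only an illustration of the idea, not the code used inside bgmm: the helper `assign_folds()` and the case counts are hypothetical, chosen to resemble the example at the end of this page.

```r
# Hypothetical helper, not part of bgmm: assign n cases to `folds` groups
# of (nearly) equal size, in random order.
assign_folds <- function(n, folds) {
  sample(rep(seq_len(folds), length.out = n))
}

set.seed(1313)
n_labeled   <- 60    # assumed number of labeled cases
n_unlabeled <- 240   # assumed number of unlabeled cases
folds       <- 6

# Stratification: labeled and unlabeled cases are split separately, so the
# labeled/unlabeled ratio is about the same in every fold.
folds_labeled   <- assign_folds(n_labeled, folds)
folds_unlabeled <- assign_folds(n_unlabeled, folds)

table(folds_labeled)    # 10 labeled cases in each fold
table(folds_unlabeled)  # 40 unlabeled cases in each fold

# For fold i, cases with fold index i form the test set,
# all remaining cases form the training set.
i <- 1
test_labeled  <- which(folds_labeled == i)
train_labeled <- which(folds_labeled != i)
```

Splitting the two groups separately is what keeps the proportion of labeled cases stable; a single shuffle over all cases could leave some folds with very few labeled observations.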

For each model, the error is calculated as the average absolute difference between the distribution of estimated posteriors and the distribution of beliefs/plausibilities for labeled cases.
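As a rough illustration of this error measure, the snippet below computes a mean absolute difference between two small matrices. The numbers are made up, and the exact way bgmm aggregates the differences is an assumption here; the sketch only shows the kind of quantity being reported.

```r
# Made-up values for three labeled cases and three components.
# Rows are cases, columns are components; each row sums to 1.
beliefs <- matrix(c(0.9, 0.05, 0.05,
                    0.1, 0.8,  0.1,
                    0.2, 0.2,  0.6), nrow = 3, byrow = TRUE)

# Made-up posteriors, standing in for estimates from a fitted model.
posteriors <- matrix(c(0.7, 0.2, 0.1,
                       0.1, 0.7, 0.2,
                       0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)

# Assumed aggregation: mean over all cases and components.
fold_error <- mean(abs(posteriors - beliefs))
fold_error
```

A value near 0 means the posteriors for labeled cases closely follow the initial beliefs; larger values indicate that the fitted model disagrees with the supplied knowledge.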

Value

A list with three vectors: the errors, calculated as mean absolute differences between estimated posteriors and initial beliefs for known cases, and the fold indexes for labeled and for unlabeled cases.

Author(s)

Przemyslaw Biecek

References

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

Examples

                
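 library(bgmm)

 # Fix the random seed so that the simulated data and the folds are reproducible
 set.seed(1313)
 simulated = simulateData(d=2, k=3, n=300, m=60, cov="0", within="E", n.labels=2)

 # Fit a belief-based model, then cross-validate it with 6 folds
 amodel = belief(X=simulated$X, knowns=simulated$knowns, B=simulated$B, k=4)
 str(crossval(model=amodel, folds=6))

 # Supervised model: all observations are treated as labeled, with classes from simulated$Ytrue
 amodel = supervised(knowns=rbind(simulated$X, simulated$knowns), class=simulated$Ytrue)
 str(crossval(model=amodel, folds=6, fun=supervised))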

[Package bgmm version 1.8.5 Index]