crossval {bgmm}R Documentation

k-fold cross-validation for the specified model

Description

The function crossval() performes k-fold cross-validation.

Usage

crossval(model = NULL, X = NULL, knowns = NULL, class = NULL, 
    k = length(unique(class)), B = NULL, P = NULL, model.structure = getModelStructure(), 
    ..., folds = 2, fun = belief) 

Arguments

model

an object of the class mModel.

X

a data.frame with unknown realizations. If not supplied X is extracted from the model argument.

knowns

a data.frame with labeled realizations. If not supplied knowns is extracted from the model argument.

class, B, P

a vector of classes, beliefs and plausibilities. If not supplied they will be extracted from the model argument.

fun

function that will be used for modeling, one of supervised, unsupervised, belief, soft, semisupervised.

model.structure, k, ...

arguments that will be passed to fun function,

folds

number of folds in k-fold cross validation. Cannot be grated that number of labeled samples.

Details

The function crossval() divides the dataset into k equal subsets, the number of labeled cases versus number of unlabeled cases is keep as close to constant as possible (the subset are generated with stratification). Then each subset is used as test set against a train set build from all remaining sets. In total k new models are estimated thus this procedure is time consuming.

For each model the error is calculated as average absolute differences between the distribution of estimated posteriors and distribution of beliefs/plausibilities for labeled cases.

Value

The list with three vectors: errors calculated as mean absolute differences between estimated posteriors and initial beliefs for known cases, indexes of folds for both labeled and unlabeled cases.

Author(s)

Przemyslaw Biecek

References

Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.

Examples

                
 set.seed(1313)
 simulated = simulateData(d=2, k=3, n=300, m=60, cov="0", within="E", n.labels=2)
 amodel = belief(X=simulated$X, knowns=simulated$knowns, B=simulated$B, k=4)
 str(crossval(model=amodel, folds=6))

 amodel = supervised(knowns=rbind(simulated$X, simulated$knowns), class=simulated$Ytrue)
 str(crossval(model=amodel, folds=6, fun=supervised))

[Package bgmm version 1.8.5 Index]