crossval {bgmm}    R Documentation

k-fold cross-validation for the specified model


The function crossval() performs k-fold cross-validation.


crossval(model = NULL, X = NULL, knowns = NULL, class = NULL, 
    k = length(unique(class)), B = NULL, P = NULL, model.structure = getModelStructure(), 
    ..., folds = 2, fun = belief) 



model

an object of the class mModel.

X

a data.frame with unlabeled realizations. If not supplied, X is extracted from the model argument.

knowns

a data.frame with labeled realizations. If not supplied, knowns is extracted from the model argument.

class, B, P

a vector of classes, beliefs and plausibilities. If not supplied, they are extracted from the model argument.

fun

the function used for modeling, one of supervised, unsupervised, belief, soft, semisupervised.

model.structure, k, ...

arguments passed on to the function fun.

folds

number of folds in k-fold cross-validation. Cannot be greater than the number of labeled samples.


The function crossval() divides the dataset into equal subsets, their number given by the folds argument (this is the k of k-fold cross-validation, not the argument k). The ratio of labeled to unlabeled cases is kept as close to constant as possible across subsets (the subsets are generated with stratification). Each subset is then used as a test set against a training set built from all remaining subsets. In total, one new model is estimated per fold, so this procedure is time-consuming.
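
The stratified split described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside crossval(): the helper assign_folds() is hypothetical, and the counts of 60 labeled and 240 unlabeled cases are made up. Fold indexes are drawn separately for the two groups, so every fold receives the same share of each.

```r
# Hypothetical helper, not part of bgmm: draw fold indexes for one group.
assign_folds <- function(n, folds) {
  # rep_len() cycles through 1..folds, so fold sizes differ by at most one;
  # sample() then shuffles which case lands in which fold.
  sample(rep_len(seq_len(folds), n))
}

set.seed(1)
folds <- 6
fold_knowns <- assign_folds(60, folds)   # made-up count of labeled cases
fold_X      <- assign_folds(240, folds)  # made-up count of unlabeled cases

# Each fold holds 10 labeled and 40 unlabeled cases.
table(fold_knowns)
table(fold_X)

# Fold 1 as the test set, all remaining folds as the training set.
test_knowns  <- which(fold_knowns == 1)
train_knowns <- which(fold_knowns != 1)
```

Drawing the indexes per group, rather than once for the pooled data, is what keeps the labeled-to-unlabeled ratio constant across folds.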

For each model, the error is calculated as the average absolute difference between the distribution of estimated posteriors and the distribution of beliefs/plausibilities for labeled cases.
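
A minimal sketch of such an error measure, assuming it is the plain mean of elementwise absolute differences between two matrices of the same shape. The exact aggregation used by crossval() is not stated here, and the matrices B and tij below are made-up values, not package output.

```r
# Made-up beliefs for 3 labeled cases over 2 components (rows sum to 1).
B <- matrix(c(1.0, 0.0,
              0.5, 0.5,
              0.0, 1.0), ncol = 2, byrow = TRUE)

# Made-up estimated posteriors for the same 3 cases.
tij <- matrix(c(0.9, 0.1,
                0.5, 0.5,
                0.2, 0.8), ncol = 2, byrow = TRUE)

# Mean absolute difference over all cells: (0.1 + 0.1 + 0 + 0 + 0.2 + 0.2) / 6,
# which is about 0.1.
fold_error <- mean(abs(tij - B))
fold_error
```

An error of 0 would mean the estimated posteriors reproduce the beliefs exactly; larger values indicate a model that disagrees more with the initial knowledge about the held-out labeled cases.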


A list with three vectors: the errors, calculated as mean absolute differences between estimated posteriors and initial beliefs for known cases, and the fold indexes for the labeled and for the unlabeled cases.


Przemyslaw Biecek


Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.


 simulated = simulateData(d=2, k=3, n=300, m=60, cov="0", within="E", n.labels=2)
 amodel = belief(X=simulated$X, knowns=simulated$knowns, B=simulated$B, k=4)
 str(crossval(model=amodel, folds=6))

 amodel = supervised(knowns=rbind(simulated$X, simulated$knowns), class=simulated$Ytrue)
 str(crossval(model=amodel, folds=6, fun=supervised))

[Package bgmm version 1.8.5 Index]