R: Stratified cross validation

stratified.cross.validation {HEMDAG}

R Documentation

Stratified cross validation

Description

Generate data for stratified cross-validation.

Usage

stratified.cv.data.single.class(examples, positives, kk = 5, seed = NULL)

stratified.cv.data.over.classes(labels, examples, kk = 5, seed = NULL)

Arguments

`examples`	indices or names of the examples. Can be either a vector of integers or a vector of names.
`positives`	vector of integers or vector of names. The indices (or names) refer to the indices (or names) of 'positive' examples.
`kk`	number of folds (`def. kk=5`).
`seed`	seed of the random generator (`def. seed=NULL`). If set to `NULL`, no initialization is performed.
`labels`	labels matrix. Rows are genes and columns are classes. Let `M` denote the labels matrix: `M[i,j]=1` means that gene `i` is annotated with class `j`; otherwise `M[i,j]=0`.
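
As a small illustration of the `labels` layout, the sketch below builds a toy matrix by hand and extracts the positive examples of one class, which is the kind of vector `positives` expects. The gene names, class names and the variables `pos.index` and `pos.name` are made up for this example and are not part of HEMDAG.

```r
# Toy labels matrix: 4 genes (rows) x 2 classes (columns).
# All names here are hypothetical, chosen only for illustration.
M <- matrix(c(1, 0,
              0, 1,
              1, 1,
              0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3", "g4"), c("classA", "classB")))

# Genes annotated with classA, once as integer indices and once as names.
pos.index <- unname(which(M[, "classA"] == 1))
pos.name  <- rownames(M)[M[, "classA"] == 1]
```

Either form can be paired with an `examples` vector of the same kind (indices with indices, names with names).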

Details

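Folds are stratified, i.e. the positive examples are spread evenly across the folds and so are the negative examples, so that each fold contains approximately the same number of positives and the same number of negatives as every other fold.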
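
The idea of stratification can be sketched in base R by splitting positives and negatives separately. This is a minimal illustration under stated assumptions, not HEMDAG's actual implementation: `make.folds` is a hypothetical helper and the positive indices are invented.

```r
# Hypothetical helper, not part of HEMDAG: shuffle x and deal its elements
# into kk folds whose sizes differ by at most one.
make.folds <- function(x, kk) {
  x <- x[sample.int(length(x))]                        # random order
  fold.id <- rep(seq_len(kk), length.out = length(x))  # 1, 2, ..., kk, 1, 2, ...
  unname(split(x, factor(fold.id, levels = seq_len(kk))))
}

set.seed(23)
examples  <- 1:100
positives <- c(3, 8, 15, 16, 23, 42, 57, 64, 71, 90)   # 10 made-up positives
negatives <- setdiff(examples, positives)               # the remaining 90

# Splitting the two groups separately keeps their proportion in every fold.
fold.positives     <- make.folds(positives, kk = 5)
fold.non.positives <- make.folds(negatives, kk = 5)

lengths(fold.positives)       # 2 positives in every fold
lengths(fold.non.positives)   # 18 negatives in every fold
```

Splitting the whole `examples` vector in one go would give folds of equal size but could leave some folds with no positives at all, which is what stratification is meant to avoid.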

Value

stratified.cv.data.single.class returns a list with two components:

fold.non.positives: a list with k components, where k is the number of folds. Each component is a vector with the indices (or names) of the non-positive elements of that fold. Indices (or names) refer to row numbers (or names) of a data matrix;
fold.positives: a list with k components, where k is the number of folds. Each component is a vector with the indices (or names) of the positive elements of that fold. Indices (or names) refer to row numbers (or names) of a data matrix.
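
One way to use a value of this shape is to hold out one fold for testing and train on the rest. The sketch below uses a hand-built list with the two documented components instead of calling the package; the indices and the names `folds`, `test.set` and `train.set` are assumptions made for illustration.

```r
# Hand-built stand-in for a returned value with 2 folds (indices are made up).
folds <- list(
  fold.non.positives = list(c(1, 4, 6), c(2, 7, 8)),
  fold.positives     = list(c(3), c(5))
)

i <- 1   # fold held out for testing

# Test set: positives and non-positives of fold i.
test.set <- c(folds$fold.positives[[i]], folds$fold.non.positives[[i]])

# Training set: everything in the remaining folds.
train.set <- c(unlist(folds$fold.positives[-i]),
               unlist(folds$fold.non.positives[-i]))
```

Since the elements refer to rows of a data matrix, `test.set` and `train.set` can be used directly to subset that matrix by row.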

stratified.cv.data.over.classes returns a list with n components, where n is the number of classes (columns) of the labels matrix. Each component is in turn a list with k elements, where k is the number of folds. Within each class, the positive and the negative examples are distributed evenly across the folds.

Examples

library(HEMDAG);
data(labels);  # loads the example labels matrix L
examples.index <- 1:nrow(L);
examples.name <- rownames(L);
positives <- which(L[,3]==1);  # positive examples of the third class
# folds for a single class, with examples given as indices or as names
x <- stratified.cv.data.single.class(examples.index, positives, kk=5, seed=23);
y <- stratified.cv.data.single.class(examples.name, positives, kk=5, seed=23);
# folds for every class (column) of L
z <- stratified.cv.data.over.classes(L, examples.index, kk=5, seed=23);
k <- stratified.cv.data.over.classes(L, examples.name, kk=5, seed=23);

[Package HEMDAG version 2.7.4 Index]