lol.xval.eval {lolR}    R Documentation

Embedding Cross Validation


A function for performing leave-one-out or k-fold cross-validation for a given embedding model. This function produces fold-wise cross-validated misclassification rates for standard embedding techniques. Users can optionally specify custom embedding techniques by configuring the alg.* parameters. Optional classifiers implementing the S3 predict function can be used for classification, with the classifier.* parameters controlling how the classifier is called to determine the misclassification rate.


lol.xval.eval(
  X,
  Y,
  r,
  alg,
  sets = NULL,
  alg.dimname = "r",
  alg.opts = list(),
  alg.embedding = "A",
  classifier = lda,
  classifier.opts = list(),
  classifier.return = "class",
  k = "loo",
  rank.low = FALSE,
  ...
)



X: [n, d] the data with n samples in d dimensions.


Y: [n] the labels of the samples with K unique labels.


r: the number of embedding dimensions desired, where r <= d.


alg: the algorithm to use for embedding. Should be a function that accepts inputs X, Y, and has a parameter for alg.dimname if alg is supervised, or just X and alg.dimname if alg is unsupervised. This algorithm should return a list containing a matrix that embeds from d to r <= d dimensions.
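The interface described above can be illustrated with a minimal sketch of a custom embedding function. The name embed.pca.sketch is hypothetical and not part of lolR; it assumes the default alg.dimname of "r" and the default alg.embedding of "A", and it accepts Y only so that it can be called like a supervised algorithm.

```r
# Hypothetical embedding function (not part of lolR): projects onto the
# top r principal directions. Y is accepted for interface compatibility
# but is not used.
embed.pca.sketch <- function(X, Y, r) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column
  sv <- svd(Xc, nu = 0, nv = r)                 # first r right singular vectors
  list(A = sv$v)                                # [d, r] embedding matrix under "A"
}

set.seed(1)
X <- matrix(rnorm(40 * 6), nrow = 40, ncol = 6)  # n = 40 samples, d = 6
Y <- rep(1:2, each = 20)
fit <- embed.pca.sketch(X, Y, r = 2)
Xr <- X %*% fit$A                                # embedded data, [n, r]
```

A function shaped like this would presumably be passed as alg with no changes to alg.dimname or alg.embedding; whether lol.xval.eval requires anything further of alg is not confirmed here.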


sets: a user-defined cross-validation set. Defaults to NULL.

  • is.null(sets) randomly partition the inputs X and Y into training and testing sets.

  • !is.null(sets) use a user-defined partitioning of the inputs X and Y into training and testing sets. Should be in the format of the outputs from lol.xval.split. That is, a list with each element containing X.train, an [n-k, d] subset of data to train on; Y.train, an [n-k] subset of class labels for X.train; X.test, a [k, d] subset of data to test the model on; and Y.test, a [k] subset of class labels for X.test.
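As a rough illustration of the list structure described above, the following sketch builds folds by hand in base R. The helper make.sets.sketch is hypothetical and not part of lolR, and the test labels are stored under Y.test, which is an assumption about the naming; the output of lol.xval.split may carry additional fields.

```r
# Hypothetical helper (not part of lolR): builds k train/test folds by hand.
make.sets.sketch <- function(X, Y, k) {
  n <- nrow(X)
  fold <- sample(rep(seq_len(k), length.out = n))  # random fold id per sample
  lapply(seq_len(k), function(i) {
    test <- fold == i                              # samples held out in fold i
    list(X.train = X[!test, , drop = FALSE], Y.train = Y[!test],
         X.test = X[test, , drop = FALSE], Y.test = Y[test])
  })
}

set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)  # n = 20 samples, d = 3
Y <- rep(1:2, times = 10)
sets <- make.sets.sketch(X, Y, k = 4)
length(sets)                           # number of folds
dim(sets[[1]]$X.train)                 # training rows and columns of fold 1
dim(sets[[1]]$X.test)                  # testing rows and columns of fold 1
```

In practice, calling lol.xval.split directly is the safer route, since it is the documented source of this format.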


alg.dimname: the name of the parameter accepted by alg for indicating the embedding dimensionality desired. Defaults to r.


alg.opts: the hyper-parameter options you want to pass into your algorithm, as a keyworded list. Defaults to list(), or no hyper-parameters.


alg.embedding: the attribute returned by alg containing the embedding matrix. Defaults to assuming that alg returns an embedding matrix as "A".

  • !is.nan(alg.embedding) Assumes that alg will return a list containing an attribute, alg.embedding, a [d, r] matrix that embeds [n, d] data from [d] to [r < d] dimensions.

  • is.nan(alg.embedding) Assumes that alg returns a [d, r] matrix that embeds [n, d] data from [d] to [r < d] dimensions.


classifier: the classifier to use for assessing performance. The classifier should accept X, an [n, d] array, and Y, an [n] array of labels, as its first 2 arguments. The object it returns should have a class that implements a predict method, predict.classifier, compatible with the stats::predict S3 generic. Defaults to MASS::lda.


classifier.opts: any additional options to be passed to the classifier function, as a list. Defaults to an empty list.


classifier.return: if the return type of predict is a list, classifier.return names the attribute containing the prediction labels from stats::predict. Defaults to the attribute used by MASS::lda, class.

  • !is.nan(classifier.return) Assumes that predict.classifier will return a list containing an attribute, classifier.return, that encodes the predicted labels.

  • is.nan(classifier.return) Assumes that predict.classifier returns a [n] vector/array containing the prediction labels for [n, d] inputs.
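The classifier contract described above (a training function taking X and Y first, plus an S3 predict method) can be sketched as follows. The names centroid.sketch and predict.centroid.sketch are hypothetical and not part of lolR. Because this predict method returns a bare label vector rather than a list, it corresponds to the is.nan(classifier.return) case.

```r
# Hypothetical classifier (not part of lolR): assigns each point to the
# class whose training mean is closest in Euclidean distance.
centroid.sketch <- function(X, Y) {
  classes <- sort(unique(Y))
  # one row of column means per class, giving a [K, d] matrix
  centroids <- do.call(rbind, lapply(classes, function(cl) {
    colMeans(X[Y == cl, , drop = FALSE])
  }))
  structure(list(classes = classes, centroids = centroids),
            class = "centroid.sketch")
}

# S3 method dispatched by stats::predict for objects of class "centroid.sketch"
predict.centroid.sketch <- function(object, newdata, ...) {
  d2 <- sapply(seq_len(nrow(object$centroids)), function(j) {
    rowSums(sweep(newdata, 2, object$centroids[j, ])^2)  # squared distances
  })
  d2 <- matrix(d2, nrow = nrow(newdata))  # keep [n, K] shape for one-row input
  object$classes[max.col(-d2, ties.method = "first")]   # bare label vector
}

X <- rbind(matrix(0, 5, 2), matrix(10, 5, 2))
Y <- rep(c("a", "b"), each = 5)
fit <- centroid.sketch(X, Y)
pred <- predict(fit, rbind(c(1, 1), c(9, 9)))  # "a" "b"
```

A classifier whose predict method instead returns a list would keep classifier.return set to the name of the element holding the labels, as MASS::lda does with class.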


k: the cross-validation method to perform. Defaults to 'loo'. If sets is provided, this option is ignored. See lol.xval.split for details.

  • 'loo' Leave-one-out cross validation

  • isinteger(k) perform k-fold cross-validation with k as the number of folds.


rank.low: whether to force the training set to low-rank. Defaults to FALSE. If sets is provided, this option is ignored. See lol.xval.split for details.

  • if rank.low == FALSE, uses standard k-fold cross-validation. Training sets are k-1 folds, and testing sets are 1 fold, where the held-out fold is rotated so that the fold-wise misclassification rates carry no dependencies into potential downstream inference.

  • if rank.low == TRUE, uses a cross-validation method with ntrain = min((k-1)/k*n, d) sample training sets, where d is the number of dimensions in X. This ensures that the training data is always low-rank, ntrain < d + 1. Note that the resulting training sets may have ntrain < (k-1)/k*n, but the resulting testing sets will always be properly rotated with ntest = n/k to ensure no dependencies in fold-wise testing.
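The training-set sizes implied by the two settings can be worked out with a little arithmetic. The values of n, d, and k below are illustrative choices, not defaults of the package.

```r
n <- 200; d <- 30; k <- 5                  # illustrative sizes only

ntrain.default <- (k - 1) / k * n          # rank.low == FALSE: k-1 folds
ntrain.lowrank <- min((k - 1) / k * n, d)  # rank.low == TRUE: capped at d
ntest <- n / k                             # one fold held out either way

c(default = ntrain.default, lowrank = ntrain.lowrank, test = ntest)
```

With these sizes the cap at d is what binds, so the low-rank training sets are much smaller than the k-1 folds used by default, while the test fold size is unchanged.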


...: trailing args.


Returns a list containing:


the mean cross-validated error.


The model returned by alg computed on all of the data.


The classifier trained on all of the embedded data.


the cross-validated error for each of the k-folds.


For more details see the help vignette: vignette("xval", package = "lolR")

For extending cross-validation techniques shown here to arbitrary embedding algorithms, see the vignette: vignette("extend_embedding", package = "lolR")

For extending cross-validation techniques shown here to arbitrary classification algorithms, see the vignette: vignette("extend_classification", package = "lolR")


Eric Bridgeford


library(lolR)
# train model and analyze with loo validation using the nearestCentroid classifier
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
r <- 5  # embed into r=5 dimensions
# run cross-validation with the nearestCentroid method and
# leave-one-out cross-validation, which returns only
# prediction labels so we specify classifier.return as NaN
# (lol.project.lol is assumed as the embedding algorithm here)
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol,
                          classifier=lol.classify.nearestCentroid,
                          classifier.return=NaN, k='loo')

# train model and analyze with 5-fold validation using lda classifier
data <- lol.sims.rtrunk(n=200, d=30)  # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, k=5)

# pass in existing cross-validation sets
sets <- lol.xval.split(X, Y, k=2)
xval.fit <- lol.xval.eval(X, Y, r, lol.project.lol, sets=sets)

[Package lolR version 2.1 Index]