lol.xval.optimal_dimselect {lolR} | R Documentation |
Optimal Cross-Validated Number of Embedding Dimensions
Description
A function for performing cross-validation (leave-one-out by default) for a given embedding model, allowing users to determine the optimal number of embedding dimensions for
their algorithm of choice. This function produces fold-wise cross-validated misclassification rates for standard embedding techniques across a specified selection of
embedding dimensions. The optimal embedding dimension is the one with the lowest average misclassification rate across all folds.
Users can optionally specify custom embedding techniques by configuring the alg.* parameters and hyperparameters.
Optional classifiers implementing the S3 predict function can be used for classification; hyperparameters for the classifier used to determine the misclassification rate are specified in classifier.*.
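The selection rule described above can be sketched in base R. The data frame below is invented for illustration, and the column names fold, r, and lhat are assumptions rather than a statement of what this function returns.

```r
# Hypothetical fold-wise misclassification rates for three candidate
# dimensions over four folds (values invented for illustration only).
folds <- data.frame(
  fold = rep(1:4, times = 3),
  r    = rep(c(5, 10, 15), each = 4),
  lhat = c(0.30, 0.28, 0.32, 0.30,
           0.20, 0.22, 0.18, 0.20,
           0.25, 0.27, 0.23, 0.25)
)

# Average the misclassification rate over folds for each dimension.
fold.means <- aggregate(lhat ~ r, data = folds, FUN = mean)

# The optimal dimension is the one with the lowest average rate.
best <- fold.means[which.min(fold.means$lhat), ]
optimal.r <- best$r
optimal.lhat <- best$lhat
```

With these made-up numbers, the dimension 10 has the lowest fold-averaged rate and would be selected.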
Usage
lol.xval.optimal_dimselect(
  X,
  Y,
  rs,
  alg,
  sets = NULL,
  alg.dimname = "r",
  alg.opts = list(),
  alg.embedding = "A",
  alg.structured = TRUE,
  classifier = lda,
  classifier.opts = list(),
  classifier.return = "class",
  k = "loo",
  rank.low = FALSE,
  ...
)
Arguments
X |
|
Y |
|
rs |
|
alg |
the algorithm to use for embedding. Should be a function that accepts inputs |
sets |
a user-defined cross-validation set. Defaults to
|
alg.dimname |
the name of the parameter accepted by |
alg.opts |
the hyper-parameter options to pass to your algorithm as a keyworded list. Defaults to |
alg.embedding |
the attribute returned by
|
alg.structured |
a boolean to indicate whether the embedding matrix is structured. Provides performance increase by not having to compute the embedding matrix
|
classifier |
the classifier to use for assessing performance. The classifier should accept |
classifier.opts |
any extraneous options to be passed to the classifier function, as a list. Defaults to an empty list. |
classifier.return |
if the return type is a list,
|
k |
the cross-validation method to perform. Defaults to
|
rank.low |
whether to force the training set to low-rank. Defaults to
|
... |
trailing args. |
Value
Returns a list containing:
folds.data |
the results, as a data-frame, of the per-fold classification accuracy. |
foldmeans.data |
the results, as a data-frame, of the average classification accuracy for each |
optimal.lhat |
the classification error of the optimal |
.
optimal.r |
the optimal number of embedding dimensions from |
.
model |
the model trained on all of the data at the optimal number of embedding dimensions. |
classifier |
the classifier trained on all of the data at the optimal number of embedding dimensions. |
Details
For more details see the help vignette:
vignette("xval", package = "lolR")
For extending cross-validation techniques shown here to arbitrary embedding algorithms, see the vignette:
vignette("extend_embedding", package = "lolR")
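As a rough sketch of what a custom embedding function might look like under the defaults shown in Usage (alg.dimname = "r", alg.embedding = "A"): the function below takes the data, the labels, and the number of dimensions, and returns a list whose A element is a projection matrix. The name my.pca.embed is hypothetical, and the argument and return contract shown here is an assumption; the vignette is the authoritative description.

```r
# Hypothetical PCA-style embedding; `my.pca.embed` is not part of lolR.
# Assumed contract: accepts X (n x d), Y (labels), and r, and returns a
# list whose element "A" is a d x r projection matrix.
my.pca.embed <- function(X, Y, r, ...) {
  Xc <- sweep(X, 2, colMeans(X))     # center each column
  V <- svd(Xc, nu = 0, nv = r)$v     # top-r right singular vectors
  list(A = V)                        # Y is unused: PCA is unsupervised
}

set.seed(1)
X <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)
Y <- rep(c(1, 2), each = 25)
fit <- my.pca.embed(X, Y, r = 2)
dim(fit$A)   # 6 2
```

A function of this shape would presumably be supplied as alg, with alg.dimname and alg.embedding naming its dimension argument and the returned projection element.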
For extending cross-validation techniques shown here to arbitrary classification algorithms, see the vignette:
vignette("extend_classification", package = "lolR")
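For orientation, a classifier that implements the S3 predict function could look roughly like the sketch below. The name my.majority is hypothetical, and the assumed interface (a training function taking X and Y as its first two arguments) is an assumption, not something this page confirms; the vignette describes the actual requirements.

```r
# Hypothetical classifier; `my.majority` is not part of lolR.
# Assumed contract: the training function takes X and Y as its first two
# arguments and returns an object with an S3 predict method.
my.majority <- function(X, Y, ...) {
  counts <- table(Y)
  structure(list(label = names(counts)[which.max(counts)]),
            class = "my.majority")
}

# S3 predict method: returns the majority training label for every row of X.
predict.my.majority <- function(object, X, ...) {
  rep(object$label, nrow(X))
}

X <- matrix(0, nrow = 5, ncol = 2)
Y <- c("a", "b", "b", "b", "a")
model <- my.majority(X, Y)
Yhat <- predict(model, X)   # dispatches to predict.my.majority
```

Because this predict method returns the labels directly rather than a list, it resembles the nearestCentroid case in the Examples, where classifier.return is set to NaN.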
Author(s)
Eric Bridgeford
Examples
# train model and analyze with leave-one-out validation using the
# nearestCentroid classifier
library(lolR)
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
# the nearestCentroid classifier returns only prediction labels,
# so we specify classifier.return as NaN
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol,
                                       classifier=lol.classify.nearestCentroid,
                                       classifier.return=NaN, k='loo')
# train model and analyze with 5-fold validation using lda classifier
data <- lol.sims.rtrunk(n=200, d=30) # 200 examples of 30 dimensions
X <- data$X; Y <- data$Y
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol, k=5)
# pass in existing cross-validation sets
sets <- lol.xval.split(X, Y, k=2)
xval.fit <- lol.xval.optimal_dimselect(X, Y, rs=c(5, 10, 15), lol.project.lol, sets=sets)