nlcv {nlcv} | R Documentation |
Nested Loop Cross-Validation
Description
This function first performs feature selection and then applies (by default) five different classification algorithms.
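The core idea is that feature selection is redone inside every training split, so the test samples never influence which genes are selected. The sketch below illustrates this repeated split/select/classify scheme in base R. It is a simplified illustration under stated assumptions, not the nlcv implementation: the helpers rank_genes_ttest(), dlda_predict() and split_select_classify() are hypothetical, genes are ranked by Welch t-test p-values, only a hand-written diagonal LDA classifier is used, and the inner loop runs over the nFeatures values rather than over several classifiers.

```r
# Simplified sketch of repeated split / select / classify, NOT the nlcv code.
# Assumptions: x is a genes-by-samples numeric matrix, y is a two-level factor,
# genes are ranked by Welch t-test p-values, and classification uses a
# hand-written diagonal LDA (DLDA).

rank_genes_ttest <- function(x, y) {
  lv <- levels(y)
  # Welch t-test p-value per gene (row); smaller means more discriminative
  pvals <- apply(x, 1, function(g) t.test(g[y == lv[1]], g[y == lv[2]])$p.value)
  order(pvals)
}

dlda_predict <- function(xtrain, ytrain, xtest) {
  classes <- levels(ytrain)
  # genes-by-classes matrix of class means
  mu <- do.call(cbind, lapply(classes, function(k)
    rowMeans(xtrain[, ytrain == k, drop = FALSE])))
  # pooled within-class variance per gene (diagonal covariance)
  resid <- xtrain - mu[, as.integer(ytrain), drop = FALSE]
  s2 <- rowSums(resid^2) / (ncol(xtrain) - length(classes))
  # variance-weighted squared distance of each test sample to each class mean
  d <- do.call(cbind, lapply(seq_along(classes), function(k)
    colSums((xtest - mu[, k])^2 / s2)))
  factor(classes[apply(d, 1, which.min)], levels = classes)
}

split_select_classify <- function(x, y, nRuns = 2, propTraining = 2/3,
                                  nFeatures = c(2, 3, 5), seed = 123) {
  set.seed(seed)
  errors <- matrix(NA_real_, nrow = nRuns, ncol = length(nFeatures),
                   dimnames = list(paste0("run", seq_len(nRuns)),
                                   paste0("top", nFeatures)))
  for (r in seq_len(nRuns)) {
    # outer loop: random training/test split
    train <- sample(ncol(x), size = round(propTraining * ncol(x)))
    # feature selection sees the training samples only
    ranking <- rank_genes_ttest(x[, train, drop = FALSE], y[train])
    for (j in seq_along(nFeatures)) {
      # inner loop: refit with the top n genes and score on the test set
      top <- ranking[seq_len(nFeatures[j])]
      pred <- dlda_predict(x[top, train, drop = FALSE], y[train],
                           x[top, -train, drop = FALSE])
      errors[r, j] <- mean(pred != y[-train])
    }
  }
  errors
}

# Usage on simulated data: 100 genes, 40 samples, 5 informative genes
set.seed(1)
x <- matrix(rnorm(100 * 40), nrow = 100)
y <- factor(rep(c("A", "B"), each = 20))
x[1:5, y == "B"] <- x[1:5, y == "B"] + 2
err <- split_select_classify(x, y, nRuns = 2, nFeatures = c(2, 5, 10))
print(err)
```

The result is a runs-by-nFeatures matrix of test error rates, which mirrors the per-run errorRate values that nlcv stores for each classifier and each number of features.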
Usage
nlcv(eset, classVar = "type", nRuns = 2, propTraining = 2/3,
classdist = c("balanced", "unbalanced"), nFeatures = c(2, 3, 5, 7, 10, 15,
20, 25, 30, 35), fsMethod = c("randomForest", "t.test", "limma", "none"),
classifMethods = c("dlda", "randomForest", "bagg", "pam", "svm"),
fsPar = NULL, initialGenes = seq(length.out = nrow(eset)),
geneID = "ID", storeTestScores = FALSE, verbose = FALSE, seed = 123)
Arguments
eset |
ExpressionSet object containing the genes to classify |
classVar |
String giving the name of the variable containing the
observed class labels; this variable should be contained in the phenoData of eset. |
nRuns |
Number of runs for the outer loop of the cross-validation |
propTraining |
Proportion of the observations to be assigned to the
training set. By default, propTraining = 2/3. |
classdist |
Distribution of the classes: indicates whether the class distribution is 'balanced' or 'unbalanced'. The sampling strategy for each run is adapted accordingly. |
nFeatures |
Numeric vector with the numbers of features to be selected from the features kept by the feature selection method. For each number n in this vector, the classification algorithms are run using only the top n features. |
fsMethod |
Feature selection method; one of 'randomForest', 't.test', 'limma' or 'none'. |
classifMethods |
Character vector with the classification methods to be
used in the analysis; elements can be chosen among the supported
methods, which include the defaults 'dlda', 'randomForest', 'bagg', 'pam' and 'svm'. |
fsPar |
List of further parameters to pass to the feature selection
method; currently the default for |
initialGenes |
Initial subset of genes in the ExpressionSet on which to apply the nested loop cross-validation procedure. By default, all genes are selected. |
geneID |
String giving the name of the gene ID variable in the fData of the ExpressionSet; this argument was added for users who work with, e.g., both Entrez IDs and Ensembl gene IDs. |
storeTestScores |
Should the test scores be stored in the returned 'nlcv' object? Defaults to FALSE. |
verbose |
Should the output be verbose (TRUE) or not (FALSE)? Defaults to FALSE. |
seed |
Integer seed, set at the start of the cross-validation. |
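This page does not spell out how the sampling strategy depends on classdist. The sketch below assumes one plausible reading and should not be taken as the package's rule: with 'balanced', the same number of samples (propTraining of the smallest class) is drawn from every class; with 'unbalanced', propTraining of each class is drawn, preserving the observed class ratio. The helper split_training() is hypothetical.

```r
# Hypothetical helper, NOT the nlcv implementation: draws a training set
# per class. Assumed rule: "balanced" takes the same number of samples from
# every class (propTraining of the smallest class); "unbalanced" takes
# propTraining of each class, preserving the observed class ratio.
split_training <- function(y, propTraining = 2/3,
                           classdist = c("balanced", "unbalanced")) {
  classdist <- match.arg(classdist)
  idx_by_class <- split(seq_along(y), y)
  n_min <- min(lengths(idx_by_class))
  train <- lapply(idx_by_class, function(idx) {
    n <- if (classdist == "balanced") n_min else length(idx)
    n <- round(propTraining * n)
    # index via sample.int to avoid sample()'s behaviour on length-one input
    idx[sample.int(length(idx), n)]
  })
  sort(unlist(train, use.names = FALSE))
}

y <- factor(c(rep("A", 30), rep("B", 12)))
set.seed(123)  # mirrors the role of the seed argument
tb <- split_training(y, 2/3, "balanced")
tu <- split_training(y, 2/3, "unbalanced")
table(y[tb])  # A: 8, B: 8
table(y[tu])  # A: 20, B: 8
```

Under this assumed rule, the balanced split keeps the larger class from dominating the training set, at the cost of leaving more of its samples unused in each run.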
Value
The result is an object of class 'nlcv'. It is a list with two
components, output and features.
The output component is a list with one component for each
classification algorithm used (five with the default classifMethods).
Each of these components in turn has as many components as there are
elements in the nFeatures vector. These contain both the error rates
for each run (component errorRate) and the predicted labels for each
run (character matrix labelsMat).
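Given this nesting, the mean error rate per classifier and per number of features can be tabulated with two nested apply calls. The sketch below works on a mock object with made-up error rates that only mimics the structure described above; summarize_error() is a hypothetical helper, not a function of the nlcv package.

```r
# Sketch, assuming the nesting described above: output[[method]][[k]]$errorRate
# holds one error rate per run for the k-th element of nFeatures.
# summarize_error() is a hypothetical helper, not part of the nlcv package.
summarize_error <- function(output) {
  # rows: elements of nFeatures; columns: classification methods
  sapply(output, function(method)
    vapply(method, function(fit) mean(fit$errorRate), numeric(1)))
}

# Mock object with made-up error rates (2 methods, 2 feature set sizes, 2 runs)
mock_output <- list(
  dlda = list(list(errorRate = c(0.10, 0.20)), list(errorRate = c(0.05, 0.15))),
  svm  = list(list(errorRate = c(0.30, 0.10)), list(errorRate = c(0.20, 0.20)))
)
summarize_error(mock_output)
# dlda column: 0.15, 0.10; svm column: 0.20, 0.20
```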
The features list is a list with as many components as there are
runs. For each run, a named vector gives the variable importance
measure for each gene: for t-test-based feature selection, P-values
are used; for random-forest-based feature selection, the random forest
variable importance measure is given.
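A common way to read this list is to count how often each gene ranks among the top n across runs; genes selected in most runs are more stable. Note that the ranking direction differs: small P-values are better, whereas large random forest importance values are better. The helper top_n_frequency() and the mock p-values below are hypothetical and only mimic the described structure.

```r
# Hypothetical helper, not part of the nlcv package: counts in how many runs
# each gene is among the top n. Set smaller_is_better = TRUE for t-test
# p-values and FALSE for random forest importance values.
top_n_frequency <- function(features, n, smaller_is_better = TRUE) {
  tops <- unlist(lapply(features, function(v) {
    ord <- order(v, decreasing = !smaller_is_better)
    names(v)[ord[seq_len(min(n, length(v)))]]
  }))
  counts <- table(tops)
  sort(setNames(as.vector(counts), names(counts)), decreasing = TRUE)
}

# Mock 'features' list for two runs with made-up p-values
features <- list(
  run1 = c(geneA = 0.001, geneB = 0.20, geneC = 0.03, geneD = 0.50),
  run2 = c(geneA = 0.004, geneB = 0.01, geneC = 0.40, geneD = 0.60)
)
top_n_frequency(features, n = 2)
# geneA is in the top 2 in both runs; geneB and geneC in one run each
```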
Note
The variable importance measure used is the third column of the output
returned by the randomForest function.
Author(s)
Willem Talloen and Tobias Verbeke