tr.rocc {rocc} | R Documentation |
Training of a ROC based classifier
Description
The function trains the ROC based classifier and returns the classifier specifications.
Usage
tr.rocc(g, out, xgenes = 200)
Arguments
g |
the input data in the form of a matrix with genes as rows and samples as columns. rownames(g) and colnames(g) must be specified. |
out |
describes the phenotype of the samples: a factor vector with levels 0 and 1 (in this order) and as many values as there are samples. |
xgenes |
numeric (vector of length 1); the number of features to select during feature selection. |
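The requirements on the arguments can be checked before training. The following is a minimal sketch in base R; check_rocc_input is a hypothetical helper written for illustration and is not part of the rocc package, and it only encodes the conditions stated above.

```r
# Hypothetical helper (not part of rocc): verifies the documented
# requirements on g, out and xgenes and stops with an error otherwise.
check_rocc_input <- function(g, out, xgenes = 200) {
  stopifnot(
    is.matrix(g), is.numeric(g),                 # genes x samples matrix
    !is.null(rownames(g)), !is.null(colnames(g)),
    is.factor(out),
    identical(levels(out), c("0", "1")),         # levels 0 and 1, in this order
    length(out) == ncol(g),                      # one phenotype value per sample
    is.numeric(xgenes), length(xgenes) == 1
  )
  invisible(TRUE)
}

# Toy input: 2 genes, 3 samples
g_ok <- matrix(1:6, nrow = 2,
               dimnames = list(c("Gene_1", "Gene_2"), c("S1", "S2", "S3")))
out_ok <- factor(c(0, 1, 1), levels = c(0, 1))
check_rocc_input(g_ok, out_ok, xgenes = 2)
```

Building the factor with an explicit levels argument guarantees the level order even if one class happens to appear first in the data.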
Details
For feature selection, the function picks the xgenes features with the highest AUC, where AUC values below 0.5 are mirrored (i.e., replaced by 1 - AUC) so that positively and negatively associated features are ranked on the same scale. Features negatively associated with the phenotype (AUC below 0.5) are multiplied by -1. The selected features are then averaged to form a metagene, and the samples are ranked according to their metagene expression. The optimal split of positive (i.e., 1) and negative (i.e., 0) samples is the one yielding the highest accuracy, i.e., the largest proportion of correct class assignments with respect to the true class; it is determined from the ROC curve using the package ROCR. The metagene threshold is computed as the mean metagene expression value of the two samples that form the border of the split. The final classifier specifications consist of a) the selected genes, b) the positive (AUC above 0.5) or negative (AUC below 0.5) association of these genes with the true class, and c) the metagene threshold. A new sample can be classified using the o.rocc() function.
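The procedure described above can be illustrated with a short sketch in base R. This is not the package's implementation: auc_one and sketch_train are hypothetical names, the AUC is computed from the rank-sum identity instead of with ROCR, and the sketch assumes that samples with a metagene value above the threshold are assigned to class 1.

```r
# Hypothetical helper: AUC of one feature via the rank-sum (Mann-Whitney) identity.
auc_one <- function(x, y) {
  r <- rank(x)                          # average ranks are used for ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical sketch of the training steps (not the rocc implementation).
sketch_train <- function(g, out, xgenes = 200) {
  y <- as.integer(as.character(out))
  auc <- apply(g, 1, auc_one, y = y)    # one AUC per gene (row)
  mirrored <- pmax(auc, 1 - auc)        # mirror AUCs below 0.5
  k <- min(xgenes, nrow(g))
  sel <- names(sort(mirrored, decreasing = TRUE))[seq_len(k)]
  sgn <- ifelse(auc[sel] < 0.5, -1, 1)  # flip negatively associated genes
  metagene <- colMeans(g[sel, , drop = FALSE] * sgn)
  s <- sort(metagene)
  cuts <- (head(s, -1) + tail(s, -1)) / 2   # midpoints between ranked neighbours
  # Assumption: metagene > cut is predicted as class 1
  acc <- vapply(cuts, function(ct) mean((metagene > ct) == (y == 1)), numeric(1))
  list(genes = sel,
       positiv = sel[sgn > 0],
       negativ = sel[sgn < 0],
       metagene.expression = metagene,
       cutoffvalue = unname(cuts[which.max(acc)]))
}

# Toy data: Gene_A rises with the phenotype, Gene_B falls, Gene_C is weak.
g_toy <- rbind(Gene_A = c(1, 2, 3, 7, 8, 9),
               Gene_B = c(9, 8, 7, 3, 2, 1),
               Gene_C = c(1, 9, 2, 8, 3, 7))
colnames(g_toy) <- paste("Sample", 1:6, sep = "_")
out_toy <- factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
fit <- sketch_train(g_toy, out_toy, xgenes = 2)
fit$cutoffvalue
```

In the toy data the two selected genes separate the classes perfectly, so the threshold falls midway between the highest-ranked negative sample and the lowest-ranked positive sample.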
Value
a list (a trocc object) with the following components:
AUCs |
a matrix containing the selected features with the corresponding AUC (aucv), positive or negative association (posneg), and mirrored AUC (allpos). |
genes |
character vector containing the genes selected in the feature selection. |
positiv |
character vector containing all positively associated genes (AUC above 0.5) selected in the feature selection. |
negativ |
character vector containing all negatively associated genes (AUC below 0.5) selected in the feature selection. |
metagene.expression |
numeric vector containing the metagene values of the training samples. |
metagene.expression.ranked |
numeric vector containing the samples ranked by metagene expression values. |
cutoffvalue |
the metagene threshold obtained from the best split of training samples. |
method |
the classification method used: ROC.based.predictor. |
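To illustrate how the genes, their direction of association, and the threshold together define a classifier, the following sketch applies such specifications to a single new sample. It is a hypothetical illustration and not the package's prediction function: sketch_classify and the hand-built spec list are assumptions, and the rule that a metagene value above the threshold means class 1 is also an assumption.

```r
# Hypothetical illustration (not part of rocc): classify one new sample
# from a plain list holding genes, negativ and cutoffvalue.
sketch_classify <- function(spec, newsample) {
  x <- newsample[spec$genes]                # keep only the selected genes
  x[spec$negativ] <- -x[spec$negativ]       # flip negatively associated genes
  metagene <- mean(x)                       # metagene value of the new sample
  as.integer(metagene > spec$cutoffvalue)   # assumption: above threshold -> 1
}

# Hand-built specifications, using the component names listed above
spec <- list(genes = c("Gene_A", "Gene_B"),
             positiv = "Gene_A",
             negativ = "Gene_B",
             cutoffvalue = 0)

high <- c(Gene_A = 5, Gene_B = 1, Gene_C = 3)   # metagene = (5 - 1) / 2 = 2
low  <- c(Gene_A = 1, Gene_B = 6, Gene_C = 0)   # metagene = (1 - 6) / 2 = -2.5
sketch_classify(spec, high)
sketch_classify(spec, low)
```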
Note
Depends on the package ROCR.
Author(s)
Martin Lauss
References
Lauss M, Frigyesi A, Ryden T, Hoglund M. Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier. BMC Cancer 2010 (in print)
See Also
p.rocc, o.rocc
Examples
### Random dataset and phenotype
library(rocc)
set.seed(100)
## The dataset should be a matrix with genes as rows and samples as columns
g <- matrix(rnorm(1000 * 25), ncol = 25)
rownames(g) <- paste("Gene", 1:1000, sep = "_")
colnames(g) <- paste("Sample", 1:25, sep = "_")
## The phenotype should be a factor with levels 0 and 1 (in this order)
out <- factor(sample(0:1, size = 25, replace = TRUE), levels = c(0, 1))
## Train the classifier on the 50 features with the highest (mirrored) AUC
predictor <- tr.rocc(g, out, xgenes = 50)
## Inspect the classifier specifications
predictor$positiv
predictor$negativ
predictor$cutoffvalue