binda.ranking {binda} | R Documentation |
Binary Discriminant Analysis: Variable Ranking
Description
binda.ranking determines a ranking of predictors by computing corresponding t-scores between the group means and the pooled mean.
plot.binda.ranking provides a graphical visualization of the top-ranking variables.
Usage
binda.ranking(Xtrain, L, lambda.freqs, verbose=TRUE)
## S3 method for class 'binda.ranking'
plot(x, top=40, arrow.col="blue", zeroaxis.col="red", ylab="Variables", main, ...)
Arguments
Xtrain |
A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables. |
L |
A factor with the class labels of the training samples. |
lambda.freqs |
Shrinkage intensity for the class frequencies. If not specified, it is estimated from the data. |
verbose |
If TRUE (the default), print progress information while computing. |
x |
A "binda.ranking" object – this is produced by the binda.ranking() function. |
top |
The number of top-ranking variables shown in the plot (default: 40). |
arrow.col |
Color of the arrows in the plot (default is "blue"). |
zeroaxis.col |
Color for the center zero axis (default is "red"). |
ylab |
Label written next to feature list (default is "Variables"). |
main |
Main title (if missing, a default title is used). |
... |
Other options passed on to generic plot(). |
Details
The overall ranking of a feature is determined by computing a weighted sum of the squared t-scores. This is approximately equivalent to the mutual information between the response and each variable. The same criterion is used in dichotomize. For precise details see Gibb and Strimmer (2015).
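The weighted sum can be checked by hand against the output in the Examples section. The following is only a sketch in base R: it assumes the weights are the class frequencies (0.5 per group here, since each group has 2 of the 4 samples), which is an assumption made for illustration and not a statement about the package internals. The names t.scores and w are hypothetical.

```r
# t-scores copied from the Examples output (columns: Control, Treatment)
t.scores = matrix(c(-2,         2,
                    -2,         2,
                     2,        -2,
                     2,        -2,
                    -1.154701,  1.154701,
                     0,         0),
                  ncol=2, byrow=TRUE,
                  dimnames=list(c("V2", "V4", "V5", "V6", "V3", "V1"),
                                c("Control", "Treatment")))

# ASSUMPTION: weights are the class frequencies (2 of 4 samples per group)
w = c(Control=0.5, Treatment=0.5)

# weighted sum of the squared t-scores, one value per variable
score = drop(t.scores^2 %*% w)
score
```

Up to the rounding of the copied t-scores, the result agrees with the score column shown in the Examples output.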
Value
binda.ranking returns a matrix with the following columns:
idx |
original feature number |
score |
the score determining the overall ranking of a variable |
t |
for each group and feature the t-score of the class mean versus the pooled mean |
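As a sketch of how the returned matrix might be used, the idx column can serve to pick out the top-ranked variables. To keep the snippet runnable without the package, it rebuilds a stand-in matrix from the Examples output; with the package loaded, the object returned by binda.ranking(Xtrain, L) would take its place. The name top.idx is hypothetical.

```r
# Stand-in for the matrix returned by binda.ranking(), rebuilt by hand
# from the Examples output (assumption: same column layout)
br = cbind(idx=c(2, 4, 5, 6, 3, 1),
           score=c(4, 4, 4, 4, 1.333333, 0),
           t.Control=c(-2, -2, 2, 2, -1.154701, 0),
           t.Treatment=c(2, 2, -2, -2, 1.154701, 0))
rownames(br) = paste0("V", br[, "idx"])

# in the Examples output the rows are ordered by decreasing score,
# so the first rows correspond to the top-ranked variables
top.idx = br[1:4, "idx"]
top.idx
```

The resulting indices can then be used to subset the columns of the training matrix, for example Xtrain[, top.idx].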
Author(s)
Sebastian Gibb and Korbinian Strimmer (https://strimmerlab.github.io).
References
Gibb, S., and K. Strimmer. 2015. Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31:3156-3162. <DOI:10.1093/bioinformatics/btv334>
See Also
binda, predict.binda, dichotomize.
Examples
# load "binda" library
library("binda")
# training data set with labels
Xtrain = matrix(c(1, 1, 0, 1, 0, 0,
1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 1,
1, 0, 0, 0, 1, 1), nrow=4, byrow=TRUE)
colnames(Xtrain) = paste0("V", 1:ncol(Xtrain))
is.binaryMatrix(Xtrain) # TRUE
L = factor(c("Treatment", "Treatment", "Control", "Control") )
# ranking variables
br = binda.ranking(Xtrain, L)
br
# idx score t.Control t.Treatment
#V2 2 4.000000 -2.000000 2.000000
#V4 4 4.000000 -2.000000 2.000000
#V5 5 4.000000 2.000000 -2.000000
#V6 6 4.000000 2.000000 -2.000000
#V3 3 1.333333 -1.154701 1.154701
#V1 1 0.000000 0.000000 0.000000
#attr(,"class")
#[1] "binda.ranking"
#attr(,"cl.count")
#[1] 2
# show plot
plot(br)
# result: variable V1 is irrelevant for distinguishing the two groups