R: Binary Discriminant Analysis: Variable Ranking

binda.ranking {binda}

R Documentation

Binary Discriminant Analysis: Variable Ranking

Description

binda.ranking ranks the predictors by computing, for each variable, t-scores between each group mean and the pooled mean.

plot.binda.ranking provides a graphical visualization of the top-ranking variables.

Usage

binda.ranking(Xtrain, L, lambda.freqs, verbose=TRUE)
## S3 method for class 'binda.ranking'
plot(x, top=40, arrow.col="blue", zeroaxis.col="red", ylab="Variables", main, ...)

Arguments

`Xtrain`	A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables.
`L`	A factor with the class labels of the training samples.
`lambda.freqs`	Shrinkage intensity for the class frequencies. If not specified it is estimated from the data. `lambda.freqs=0` implies no shrinkage (i.e. empirical frequencies) and `lambda.freqs=1` complete shrinkage (i.e. uniform frequencies).
`verbose`	If `TRUE` (the default), print some information while computing.
`x`	A "binda.ranking" object – this is produced by the binda.ranking() function.
`top`	The number of top-ranking variables shown in the plot (default: 40).
`arrow.col`	Color of the arrows in the plot (default is `"blue"`).
`zeroaxis.col`	Color for the center zero axis (default is `"red"`).
`ylab`	Label written next to feature list (default is `"Variables"`).
`main`	Main title (if missing, `"The", top, "Top Ranking Variables"` is used).
`...`	Other options passed on to generic plot().
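
As an illustration of what the two extremes of `lambda.freqs` mean, the following minimal sketch interpolates linearly between empirical and uniform class frequencies. The helper `shrink_freqs` and the class counts are hypothetical and not part of binda; the linear form is an assumption that matches the documented endpoints (`0` gives empirical, `1` gives uniform frequencies), and the estimator used inside the package may differ in detail.

```r
# Hypothetical helper (not a binda function): shrink class frequencies
# towards the uniform distribution with intensity lambda in [0, 1].
shrink_freqs = function(counts, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  empirical = counts / sum(counts)                    # observed class proportions
  uniform = rep(1 / length(counts), length(counts))   # equal weight per class
  lambda * uniform + (1 - lambda) * empirical         # ASSUMPTION: linear interpolation
}

# Made-up class counts, for illustration only
counts = c(Control=30, Treatment=10)

shrink_freqs(counts, 0)    # no shrinkage: 0.75, 0.25
shrink_freqs(counts, 1)    # complete shrinkage: 0.5, 0.5
shrink_freqs(counts, 0.5)  # halfway: 0.625, 0.375
```

With balanced classes the empirical and uniform frequencies coincide, so the choice of `lambda.freqs` has no effect in that case.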

Details

The overall ranking of a feature is determined by computing a weighted sum of the squared t-scores. This is approximately equivalent to the mutual information between the response and each variable. The same criterion is used in dichotomize. For precise details see Gibb and Strimmer (2015).
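
The weighted sum can be sketched in base R using the t-scores printed in the Examples section of this page. This is only an illustration: the weights are assumed to be the class frequencies (uniform here, since each group has two samples), which reproduces the printed scores up to rounding, but Gibb and Strimmer (2015) should be consulted for the exact definition.

```r
# t-scores copied from the example output on this page (rounded values)
tscores = matrix(c(-2,         2,
                   -2,         2,
                    2,        -2,
                    2,        -2,
                   -1.154701,  1.154701,
                    0,         0),
                 ncol=2, byrow=TRUE,
                 dimnames=list(c("V2", "V4", "V5", "V6", "V3", "V1"),
                               c("t.Control", "t.Treatment")))

# ASSUMPTION: the weights are the class frequencies (two samples per group)
w = c(Control=0.5, Treatment=0.5)

# weighted sum of squared t-scores, one value per variable
score = drop(tscores^2 %*% w)
score
```

Under this assumption the values agree with the `score` column of the example: 4 for V2, V4, V5 and V6, about 1.33 for V3, and 0 for V1.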

Value

binda.ranking returns a matrix with the following columns:

`idx`	original feature number
`score`	the score determining the overall ranking of a variable
`t`	for each group and feature, the t-score of the class mean versus the pooled mean (one column per group, named `t.` followed by the group label, e.g. `t.Control`)

Author(s)

Sebastian Gibb and Korbinian Strimmer (https://strimmerlab.github.io).

References

Gibb, S., and K. Strimmer. 2015. Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31:3156-3162. <DOI:10.1093/bioinformatics/btv334>

Examples

# load "binda" library
library("binda")

# training data set with labels
Xtrain = matrix(c(1, 1, 0, 1, 0, 0,
                  1, 1, 1, 1, 0, 0,
                  1, 0, 0, 0, 1, 1,
                  1, 0, 0, 0, 1, 1), nrow=4, byrow=TRUE)
colnames(Xtrain) = paste0("V", 1:ncol(Xtrain))
is.binaryMatrix(Xtrain) # TRUE
L = factor(c("Treatment", "Treatment", "Control", "Control"))

# ranking variables
br = binda.ranking(Xtrain, L)
br
#   idx    score t.Control t.Treatment
#V2   2 4.000000 -2.000000    2.000000
#V4   4 4.000000 -2.000000    2.000000
#V5   5 4.000000  2.000000   -2.000000
#V6   6 4.000000  2.000000   -2.000000
#V3   3 1.333333 -1.154701    1.154701
#V1   1 0.000000  0.000000    0.000000
#attr(,"class")
#[1] "binda.ranking"
#attr(,"cl.count")
#[1] 2

# show plot
plot(br)

# result: variable V1 (score 0, identical in all samples) is irrelevant for distinguishing the two groups

[Package binda version 1.0.4 Index]