Dichotomize Continuous Data Set With Labels


dichotomize converts a matrix containing continuous measurements into a binary matrix.

optimizeThreshold determines optimal thresholds for dichotomization.


dichotomize(X, thresh)
optimizeThreshold(X, L, lambda.freqs, verbose=FALSE)



X: data matrix (columns correspond to variables, rows to samples).


thresh: vector of thresholds, one for each variable (column).


L: factor containing the class labels, one for each sample (row).


lambda.freqs: shrinkage parameter for class frequencies (if not specified it is estimated).


verbose: report shrinkage intensity and other information.


dichotomize assigns 0 if a matrix entry is lower than the given column-specific threshold, and 1 otherwise.
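
The thresholding rule can be sketched in base R as follows. This is an illustration only and not necessarily how the package implements it: dichotomize_sketch is a hypothetical name, and entries equal to the threshold are mapped to 1, as the rule above implies.

```r
# Hypothetical helper illustrating the thresholding rule (not part of binda).
dichotomize_sketch = function(X, thresh)
{
  stopifnot(is.matrix(X), length(thresh) == ncol(X))
  # compare each column with its own threshold; TRUE/FALSE becomes 1/0
  sweep(X, 2, thresh, ">=") * 1
}

# toy data: 2 samples (rows), 2 variables (columns)
X = matrix(c(0.2, 3,
             0.7, 1), nrow=2, byrow=TRUE)
dichotomize_sketch(X, thresh=c(0.5, 2))
#      [,1] [,2]
# [1,]    0    1
# [2,]    1    0
```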

optimizeThreshold uses (approximate) mutual information to determine the optimal thresholds. Specifically, the threshold for each variable is chosen to maximize the mutual information between the response (the class labels) and the dichotomized variable. The same criterion is also used in binda.ranking. For a detailed description of the dichotomization procedure see Gibb and Strimmer (2015).

Class frequencies are estimated using freqs.shrink.
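
To make the criterion concrete, the following sketch searches for the threshold of a single variable by maximizing a plug-in estimate of mutual information. It rests on several assumptions that are not taken from the package: mi_binary and best_threshold are hypothetical names, raw relative frequencies are used in place of shrinkage estimates, and the candidate thresholds are simply the observed values. The thresholds returned by optimizeThreshold may therefore differ.

```r
# Hypothetical helpers (not part of binda). Plug-in mutual information
# between a binary vector b and class labels L, using raw relative
# frequencies rather than shrinkage estimates.
mi_binary = function(b, L)
{
  p = table(b, L) / length(b)            # joint frequencies
  indep = outer(rowSums(p), colSums(p))  # product of the marginals
  nz = p > 0                             # treat 0 * log(0) as 0
  sum(p[nz] * log(p[nz] / indep[nz]))
}

# Try every observed value of x as a threshold and keep the first one
# with maximal mutual information.
best_threshold = function(x, L)
{
  cand = sort(unique(x))
  mi = sapply(cand, function(thr) mi_binary(as.numeric(x >= thr), L))
  cand[which.max(mi)]
}

L = factor(c("Treatment", "Treatment", "Control", "Control"))
best_threshold(c(0.4, 0.4, 0.5, 0.6), L)
# [1] 0.5
```

In this toy case the threshold 0.5 separates the two classes perfectly, so the mutual information reaches its maximum of log(2).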


dichotomize returns a binary matrix.

optimizeThreshold returns a vector containing the variable thresholds.


Sebastian Gibb and Korbinian Strimmer (https://strimmerlab.github.io).


Gibb, S., and K. Strimmer. 2015. Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31:3156-3162. <DOI:10.1093/bioinformatics/btv334>

# load binda library
library("binda")

# example data with 6 variables (in columns) and 4 samples (in rows)
X = matrix(c(1, 1, 1, 1.75, 0.4,    0,
             1, 1, 2,    2, 0.4, 0.09,
             1, 0, 1,    1, 0.5,  0.1,
             1, 0, 1,  0.5, 0.6,  0.1), nrow=4, byrow=TRUE)
colnames(X) = paste0("V", 1:ncol(X))

# class labels
L = factor(c("Treatment", "Treatment", "Control", "Control") )
rownames(X) = paste0(L, rep(1:2, times=2))

#          V1 V2 V3   V4  V5   V6
#Treatment1  1  1  1 1.75 0.4 0.00
#Treatment2  1  1  2 2.00 0.4 0.09
#Control1    1  0  1 1.00 0.5 0.10
#Control2    1  0  1 0.50 0.6 0.10

# find optimal thresholds (one for each variable)
thr = optimizeThreshold(X, L)
#  V1   V2   V3   V4   V5   V6 
#1.00 1.00 2.00 1.75 0.50 0.10

# convert into binary matrix
# if value is lower than threshold -> 0 otherwise -> 1
Xb = dichotomize(X, thr)
is.binaryMatrix(Xb) # TRUE
#          V1 V2 V3 V4 V5 V6
#Treatment1  1  1  0  1  0  0
#Treatment2  1  1  1  1  0  0
#Control1    1  0  0  0  1  1
#Control2    1  0  0  0  1  1
#  V1   V2   V3   V4   V5   V6 
#1.00 1.00 2.00 1.75 0.50 0.10

