get.biom {BioMark}    R Documentation

Get biomarkers discriminating between two classes


Description

Biomarkers can be identified in several ways. The classical way is to look at those variables with large model coefficients or large t statistics. A second is based on the higher criticism (HC) approach, and a third assesses the stability of these coefficients under subsampling of the data set.
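The stability idea can be illustrated without the package. The sketch below is not BioMark's implementation: it assumes simulated data, a Welch-type t statistic as the variable score, and hypothetical settings (`n.iter`, `top.k`, 7 out of 10 samples per class in each subsample). It counts how often each variable ranks among the `top.k` highest scores across subsamples.

```r
## Minimal stability-selection sketch (illustrative; not BioMark's own code)
set.seed(1)
n.per.class <- 10
p <- 50
X <- matrix(rnorm(2 * n.per.class * p), ncol = p)
Y <- factor(rep(1:2, each = n.per.class))
X[Y == 2, 1:3] <- X[Y == 2, 1:3] + 3   # variables 1-3 are simulated biomarkers

## absolute two-sample t statistic for every column
tstat <- function(X, Y) {
  g1 <- X[Y == levels(Y)[1], , drop = FALSE]
  g2 <- X[Y == levels(Y)[2], , drop = FALSE]
  se <- sqrt(apply(g1, 2, var) / nrow(g1) + apply(g2, 2, var) / nrow(g2))
  abs(colMeans(g1) - colMeans(g2)) / se
}

n.iter <- 100   # hypothetical number of subsampling iterations
top.k  <- 5     # hypothetical number of variables kept per iteration
counts <- numeric(p)
for (i in seq_len(n.iter)) {
  ## draw 7 of the 10 samples in each class
  idx <- unlist(lapply(split(seq_along(Y), Y),
                       function(ii) sample(ii, size = 7)))
  sel <- order(tstat(X[idx, ], Y[idx]), decreasing = TRUE)[1:top.k]
  counts[sel] <- counts[sel] + 1
}

## fraction of iterations in which each variable was selected
fraction.selected <- counts / n.iter
which(fraction.selected > 0.8)
```

Variables with a selection fraction close to 1 are the stable candidates; the 0.8 cutoff here is arbitrary and only serves the illustration.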


Usage

get.biom(X, Y, fmethod = "all", type = c("stab", "HC", "coef"),
         ncomp = 2, biom.opt = biom.options(), scale.p = "auto",
         ...)
## S3 method for class 'BMark'
coef(object, ...)
## S3 method for class 'BMark'
print(x, ...)
## S3 method for class 'BMark'
summary(object, ...)



Arguments

X

Data matrix. Usually the number of columns (variables) is (much) larger than the number of rows (samples).


Y

Class indication. For classification with two or more classes, this should be a factor; a numeric vector will be interpreted as a regression situation, which can only be tackled by fmethod = "lasso".


fmethod

Modelling method(s) employed. The default is to use "all", which will test all methods in the current biom.options$fmethods list. Note that from version 0.4.0, "plsda" and "pclda" are no longer in the list of methods; they have been replaced by "pls" and "pcr", respectively. For compatibility reasons, using the old terms will not lead to an error but only to a warning.


type

Selection criterion: coefficient size ("coef"), stability under subsampling ("stab") or higher criticism ("HC").


ncomp

Number of latent variables to use in PCR and PLS (VIP) modelling. In function get.biom this may be a vector; in all other functions it should be one number. Default: 2.


biom.opt

Options for the biomarker selection: a list with several named elements. See biom.options.


scale.p

Scaling. This is performed individually in every cross-validation iteration, and can have a profound effect on the results. Default: "auto" (autoscaling). Other possible choices: "none" for no scaling, "pareto" for Pareto scaling, "log" and "sqrt" for log and square root scaling, respectively.

object, x

A BMark object.


...

Further arguments for modelling functions. Often used to catch unused arguments.
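The scaling choices listed for scale.p can be sketched in base R. This assumes the conventional definitions (autoscaling: mean-center and divide by the column standard deviation; Pareto: mean-center and divide by the square root of the standard deviation; log and square root: transform, then mean-center). The helper `scale_sketch` is hypothetical, and the package's own scalefun may differ in detail.

```r
## Hypothetical helper illustrating common scaling choices
## (an assumption-based sketch, not the package's scalefun)
scale_sketch <- function(X, type = c("auto", "pareto", "log", "sqrt", "none")) {
  type <- match.arg(type)
  center <- function(M) sweep(M, 2, colMeans(M))
  switch(type,
         none   = X,
         auto   = sweep(center(X), 2, apply(X, 2, sd), "/"),
         pareto = sweep(center(X), 2, sqrt(apply(X, 2, sd)), "/"),
         log    = center(log(X)),
         sqrt   = center(sqrt(X)))
}

set.seed(2)
X <- matrix(rexp(60, rate = 0.1), nrow = 10)   # positive toy data
X.auto <- scale_sketch(X, "auto")
round(apply(X.auto, 2, sd), 10)                # column standard deviations
```

Because scaling is redone in every resampling iteration, the column means and standard deviations used in a sketch like this would be recomputed on each subsample rather than once on the full data.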


Value

Function get.biom returns an object of class "BMark": a list containing one element for every fmethod that is selected, as well as an element info. The content of the individual elements depends on the type chosen. For type == "coef", the only element returned is a matrix containing coefficient sizes. For type == "HC" and type == "stab", a list is returned containing the element biom.indices and either pvals (for type == "HC") or fraction.selected (for type == "stab").

Element biom.indices contains the indices of the selected variables, and can be extracted using function selection. Element pvals contains the p values used to perform HC thresholding; these are presented in the original order of the variables, and can be obtained directly from, e.g., t statistics, or from permutation sampling. Element fraction.selected indicates in what fraction of the stability selection iterations a particular variable has been selected. The more often it has been selected, the more stable it is as a biomarker.

The coef method for BMark objects extracts model coefficients, p values or stability fractions for types "coef", "HC" and "stab", respectively.
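Higher criticism thresholding on a vector of p values can be sketched as follows. The code uses the standard Donoho-Jin form of the HC statistic on simulated p values; the function name `hc_select` and the `alpha0` cutoff are hypothetical, and the sketch is an illustration rather than BioMark's own code.

```r
## HC thresholding sketch (Donoho-Jin form; illustrative only)
hc_select <- function(pvals, alpha0 = 0.1) {
  n   <- length(pvals)
  ord <- order(pvals)
  ps  <- pvals[ord]
  i   <- seq_len(n)
  ## standardized distance between expected and observed sorted p values
  hc  <- sqrt(n) * (i / n - ps) / sqrt(ps * (1 - ps))
  ## only the smallest alpha0 fraction of the p values is considered
  k.max <- max(1, floor(alpha0 * n))
  k.hc  <- which.max(hc[seq_len(k.max)])
  sort(ord[seq_len(k.hc)])   # selected indices, in the original variable order
}

set.seed(3)
pvals <- c(runif(5, 0, 1e-4), runif(195))   # 5 simulated true biomarkers
biom.indices <- hc_select(pvals)
biom.indices
```

The returned indices play the role that biom.indices plays in the description above: every variable whose p value lies at or below the HC threshold is selected.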


Author(s)

Ron Wehrens

See Also

biom.options, get.segments, selection, scalefun


Examples

## Real apple data (small set)
data(spikedApples)
apple.coef <- get.biom(X = spikedApples$dataMatrix,
                       Y = factor(rep(1:2, each = 10)),
                       ncomp = 2:3, type = "coef")
coef.sizes <- coef(apple.coef) 
sapply(coef.sizes, range)

## stability-based selection
apple.stab <- get.biom(X = spikedApples$dataMatrix,
                       Y = factor(rep(1:2, each = 10)),
                       ncomp = 2:3, type = "stab")
selected.variables <- selection(apple.stab)
unlist(sapply(selected.variables, function(x) sapply(x, length)))
## Ranging from more than 70 for pcr, approx 40 for pls and student t,
## to 0-29 for the lasso
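## True positives per model. Assumption: spikedApples$biom holds the
## indices of the spiked-in (true) biomarkers
unlist(sapply(selected.variables,
              function(x) lapply(x, function(xx, y) sum(xx %in% y),
                                 spikedApples$biom)))
## TPs (stab): all find 5/5, except pcr.2 and the lasso with values for lambda
## larger than 0.0484

## False positives per model
unlist(sapply(selected.variables,
              function(x) lapply(x, function(xx, y) sum(!(xx %in% y)),
                                 spikedApples$biom)))
## FPs (stab): PCR finds most FPs (approx. 60), other latent-variable
## methods approx 40, lasso allows for the optimal selection around
## lambda = 0.0702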

## regression example
data(gasoline, package = "pls") ## from the pls package
gasoline.stab <- get.biom(gasoline$NIR, gasoline$octane,
                          fmethod = c("pcr", "pls", "lasso"), type = "stab")

## Not run: 
## Same for HC-based selection
## Warning: takes a long time!
apple.HC <- get.biom(X = spikedApples$dataMatrix,
                     Y = factor(rep(1:2, each = 10)),
                     ncomp = 2:3, type = "HC")
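## TPs, then FPs; spikedApples$biom is assumed to hold the true biomarker indices
sapply(apple.HC[names(apple.HC) != "info"],
       function(x, y) sum(x$biom.indices %in% y),
       spikedApples$biom)
sapply(apple.HC[names(apple.HC) != "info"],
       function(x, y) sum(!(x$biom.indices %in% y)),
       spikedApples$biom)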

## End(Not run)

[Package BioMark version 0.4.5 Index]