get.biom {BioMark}    R Documentation
Get biomarkers discriminating between two classes
Description
Biomarkers can be identified in several ways. The classical way is to look at variables with large model coefficients or large t statistics. A second approach is based on higher criticism (HC), and a third assesses the stability of these coefficients under subsampling of the data set.
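The higher-criticism idea can be illustrated without BioMark. The sketch below assumes the commonly cited Donoho-Jin form of the statistic, HC(i) = sqrt(N) (i/N - p(i)) / sqrt(p(i) (1 - p(i))) for the sorted p values p(i), and selects all variables up to the maximising index. The function hc_select() and its search fraction alpha0 are hypothetical and not part of BioMark, whose own HC implementation may differ in such details.

```r
## Higher-criticism (HC) thresholding of a vector of p values, after
## Donoho and Jin. A generic sketch; BioMark's own HC code may differ.
hc_select <- function(pvals, alpha0 = 0.5) {  # alpha0: assumed search fraction
  N <- length(pvals)
  ord <- order(pvals)
  ## keep p strictly inside (0, 1) so the denominator never vanishes
  p <- pmin(pmax(pvals[ord], .Machine$double.eps), 1 - .Machine$double.eps)
  i <- seq_len(N)
  hc <- sqrt(N) * (i / N - p) / sqrt(p * (1 - p))
  ## look for the maximum among the smallest alpha0 * N p values only
  imax <- which.max(hc[seq_len(max(1, floor(alpha0 * N)))])
  sort(ord[seq_len(imax)])  # indices (original order) of the selected variables
}

## Toy data: 20 samples, 50 variables, the first 5 shifted in class 2
set.seed(17)
cls <- factor(rep(1:2, each = 10))
X <- matrix(rnorm(20 * 50), nrow = 20)
X[cls == "2", 1:5] <- X[cls == "2", 1:5] + 2
pv <- apply(X, 2, function(x) t.test(x ~ cls)$p.value)
hc_select(pv)
```

The p values here come from t tests; as noted under Value, permutation-based p values could be plugged in instead.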
Usage
get.biom(X, Y, fmethod = "all", type = c("stab", "HC", "coef"),
ncomp = 2, biom.opt = biom.options(), scale.p = "auto",
...)
## S3 method for class 'BMark'
coef(object, ...)
## S3 method for class 'BMark'
print(x, ...)
## S3 method for class 'BMark'
summary(object, ...)
Arguments
X |
Data matrix. Usually the number of columns (variables) is (much) larger than the number of rows (samples). |
Y |
Class indication. For classification with two or more classes,
a factor; a numeric vector will be interpreted as a regression
situation, which can only be tackled by |
fmethod |
Modelling method(s) employed. The default is to use
|
type |
Whether to use coefficient size as a criterion
( |
ncomp |
Number of latent variables to use in PCR and PLS (VIP)
modelling. In function |
biom.opt |
Options for the biomarker selection - a list with
several named elements. See |
scale.p |
Scaling. This is performed individually in every crossvalidation iteration, and can have a profound effect on the results. Default: "auto" (autoscaling). Other possible choices: "none" for no scaling, "pareto" for Pareto scaling, and "log" and "sqrt" for log and square-root scaling, respectively. |
object, x |
A BMark object. |
... |
Further arguments for modelling functions. Often used to catch unused arguments. |
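Because scaling is performed individually in every crossvalidation iteration, centring and scaling parameters should be estimated on the training part of each split only. The sketch below shows this for the scale.p choices listed above. The helper scale_split() is hypothetical and is not BioMark's scalefun; in particular it assumes "log" and "sqrt" are plain elementwise transforms, whereas the package may combine them with centring.

```r
## Hypothetical helper (not BioMark's scalefun): scale a training and a
## test block, estimating centring/scaling parameters on the training rows only.
scale_split <- function(Xtrain, Xtest,
                        type = c("auto", "pareto", "none", "log", "sqrt")) {
  type <- match.arg(type)
  switch(type,
    none = list(train = Xtrain, test = Xtest),
    log  = list(train = log(Xtrain), test = log(Xtest)),    # needs positive data
    sqrt = list(train = sqrt(Xtrain), test = sqrt(Xtest)), # needs non-negative data
    {
      ## "auto" and "pareto": centre, then divide by SD or sqrt(SD)
      mu <- colMeans(Xtrain)
      s <- apply(Xtrain, 2, sd)
      if (type == "pareto") s <- sqrt(s)
      list(train = sweep(sweep(Xtrain, 2, mu), 2, s, "/"),
           test  = sweep(sweep(Xtest, 2, mu), 2, s, "/"))
    })
}

set.seed(3)
X <- matrix(rexp(60), nrow = 12)   # 12 samples, 5 positive variables
train <- 1:9
auto <- scale_split(X[train, ], X[-train, ], "auto")
pareto <- scale_split(X[train, ], X[-train, ], "pareto")
round(colMeans(auto$train), 12)    # training columns are centred
```

The test rows reuse the training means and standard deviations, so they are generally not exactly centred or unit-variance; this mirrors what happens to left-out samples in a crossvalidation split.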
Value
Function get.biom returns an object of class "BMark", a list containing an element for every fmethod that is selected, as well as an element info. The individual elements contain information depending on the type chosen: for type == "coef", the only element returned is a matrix containing coefficient sizes. For type == "HC" and type == "stab", a list is returned containing the element biom.indices, and either pvals (for type == "HC") or fraction.selected (for type == "stab").
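For type == "coef", selection comes down to ranking variables by the size of a per-variable quantity. As a stand-in for the model coefficients BioMark computes, the sketch below ranks variables by the absolute value of a two-sample t statistic (the "large t statistics" criterion from the Description). The function rank_by_size() and its argument k are hypothetical.

```r
## Rank variables by the absolute size of a per-variable statistic;
## here a Welch two-sample t statistic stands in for model coefficients.
rank_by_size <- function(X, cls, k = 10) {
  tstat <- apply(X, 2, function(x) t.test(x ~ cls)$statistic)
  order(abs(tstat), decreasing = TRUE)[seq_len(k)]  # indices of the k largest |t|
}

set.seed(5)
cls <- factor(rep(1:2, each = 10))
X <- matrix(rnorm(20 * 50), nrow = 20)
X[cls == "2", 1:5] <- X[cls == "2", 1:5] + 5   # 5 strongly shifted variables
top <- rank_by_size(X, cls, k = 5)
sort(top)
```

Choosing k is the weak point of this classical approach; the HC and stability criteria are alternatives to fixing it by hand.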
Element biom.indices contains the indices of the selected variables, and can be extracted using function selection. Element pvals contains the p values used to perform HC thresholding; these are presented in the original order of the variables, and can be obtained directly from, e.g., t statistics, or from permutation sampling. Element fraction.selected indicates in what fraction of the stability selection iterations a particular variable has been selected: the more often it has been selected, the more stable it is as a biomarker. The coef method for class "BMark" extracts model coefficients, p values or stability fractions for types "coef", "HC" and "stab", respectively.
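The quantity stored in fraction.selected can be sketched as follows, assuming that each subsample keeps a fixed fraction of every class and that a variable counts as selected when it is among the k largest |t| values. The function stab_fraction() and its arguments k, nsub and frac are hypothetical; BioMark's stability selection, whose settings are passed via biom.opt, may draw subsets and select variables differently.

```r
## Stability-selection sketch: repeatedly subsample the samples, pick the
## k variables with the largest |t|, and record how often each is picked.
stab_fraction <- function(X, cls, k = 10, nsub = 50, frac = 0.8) {
  counts <- numeric(ncol(X))
  for (b in seq_len(nsub)) {
    ## stratified subsample: keep frac of the samples of each class
    idx <- unlist(lapply(split(seq_along(cls), cls), function(i)
      i[sample.int(length(i), floor(frac * length(i)))]))
    sub.cls <- cls[idx]
    tstat <- apply(X[idx, , drop = FALSE], 2,
                   function(x) t.test(x ~ sub.cls)$statistic)
    top <- order(abs(tstat), decreasing = TRUE)[seq_len(k)]
    counts[top] <- counts[top] + 1
  }
  counts / nsub  # fraction of subsamples in which each variable was selected
}

set.seed(7)
cls <- factor(rep(1:2, each = 10))
X <- matrix(rnorm(20 * 40), nrow = 20)
X[cls == "2", 1:5] <- X[cls == "2", 1:5] + 5
fr <- stab_fraction(X, cls, k = 5, nsub = 20)
round(fr[1:8], 2)
```

Variables with fractions close to 1 are picked in almost every subsample, which is what makes them stable biomarker candidates.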
Author(s)
Ron Wehrens
See Also
biom.options, get.segments, selection, scalefun
Examples
## Real apple data (small set)
library(BioMark)
data(spikedApples)
apple.coef <- get.biom(X = spikedApples$dataMatrix,
Y = factor(rep(1:2, each = 10)),
ncomp = 2:3, type = "coef")
coef.sizes <- coef(apple.coef)
sapply(coef.sizes, range)
## stability-based selection
set.seed(17)
apple.stab <- get.biom(X = spikedApples$dataMatrix,
Y = factor(rep(1:2, each = 10)),
ncomp = 2:3, type = "stab")
selected.variables <- selection(apple.stab)
unlist(sapply(selected.variables, function(x) sapply(x, length)))
## Numbers of selected variables range from more than 70 for pcr,
## approx. 40 for pls and student t, to 0-29 for the lasso
unlist(sapply(selected.variables,
function(x) lapply(x, function(xx, y) sum(xx %in% y),
spikedApples$biom)))
## TPs (stab): all find 5/5, except pcr.2 and the lasso with values for lambda
## larger than 0.0484
unlist(sapply(selected.variables,
function(x) lapply(x, function(xx, y) sum(!(xx %in% y)),
spikedApples$biom)))
## FPs (stab): PCR finds the most FPs (approx. 60), the other latent-variable
## methods approx. 40; the lasso allows for the optimal selection around
## lambda = 0.0702
## regression example
data(gasoline, package = "pls") ## from the pls package
gasoline.stab <- get.biom(gasoline$NIR, gasoline$octane,
fmethod = c("pcr", "pls", "lasso"), type = "stab")
## Not run:
## Same for HC-based selection
## Warning: takes a long time!
apple.HC <- get.biom(X = spikedApples$dataMatrix,
Y = factor(rep(1:2, each = 10)),
ncomp = 2:3, type = "HC")
sapply(apple.HC[names(apple.HC) != "info"],
function(x, y) sum(x$biom.indices %in% y),
spikedApples$biom)
sapply(apple.HC[names(apple.HC) != "info"],
function(x, y) sum(!(x$biom.indices %in% y)),
spikedApples$biom)
## End(Not run)