LPS {LPS} | R Documentation |
Linear Predictor Score fitting
Description
This function trains a Linear Predictor Score model, given pre-computed coefficients. It uses data with known classes to fit the model.
It can be called in several ways, and not all arguments are mandatory. See the 'Examples' section.
Usage
LPS(data, coeff, response, k, threshold, formula, method = "fdr", ...)
Arguments
data |
Continuous data used to retrieve classes, as a |
coeff |
Pre-computed coefficients for the model, as returned by |
response |
Already known classes for the samples provided in |
k |
Single |
threshold |
Single |
formula |
A |
method |
Single character value, to be passed to |
... |
Further arguments are passed to |
Value
An object of (S3) class "LPS":
coeff |
Named numeric vector, the coefficients used in the model. |
classes |
Character vector, the labels of the two groups to be predicted. |
scores |
List of two numeric vectors, training dataset scores sorted by group. |
means |
Numeric vector, score means of each group in the training dataset. |
sds |
Numeric vector, score |
ovl |
Numeric value, overlapping coefficient as returned by |
k |
Integer value, number of features selected in the model (if relevant). |
p.threshold |
Numeric value, threshold used for feature selection (if relevant). |
p.method |
Character value, p-value correction used for feature selection (if relevant). |
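The 'scores', 'means' and 'sds' elements can be illustrated with a small base R sketch of a linear predictor score in the spirit of Wright et al. (2003): each sample's score is a weighted sum of its feature values, and group membership is estimated from two normal densities fitted to the training scores. This is only an illustration under stated assumptions; the toy matrix, the labels and the coefficient vector 'w' are made up, and the package internals may differ.

```r
# Toy data: 6 samples x 3 features (all values are made up for illustration)
x <- matrix(
  c(2, 1, 0,
    3, 1, 1,
    2, 2, 0,
    0, 1, 3,
    1, 0, 2,
    0, 2, 3),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("s", 1:6), c("g1", "g2", "g3"))
)
group <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical coefficients, one per feature (not produced by LPS.coeff)
w <- c(g1 = 1.5, g2 = 0, g3 = -2)

# Linear score: weighted sum of feature values for each sample
score <- drop(x[, names(w)] %*% w)

# Per-group mean and standard deviation of the training scores
mu <- tapply(score, group, mean)
sigma <- tapply(score, group, sd)

# Probability of belonging to group "A", from the two normal densities
dA <- dnorm(score, mean = mu["A"], sd = sigma["A"])
dB <- dnorm(score, mean = mu["B"], sd = sigma["B"])
pA <- dA / (dA + dB)
```

Equal prior weights for the two groups are assumed here, which is why the two densities are compared directly.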
Normalization
As expression values are used directly in the score, gene centering and scaling are strongly recommended. For Affymetrix raw expression values (strictly positive, linear and absolute), Wright et al. suggest a multiplicative centering on a median of 1000 followed by a log2 transformation. For log-ratios, gene centering and scaling should not be necessary, as they are naturally centered on 0.
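The multiplicative centering followed by a log2 transformation can be sketched in base R as follows. This is a minimal illustration, not code from the LPS package: the helper name 'medianCenterLog2', the toy matrix and the choice to center each column are assumptions, since this page does not state along which dimension the median is taken.

```r
# Hypothetical helper (not part of the LPS package)
# x      : numeric matrix of strictly positive values
# target : median each column is rescaled to before taking log2
medianCenterLog2 <- function(x, target = 1000) {
  stopifnot(is.matrix(x), is.numeric(x), all(x > 0, na.rm = TRUE))
  # Current median of each column
  med <- apply(x, 2, median, na.rm = TRUE)
  # Multiplicative centering: rescale each column so its median is 'target'
  scaled <- sweep(x, 2, target / med, `*`)
  # Log2 transformation of the rescaled values
  log2(scaled)
}

# Made-up strictly positive values: 5 rows x 4 columns
set.seed(1)
raw <- matrix(rexp(20, rate = 1 / 500), nrow = 5)
norm <- medianCenterLog2(raw)
```

Transpose the input (or change the margin passed to 'apply' and 'sweep') if the centering should run along the other dimension.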
Time efficiency
Using a numeric matrix as 'data' and a factor as 'response' is the fastest way to compute coefficients when time consumption matters (as in cross-validation schemes). 'formula' is there only for consistency with R modeling functions, and to provide 'response' and either 'k' or 'threshold' in a single argument.
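When the inputs start out as a data.frame and a character vector, they can be converted once, before any repeated fitting, with base R. The object names and values below are made up for illustration.

```r
# Made-up inputs: features in columns, one row per sample
df <- data.frame(
  g1 = c(1.2, 0.4, 2.1),
  g2 = c(0.3, 1.8, 0.9),
  row.names = c("s1", "s2", "s3")
)
labels <- c("A", "B", "A")

# Numeric matrix (keeps row and column names) and factor response
mat <- as.matrix(df)
resp <- factor(labels)

# Sanity checks before passing them on as 'data' and 'response'
stopifnot(is.matrix(mat), is.numeric(mat), is.factor(resp))
stopifnot(nrow(mat) == length(resp))
```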
Author(s)
Sylvain Mareschal
References
Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol. 2002;9(3):505-11.
Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.
Bohers E, Mareschal S, Bouzelfen A, Marchand V, Ruminy P, Maingonnat C, Menard AL, Etancelin P, Bertrand P, Dubois S, Alcantara M, Bastard C, Tilly H, Jardin F. Targetable activating mutations are very frequent in GCB and ABC diffuse large B-cell lymphoma. Genes Chromosomes Cancer. 2014 Feb;53(2):144-53.
See Also
Examples
# Data with features in columns
data(rosenwald)
group <- rosenwald.cli$group
expr <- t(rosenwald.expr)
# NA imputation (feature's mean to minimize impact)
f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
expr <- apply(expr, 2, f)
# Coefficients
coeff <- LPS.coeff(data=expr, response=group)
# 10 best features (straightforward)
m <- LPS(data=expr, coeff=coeff, response=group, k=10)
# 10 best features (formula)
### 'k' MUST be an integer, or it will be interpreted as a 'threshold'
### Bare numbers are "numeric" in R; enforce integer with the "L" suffix or as.integer()
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~10L)
k <- as.integer(10)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~k)
# FDR threshold
thr <- 0.01
m <- LPS(data=expr, coeff=coeff, response=group, threshold=thr)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~0.01)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~thr)
# Custom model
m <- LPS(data=expr, coeff=coeff[ c("27481","17013") ,], response=group, k=2)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~`27481`+`17013`)
### Notice backticks in formula for syntactically invalid names
# Complete model
m <- LPS(data=expr, coeff=coeff, response=group, k=ncol(expr))
m <- LPS(data=expr, coeff=coeff, response=group, threshold=1)
### m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~.)
### The last call is correct but (really) slow on large datasets
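The integer versus numeric distinction that decides between 'k' and 'threshold' in the formula interface can be checked in base R before calling LPS. The variable names below are only illustrative.

```r
# A bare number is stored as a double, even when it looks like a whole number
k.numeric <- 10
# The "L" suffix and as.integer() both yield a true integer
k.suffix <- 10L
k.cast <- as.integer(10)

is.integer(k.numeric)   # FALSE
is.integer(k.suffix)    # TRUE
is.integer(k.cast)      # TRUE

# A fractional value such as an FDR threshold is always "numeric"
thr <- 0.01
class(thr)              # "numeric"
```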