LPS.coeff {LPS}    R Documentation

Linear Predictor Score coefficient computation

Description

Linear Predictor Score coefficients are ordinary t statistics, so this function computes them directly, which is faster on large datasets than using t.test.
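To illustrate why a dedicated implementation can be faster, here is a minimal base R sketch that computes unequal-variance (Welch) t statistics for all columns at once and compares them with a per-column t.test loop. welch_t is a hypothetical helper written for this illustration, not a function of the LPS package, and the sign convention (first factor level minus second) is an assumption borrowed from t.test.

```r
# Hypothetical helper (NOT part of the LPS package): Welch t statistics for
# every column of a numeric matrix, without calling t.test() per column.
welch_t <- function(x, g) {
  g <- factor(g)
  stopifnot(is.matrix(x), nlevels(g) == 2L, length(g) == nrow(x), !anyNA(g))
  a <- x[g == levels(g)[1], , drop = FALSE]
  b <- x[g == levels(g)[2], , drop = FALSE]
  # Per-column sample sizes and means, ignoring NA values
  na <- colSums(!is.na(a))
  nb <- colSums(!is.na(b))
  ma <- colMeans(a, na.rm = TRUE)
  mb <- colMeans(b, na.rm = TRUE)
  # Per-column unbiased variances
  va <- colSums(sweep(a, 2, ma)^2, na.rm = TRUE) / (na - 1)
  vb <- colSums(sweep(b, 2, mb)^2, na.rm = TRUE) / (nb - 1)
  # Assumed sign convention: first level minus second level, as in t.test()
  (ma - mb) / sqrt(va / na + vb / nb)
}

# Simulated data: 20 samples in rows, 10 features in columns
set.seed(42)
x <- matrix(rnorm(200), nrow = 20, dimnames = list(NULL, paste0("gene", 1:10)))
g <- factor(rep(c("A", "B"), each = 10))

fast <- welch_t(x, g)
slow <- apply(x, 2, function(col) t.test(col ~ g)$statistic)
all.equal(unname(fast), unname(slow))
```

The vectorized version replaces one t.test call per feature with a handful of column-wise operations, which is where the time is saved on matrices with thousands of genes.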

Usage

  LPS.coeff(data, response, formula = ~1, type = c("t", "limma"),
    p.value = TRUE, log = FALSE, weighted = FALSE, ...)

Arguments

data

Continuous data used to retrieve classes, as a data.frame or matrix, with samples in rows and features (genes) in columns. Rows and columns should be named. NA values are silently ignored. Some precautions must be taken concerning data normalization; see the corresponding section in the LPS manual page.

response

Already known classes for the samples provided in data, preferably as a two-level factor. Can be missing if a formula with a response element is provided, but this argument takes precedence when both are given.

formula

A formula object, describing the features to consider in data. The formula response element (before the "~" sign) can replace the response argument if it is not provided. The features can be enumerated in the variable section of the formula (after the "~" sign). "." is also handled in the usual way (all data columns), and "1" is a more efficient way to refer to all numeric columns of data.

type

Single character value, "t" to compute genuine t statistics (unequal variances and unpaired samples) or "limma" to use the lmFit() and eBayes() t statistics from the microarray-oriented limma Bioconductor package.

p.value

Single logical value, whether to compute (two-sided) p-values or not.
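As a sketch of what a two-sided p-value means for a t statistic, the snippet below derives it from the Student t distribution and checks it against t.test. two_sided_p is a hypothetical helper for illustration only. Taking the degrees of freedom from the Welch approximation reported by t.test is an assumption consistent with the "unequal variances" wording above, not a confirmed detail of LPS.coeff.

```r
# Hypothetical helper (NOT part of the LPS package): two-sided p-value of a
# t statistic with 'df' degrees of freedom.
two_sided_p <- function(t, df) 2 * pt(-abs(t), df)

# Simulated two-group comparison
set.seed(1)
a <- rnorm(12)
b <- rnorm(15, mean = 1)
tt <- t.test(a, b)  # Welch test by default (var.equal = FALSE)

# Assumption: degrees of freedom from the Welch approximation
p <- two_sided_p(unname(tt$statistic), unname(tt$parameter))
all.equal(p, tt$p.value)
```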

log

Single logical value, whether to log-transform t or not (the sign is preserved). The original description of the LPS does not include log-transformation, but it may help to avoid over-weighting discriminant genes in large series. Values between -1 and 1 are set to 0 to avoid sign flips; such values generally come with non-significant p-values.
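A sign-preserving log-transformation with the behavior described above can be sketched as follows. signed_log is a hypothetical helper, and the use of the natural logarithm is an assumption, since the base used by LPS.coeff is not stated here.

```r
# Hypothetical helper (NOT part of the LPS package): log-transform the
# magnitude of t while keeping its sign. Magnitudes below 1 are clamped to 1
# first, so that any t between -1 and 1 maps to 0 instead of changing sign.
# Assumption: natural logarithm.
signed_log <- function(t) sign(t) * log(pmax(abs(t), 1))

t_values <- c(-12, -1.5, -0.4, 0, 0.9, 3, 40)
round(signed_log(t_values), 3)
```

Without the clamping step, a t of 0.5 would become log(0.5) < 0, turning a weakly positive coefficient into a negative one.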

weighted

Single logical value, whether to divide t (or log-transformed t) by the gene mean or not. We recommend normalizing data by sample only and using weighted = TRUE to include gene centering in the model, rather than centering and scaling genes by normalizing each series independently as Wright et al. did.
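The weighting described above can be pictured with the following sketch. weight_by_mean is a hypothetical helper, and computing the gene mean over all samples of the data matrix is an assumption made for illustration; the exact mean used by LPS.coeff is not detailed here.

```r
# Hypothetical helper (NOT part of the LPS package): divide each feature's
# t statistic by that feature's mean expression.
# Assumption: the mean is taken over all samples (rows) of 'x'.
weight_by_mean <- function(t, x) t / colMeans(x, na.rm = TRUE)

# Two samples (rows), two genes (columns)
x <- matrix(c(1, 3, 2, 6), nrow = 2, dimnames = list(NULL, c("g1", "g2")))
t_stat <- c(g1 = 4, g2 = 8)
weight_by_mean(t_stat, x)
```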

...

Further arguments passed to model.frame when response is missing (and thus defined via formula). subset and na.action may be particularly useful for cross-validation schemes, see model.frame.default for details. subset is always handled, but it is masked in "..." for compatibility reasons.

Value

Always returns a row-named numeric matrix, with a "t" column holding the computed statistics. If p.value is TRUE, a second "p.value" column is added.

Note

Using a numeric matrix as data and a factor as response is the fastest way to compute coefficients when computation time matters (as in cross-validation schemes). formula was added only for consistency with other R modeling functions, and optionally to select the subset of features to compute coefficients for.
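A cross-validation scheme built on the matrix and factor interface might look like the sketch below. cv_coeffs and its arguments are hypothetical names introduced for this illustration; only the data, response and subset arguments of LPS.coeff come from this page. The coefficient function is a parameter so that the fold logic can be tried without the LPS package installed.

```r
# Hypothetical helper (NOT part of the LPS package): compute coefficients on
# the training part of each fold, using a logical 'subset' vector.
cv_coeffs <- function(expr, group, n_folds = 5, coeff_fun = LPS::LPS.coeff) {
  stopifnot(is.matrix(expr), length(group) == nrow(expr))
  # Randomly assign each sample (row) to one of 'n_folds' folds
  fold <- sample(rep(seq_len(n_folds), length.out = nrow(expr)))
  lapply(seq_len(n_folds), function(i) {
    # Train on every sample that is not in fold i
    coeff_fun(data = expr, response = group, subset = fold != i)
  })
}

# Usage with the package data set, run only if LPS is installed
if (requireNamespace("LPS", quietly = TRUE)) {
  data(rosenwald, package = "LPS")
  set.seed(123)
  k_list <- cv_coeffs(t(rosenwald.expr), rosenwald.cli$group)
  length(k_list)
}
```

Each element of the returned list holds the coefficients for one training set; a complete scheme would then score the held-out samples of the matching fold.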

Author(s)

Sylvain Mareschal

References

http://www.bioconductor.org/packages/release/bioc/html/limma.html

See Also

LPS

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  
  # All features, all samples
  k <- LPS.coeff(data=expr, response=group)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr))
  ### LPS.coeff(formula=group~., data=as.data.frame(expr), na.action=na.pass)
  ### The last is correct but (really) slow on large datasets
  
  # Feature subset, all samples
  k <- LPS.coeff(data=expr[, c("27481","17013") ], response=group)
  k <- LPS.coeff(formula=group~`27481`+`17013`, data=as.data.frame(expr))
  ### Notice backticks in formula for syntactically invalid names
  
  # All features, sample subset
  training <- rosenwald.cli$set == "Training"
  ### training <- sample.int(nrow(expr), 10)
  ### training <- which(rosenwald.cli$set == "Training")
  ### training <- rownames(subset(rosenwald.cli, set == "Training"))
  k <- LPS.coeff(data=expr, response=group, subset=training)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), subset=training)

  # NA handling by model.frame()
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), na.action=na.omit)

[Package LPS version 1.0.16 Index]