R: Linear Predictor Score coefficient computation

LPS.coeff {LPS}

R Documentation

Description

Linear Predictor Score coefficients are plain t statistics, so this function provides an implementation that is faster on large datasets than repeated calls to t.test.
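
To illustrate why a dedicated implementation can be faster, here is a minimal base R sketch of unequal-variance (Welch) t statistics computed for all columns at once. `welch_t` is a hypothetical helper written for this example, not the package's actual code, and the direction of the difference (first factor level minus second) is an assumption.

```r
# Vectorized Welch t statistics, one per column of a numeric matrix.
# welch_t is a hypothetical helper, not part of the LPS package.
welch_t <- function(x, group) {
  group <- factor(group)
  stopifnot(is.matrix(x), nlevels(group) == 2L, length(group) == nrow(x))
  a <- x[group == levels(group)[1], , drop = FALSE]
  b <- x[group == levels(group)[2], , drop = FALSE]

  # Per-column sample sizes, ignoring NA values
  n_a <- colSums(!is.na(a))
  n_b <- colSums(!is.na(b))

  # Per-column means and unbiased variances
  m_a <- colMeans(a, na.rm = TRUE)
  m_b <- colMeans(b, na.rm = TRUE)
  v_a <- colSums(sweep(a, 2, m_a)^2, na.rm = TRUE) / (n_a - 1)
  v_b <- colSums(sweep(b, 2, m_b)^2, na.rm = TRUE) / (n_b - 1)

  # Welch statistic: mean difference divided by its standard error
  (m_a - m_b) / sqrt(v_a / n_a + v_b / n_b)
}

# Simulated data: 20 samples in rows, 5 features in columns
set.seed(1)
x <- matrix(rnorm(20 * 5), nrow = 20,
            dimnames = list(paste0("s", 1:20), paste0("g", 1:5)))
group <- factor(rep(c("A", "B"), each = 10))

fast <- welch_t(x, group)

# Check one feature against t.test(), which defaults to the Welch test
slow <- t.test(x[group == "A", "g1"], x[group == "B", "g1"])$statistic
all.equal(unname(fast["g1"]), unname(slow))
```

The column-wise functions replace one t.test call per feature with a handful of matrix operations, which is where the time is saved when there are thousands of features.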

Usage

  LPS.coeff(data, response, formula = ~1, type = c("t", "limma"),
    p.value = TRUE, log = FALSE, weighted = FALSE, ...)

Arguments

`data`	Continuous data used to retrieve classes, as a `data.frame` or `matrix`, with samples in rows and features (genes) in columns. Rows and columns should be named. `NA` values are silently ignored. Some precautions must be taken concerning data normalization; see the corresponding section of the `LPS` manual page.
`response`	Already known classes for the samples provided in `data`, preferably as a two-level `factor`. Can be missing if a `formula` with a response element is provided; if both are given, this argument takes precedence.
`formula`	A `formula` object, describing the features to consider in `data`. The formula response element (before the "~" sign) can replace the `response` argument if it is not provided. The features can be enumerated in the variable section of the formula (after the "~" sign). "." is also handled in the usual way (all `data` columns), and "1" is a more efficient way to refer to all numeric columns of `data`.
`type`	Single character value, "t" to compute genuine t statistics (unequal variances and unpaired samples) or "limma" to use the lmFit() and eBayes() t statistics from the microarray-oriented limma Bioconductor package.
`p.value`	Single logical value, whether to compute (two-sided) p-values or not.
`log`	Single logical value, whether to log-transform t or not (sign will be preserved). The original description of the LPS does not include log-transformation, but it may help avoid over-weighting discriminant genes in large series. Values between -1 and 1 are transformed to 0 to avoid sign shifting; such values generally come with non-significant p-values.
`weighted`	Single logical value, whether to divide t (or log-transformed t) by gene mean or not. We recommend normalizing data only by samples and using `weighted = TRUE` to include gene centering in the model, rather than centering and scaling genes by normalizing each series independently as Wright et al. did.
`...`	Further arguments are passed to `model.frame` if `response` is missing (thus defined via `formula`). `subset` and `na.action` may be particularly useful for cross-validation schemes, see `model.frame.default` for details. `subset` is always handled but masked in "..." for compatibility reasons.
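
The `log` and `weighted` options described above can be pictured with the following base R sketch. `shrink_t` and its argument names are hypothetical and written only for illustration; the natural logarithm and the exact definition of the gene mean are assumptions, since this page does not specify them.

```r
# Hypothetical post-processing of a vector of t statistics.
# Assumptions: natural logarithm; gene_mean is supplied by the caller.
shrink_t <- function(t, gene_mean = NULL, log_transform = FALSE, weighted = FALSE) {
  if (log_transform) {
    # Sign-preserving log: for |t| < 1 the log is negative and would flip
    # the sign, so those values are set to 0 instead
    t <- ifelse(abs(t) < 1, 0, sign(t) * log(abs(t)))
  }
  if (weighted) {
    stopifnot(!is.null(gene_mean), length(gene_mean) == length(t))
    t <- t / gene_mean
  }
  t
}

t_stats <- c(g1 = -5.2, g2 = 0.4, g3 = 12.0)

# Log-transformation compresses large statistics and zeroes small ones
logged <- shrink_t(t_stats, log_transform = TRUE)

# Weighting divides each statistic by the mean expression of its gene
weighted <- shrink_t(t_stats, gene_mean = c(8, 7, 10), weighted = TRUE)
```

In this sketch the strongly discriminant feature `g3` keeps its sign but contributes far less after log-transformation, which matches the stated motivation of not over-weighting discriminant genes.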

Value

Always returns a row-named numeric matrix, with a "t" column holding the computed statistics. If p.value is TRUE, a second "p.value" column is added.
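
As a point of comparison, a matrix with the same layout can be assembled the slow way, with one t.test call per feature. This is only a base R sketch on simulated data; the sign convention of the statistics (group "A" minus group "B") is an assumption and may differ from what LPS.coeff returns.

```r
# Slow reference: one t.test() call per column, collected in a
# row-named matrix with "t" and "p.value" columns.
set.seed(42)
x <- matrix(rnorm(12 * 3), nrow = 12,
            dimnames = list(paste0("s", 1:12), c("g1", "g2", "g3")))
group <- factor(rep(c("A", "B"), each = 6))

ref <- t(vapply(colnames(x), function(feature) {
  tt <- t.test(x[group == "A", feature], x[group == "B", feature])
  c(t = unname(tt$statistic), p.value = tt$p.value)
}, FUN.VALUE = c(t = 0, p.value = 0)))

# One row per feature, two-sided Welch p-values in the second column
ref
```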

Note

Using a numeric matrix as data and a factor as response is the fastest way to compute coefficients, if run time matters (as in cross-validation schemes). formula was added only for consistency with other R modeling functions, and possibly to select the subset of features to compute coefficients for.

Author(s)

Sylvain Mareschal

References

http://www.bioconductor.org/packages/release/bioc/html/limma.html

Examples

  # Data with features in columns
  data(rosenwald)
  group <- rosenwald.cli$group
  expr <- t(rosenwald.expr)
  
  
  # All features, all samples
  k <- LPS.coeff(data=expr, response=group)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr))
  ### LPS.coeff(formula=group~., data=as.data.frame(expr), na.action=na.pass)
  ### The last is correct but (really) slow on large datasets
  
  # Feature subset, all samples
  k <- LPS.coeff(data=expr[, c("27481","17013") ], response=group)
  k <- LPS.coeff(formula=group~`27481`+`17013`, data=as.data.frame(expr))
  ### Notice backticks in formula for syntactically invalid names
  
  # All features, sample subset
  training <- rosenwald.cli$set == "Training"
  ### training <- sample.int(nrow(expr), 10)
  ### training <- which(rosenwald.cli$set == "Training")
  ### training <- rownames(subset(rosenwald.cli, set == "Training"))
  k <- LPS.coeff(data=expr, response=group, subset=training)
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), subset=training)

  # NA handling by model.frame()
  k <- LPS.coeff(formula=group~1, data=as.data.frame(expr), na.action=na.omit)

[Package LPS version 1.0.16 Index]