EstimateLPCFDR {lpc}R Documentation

Estimate LPC False Discovery Rates.

Description

An estimated false discovery rate for each gene is computed, based on the LPC scores. Estimated false discovery rates based on T scores (Cox, regression coefficient, or two-sample t-statistic) are also given.

Usage

EstimateLPCFDR(x, y, type, nreps=100,soft.thresh=NULL, censoring.status=NULL)

Arguments

x

The matrix of gene expression values; pxn where n is the number of observations and p is the number of genes.

y

A vector of length n, with an outcome for each observation. For two-class outcome, y's elements are 1 or 2. For quantitative outcome, y's elements are real-valued. For survival data, y indicates the survival time. For multiclass, y is coded as 1,2,3,...

type

One of "regression" (for a quantitative outcome), "two class", "multiclass", or "survival".

nreps

The number of training/test set splits used to estimate LPC's false discovery rates. Default is 100.

soft.thresh

The value of the soft threshold to be used in the L1 regression of the scores onto the eigenarrays. This can be the value of the soft-threshold chosen adaptively by LPC function when it is run on the data; if not entered by the user, then that adaptive value is computed.

censoring.status

For survival outcome only, a vector of length n which takes on values 0 or 1 depending on whether the observation is complete or censored.

Details

Details of false discovery rate estimation for LPC can be found in the paper: http://www-stat.stanford.edu/~dwitten \

As explained in the paper, FDR of LPC is estimated by computing the FDR of simpler scores (Cox for survival outcome, standardized regression coefficients for regression outcome, and two-sample t-statistic for two-class outcome) and then estimating the difference between the FDR of LPC and the FDR of these simpler scores.

Value

fdrlpc

A vector of length p (equal to the number of genes), with the LPC false discovery rate for each gene.

fdrt

A vector of length p (equal to the number of genes), with the T false discovery rate for each gene.

fdrdiff

The FDR of T minus the FDR of LPC. This is approximately equal to fdrt-fdrlpc, with the caveat that the FDR of LPC (computed via this difference) must be between 0 and 1.

pi0

The fraction of genes that are believed to be null.

soft.thresh

The value of the soft threshold used in the L1 constraint for LPC.

call

The function call made.

Author(s)

Daniela M. Witten and Robert Tibshirani

References

Witten, D.M. and Tibshirani, R. (2008) Testing significance of features by lassoed principal components. Annals of Applied Statistics. http://www-stat.stanford.edu/~dwitten

Examples

### Uncomment to run....

#set.seed(2)
#n <- 40 # 40 samples
#p <- 1000 # 1000 genes
#x <- matrix(rnorm(n*p), nrow=p) # make 40x1000 gene expression matrix
#y <-  rnorm(n) # quantitative outcome
## make first 50 genes differentially-expressed
#x[1:25,y<(-.5)] <- x[1:25,y<(-.5)]+ 1.5
#x[26:50,y<0] <- x[26:50,y<0] - 1.5
## compute LPC and T scores for each gene
#lpc.obj <- LPC(x,y, type="regression")
## Look at plot of Predictive Advantage
#pred.adv <-
#PredictiveAdvantage(x,y,type="regression",soft.thresh=lpc.obj$soft.thresh)
## Estimate FDRs for LPC and T scores
#fdr.lpc.out <-
#EstimateLPCFDR(x,y,type="regression",soft.thresh=lpc.obj$soft.thresh,nreps=50)
## Estimate FDRs for T scores only. This is quicker than computing FDRs
##    for LPC scores, and should be used when only T FDRs are needed. If we
##    started with the same random seed, then EstimateTFDR and EstimateLPCFDR
##    would give same T FDRs.
#fdr.t.out <- EstimateTFDR(x,y, type="regression")
## print out results of main function
#lpc.obj
## print out info about T FDRs
#fdr.t.out
## print out info about LPC FDRs
#fdr.lpc.out
## Compare FDRs for T and LPC on 6% of genes. In this example, LPC has
##    lower FDR.
#PlotFDRs(fdr.lpc.out,frac=.06)
## Print out names of 20 genes with highest LPC scores, along with their
##    LPC and T scores.
#PrintGeneList(lpc.obj,numGenes=20)
## Print out names of 20 genes with highest LPC scores, along with their
##    LPC and T scores and their FDRs for LPC and T.
#PrintGeneList(lpc.obj,numGenes=20,lpcfdr.out=fdr.lpc.out)

# Now, repeating everything that we did before, but using a
#   **survival** outcome
# NOT RUNNING DUE TO TIME CONSTRAINTS -- UNCOMMENT TO RUN


#set.seed(2)
#n <- 40 # 40 samples
#p <- 1000 # 1000 genes
#x <- matrix(rnorm(n*p), nrow=p) # make 40x1000 gene expression matrix
#y <-  rnorm(n) + 10 # survival times; must be positive
## censoring outcome: 0 or 1
#cens <- rep(1,40) # Assume all observations are complete
## make first 50 genes differentially-expressed
#x[1:25,y<9.5] <- x[1:25,y<9.5]+ 1.5
#x[26:50,y<10] <- x[26:50,y<10] - 1.5
#lpc.obj <- LPC(x,y, type="survival", censoring.status=cens)
## Look at plot of Predictive Advantage
#pred.adv <- PredictiveAdvantage(x,y,type="survival",soft.thresh=lpc.obj$soft.thresh,
#censoring.status=cens)
## Estimate FDRs for LPC scores and T scores
#fdr.lpc.out <- EstimateLPCFDR(x,y,type="survival",
#soft.thresh=lpc.obj$soft.thresh,nreps=20,censoring.status=cens)
## Estimate FDRs for T scores only. This is quicker than computing FDRs
##    for LPC scores, and should be used when only T FDRs are needed. If we
##    started with the same random seed, then EstimateTFDR and EstimateLPCFDR
##    would give same T FDRs.
#fdr.t.out <- EstimateTFDR(x,y, type="survival", censoring.status=cens)
## print out results of main function
#lpc.obj
## print out info about T FDRs
#fdr.t.out
## print out info about LPC FDRs
#fdr.lpc.out
## Compare FDRs for T and LPC scores on 10% of genes.
#PlotFDRs(fdr.lpc.out,frac=.1)
## Print out names of 20 genes with highest LPC scores, along with their
##    LPC and T scores.
#PrintGeneList(lpc.obj,numGenes=20)
## Print out names of 20 genes with highest LPC scores, along with their
##    LPC and T scores and their FDRs for LPC and T.
#PrintGeneList(lpc.obj,numGenes=20,lpcfdr.out=fdr.lpc.out)


[Package lpc version 1.0.2.1 Index]