R: Gene set analysis via dynamic slicing

ds_gsa {dslice}

R Documentation

Gene set analysis via dynamic slicing

Description

Gene set analysis via dynamic slicing.

Usage

  ds_gsa(expdat, geneset, label, generank, ..., lambda = 1, bycol = FALSE,
         minsize = 15, maxsize = 500, randseed = 11235, rounds = 1000)

Arguments

`expdat`	Either a character string giving the name of a gene expression file (.gct file), or an expression matrix with row names in which each row is a gene and each column is a sample.
`geneset`	Either a character string giving the name of a gene set file (.gmt file), or a list containing a vector of gene set names, a vector of gene set descriptions, and a list of the gene symbols in each gene set.
`label`	Either a character string giving the name of a phenotype file (.cls file), or a list containing a vector of phenotype types and a vector of the encoded phenotypes of the samples. It must match the samples (columns) of the gene expression matrix.
`generank`	Either an integer vector giving the rank of each gene according to some statistic, or a character string naming a function that takes the gene expression matrix as input and returns a vector of gene ranks (without ties).
`...`	Additional arguments passed to the function named (as a character string) by `generank`.
`lambda`	Penalty for introducing an additional slice in the dynamic slicing procedure, used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. `lambda` must be greater than 0.
`bycol`	Type of permutation. If `FALSE` (default), permutation is by row, which shuffles the gene ranks. If `TRUE`, permutation is by column, which shuffles the phenotypes and then recomputes the gene ranks.
`minsize`	Minimum number of genes a gene set must contain to be considered.
`maxsize`	Maximum number of genes a gene set may contain to be considered.
`randseed`	Optional initial seed for the random number generator (integer).
`rounds`	Number of permutations used to estimate the significance level of the results.
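
The two permutation schemes selected by `bycol` can be illustrated in base R. This is only a sketch of the idea, not the package's internal code: the toy matrix, the helper `toy_rank`, and all variable names are hypothetical, and the phenotype labels are assumed to be 0/1 codes.

```r
# Illustration only: hypothetical toy data, not the internals of ds_gsa.
set.seed(11235)
x <- matrix(rnorm(6 * 8), nrow = 6,
            dimnames = list(paste0("g", 1:6), paste0("s", 1:8)))
lab <- rep(c(0, 1), each = 4)   # assumed 0/1 phenotype codes

# Hypothetical ranking helper: orders genes by difference of group means.
toy_rank <- function(x, label) {
  d <- rowMeans(x[, label == 1, drop = FALSE]) -
       rowMeans(x[, label == 0, drop = FALSE])
  order(d)
}

obs <- toy_rank(x, lab)

# Permutation by row: shuffle the observed gene rank directly.
perm_row <- sample(obs)

# Permutation by column: shuffle the phenotypes, then recompute the rank.
perm_col <- toy_rank(x, sample(lab))
```

Permuting by column re-runs the ranking function once per round, so it is typically slower than permuting by row, but it keeps the correlation between genes intact.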

Details

ds_gsa performs gene set analysis via dynamic slicing. It returns the DS statistic and slicing strategy of each gene set. ds_gsa does not implement a ranking method itself: it requires either a ranking function or a precomputed gene rank as an argument, so users can apply whichever ranking method they prefer.
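
As a sketch of what a user-supplied ranking function might look like, the following defines a signal-to-noise ranking. The name `s2n` and the toy data are hypothetical. The function assumes the same `(x, label)` signature and 0/1 label coding as the `fc` function in the Examples section and, like `fc`, returns `order(d)`.

```r
# Hypothetical ranking function: signal-to-noise ratio between two groups.
# Assumes `label` is a vector of 0/1 codes, one per column of `x`.
s2n <- function(x, label) {
  g0 <- x[, label == 0, drop = FALSE]
  g1 <- x[, label == 1, drop = FALSE]
  d <- (rowMeans(g1) - rowMeans(g0)) /
       (apply(g0, 1, sd) + apply(g1, 1, sd))
  order(d)   # a permutation of the gene indices, hence free of ties
}

# Toy data for a quick check (made up for illustration).
set.seed(1)
toy <- matrix(rnorm(5 * 6), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
toy_label <- c(0, 0, 0, 1, 1, 1)
ord <- s2n(toy, toy_label)
```

Such a function could then be passed by name, for example `generank = "s2n"`, in the same way that `"fc"` is passed in the Examples section.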

Value

A list with information on the gene sets whose sizes satisfy the minimum and maximum size thresholds. It contains the following components:

`set_name`	A vector of gene set names.
`set_size`	A vector of gene set sizes.
`DS_value`	A vector of the dynamic slicing statistic of each gene set.
`pvalue`	A vector of the p-value of each gene set.
`FDR`	A vector of the FDR of each gene set.
`slices`	A list of the slicing strategy of each gene set. Each component is a matrix of slices.
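
The vector components of the returned list can be collected into a data frame for sorting and filtering. The sketch below uses a mock object with the documented component names. Its values are made up for illustration and are not output from `ds_gsa`, and the 0.05 FDR cutoff is an arbitrary choice.

```r
# Mock result with the documented component names (values are invented).
res <- list(
  set_name = c("SET_A", "SET_B", "SET_C"),
  set_size = c(20L, 150L, 48L),
  DS_value = c(3.1, 0, 1.4),
  pvalue   = c(0.002, 0.61, 0.04),
  FDR      = c(0.006, 0.61, 0.06),
  slices   = list(matrix(0, 1, 1), matrix(0, 1, 1), matrix(0, 1, 1))
)

# Combine the per-set vectors into one table; `slices` stays in the list.
tab <- data.frame(res[c("set_name", "set_size", "DS_value", "pvalue", "FDR")],
                  stringsAsFactors = FALSE)

# Sort by p-value and keep the sets below the chosen FDR cutoff.
tab  <- tab[order(tab$pvalue), ]
hits <- tab[tab$FDR < 0.05, ]
```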

References

Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.

Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43): 15545-15550, 2005.

Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1): 289-300, 1995.

Examples

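## Loading data from files
## Not run: 
# Paths to the expression (.gct), phenotype (.cls) and gene set (.gmt) files
gctpath <- "P53.gct"
clspath <- "P53.cls"
gmtpath <- "C2.gmt"
expdat <- load_gct(gctpath)
label <- load_cls(clspath)
geneset <- load_gmt(gmtpath)

# User-defined ranking function: orders genes by the fold change of the
# group means (samples coded 1 versus samples coded 0)
fc <- function(x, label)
{
  d0 <- apply(x[,which(label == 0)], 1, mean)
  d1 <- apply(x[,which(label == 1)], 1, mean)
  d <- d1 / d0
  return(order(d))
}

# Pass the ranking function by name; permute by column (phenotypes)
ds_gsa_obj <- ds_gsa(expdat, geneset, label, "fc", lambda = 1.2, bycol = TRUE,
                     minsize = 15, maxsize = 500, randseed = 11235, rounds = 100)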

## End(Not run)

[Package dslice version 1.2.2 Index]