ds_gsa {dslice}    R Documentation

Gene set analysis via dynamic slicing

Description

Gene set analysis via dynamic slicing.

Usage

  ds_gsa(expdat, geneset, label, generank, ..., lambda = 1, bycol = FALSE,
         minsize = 15, maxsize = 500, randseed = 11235, rounds = 1000)

Arguments

expdat

Either a character string giving the name of a gene expression file (.gct file), or an expression matrix with row names, in which each row is a gene and each column is a sample.

geneset

Either a character string giving the name of a gene set file (.gmt file), or a list containing a vector of gene set names, a vector of gene set descriptions and a list of the gene symbols in each gene set.

label

Either a character string giving the name of a phenotype file (.cls file), or a list containing a vector of phenotype types and a vector of the encoded phenotypes of the samples. It should match the gene expression matrix.

generank

Either an integer vector giving the rank of each gene according to some statistic, or a character string naming a function that takes the gene expression matrix as input and returns a vector of gene ranks (without ties).

...

Parameters of the function specified (as a character string) by generank.

lambda

Penalty for introducing an additional slice in the dynamic slicing procedure, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. lambda should be greater than 0.

bycol

Type of permutation: by row (bycol = FALSE, the default) or by column (bycol = TRUE). Permutation by row shuffles the gene rank. Permutation by column shuffles the phenotypes and then recomputes the gene rank.

minsize

Minimum number of genes a gene set must contain to be considered.

maxsize

Maximum number of genes a gene set may contain to be considered.

randseed

Optional initial seed for random number generator (integer).

rounds

Number of permutations used to estimate the significance level of the results.
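
As a rough illustration of how a seed and a number of permutation rounds interact, the sketch below estimates a generic permutation p-value in base R. It is not the dslice implementation: toy_stat is a hypothetical placeholder for the DS statistic, the data are made up, and the add-one estimator is only one common choice. The shuffle is analogous to permutation by row.

```r
# Illustrative only: a generic permutation p-value, not the dslice internals.
set.seed(11235)   # plays the role of `randseed`
rounds <- 1000    # plays the role of `rounds`

# Toy data: set membership (1 = in the gene set) for 20 genes listed in rank order.
in_set <- c(rep(1, 6), rep(0, 14))

# Hypothetical placeholder statistic: larger when set members sit near the top.
toy_stat <- function(z) -mean(which(z == 1))

observed <- toy_stat(in_set)

# Shuffle which ranked positions belong to the set, once per round.
null_stats <- replicate(rounds, toy_stat(sample(in_set)))

# Share of permuted statistics at least as large as the observed one,
# with an add-one correction so the estimate is never exactly zero.
p_value <- (sum(null_stats >= observed) + 1) / (rounds + 1)
```

Fixing the seed makes the estimate reproducible. Increasing rounds lowers the smallest attainable p-value, which is 1 / (rounds + 1) under this estimator.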

Details

ds_gsa performs gene set analysis via dynamic slicing and returns the DS statistic and slicing strategy of each gene set. It does not build in a ranking method: the caller supplies either the gene ranks directly or a ranking function through generank. Leaving the ranking method as an input lets users apply whichever ranking method they prefer.
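
A user-supplied ranking function might look like the following sketch, which ranks genes by a Welch t-statistic and breaks ties by position so that no two genes share a rank. The name tstat_rank is hypothetical and not part of dslice, and the label argument is assumed to be a 0/1 vector with one entry per sample. The sketch returns rank()-style output to match the wording "rank of each gene", whereas the function in the Examples section returns order(d); check which convention ds_gsa expects before relying on it.

```r
# Hypothetical ranking function (not part of dslice).
# x:     expression matrix, genes in rows and samples in columns
# label: assumed here to be a 0/1 vector with one entry per sample
tstat_rank <- function(x, label) {
  # Welch two-sample t-statistic per gene (class 1 versus class 0)
  t_stat <- apply(x, 1, function(g) {
    t.test(g[label == 1], g[label == 0])$statistic
  })
  # ties.method = "first" yields integer ranks with no ties
  rank(t_stat, ties.method = "first")
}

# Toy data: 5 genes, 8 samples, 4 samples per class
set.seed(1)
x <- matrix(rnorm(5 * 8), nrow = 5,
            dimnames = list(paste0("gene", 1:5), NULL))
label <- rep(c(0, 1), each = 4)
r <- tstat_rank(x, label)
```

Any extra arguments such a function needs would be passed through the ... argument of ds_gsa, as described under Arguments.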

Value

A list with information on the gene sets whose sizes satisfy the minimum and maximum size thresholds. It contains the following components:

set_name

A vector of gene set names.

set_size

A vector of gene set sizes.

DS_value

A vector of dynamic slicing statistics, one per gene set.

pvalue

A vector of p-values, one per gene set.

FDR

A vector of FDR values, one per gene set.

slices

A list of slicing strategies, one per gene set. Each component is a matrix of slices.
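
The components above can be combined into a summary table, as in the sketch below. The object res is a mock built by hand with the documented component names. Its values are made up and it is not output from ds_gsa. The sketch also assumes that the vector components are parallel, with one entry per gene set in the same order.

```r
# Mock result with the documented component names; all values are made up.
res <- list(
  set_name = c("SET_A", "SET_B", "SET_C"),
  set_size = c(25, 120, 48),
  DS_value = c(3.1, 0.0, 1.4),
  pvalue   = c(0.001, 0.640, 0.030),
  FDR      = c(0.003, 0.640, 0.045),
  slices   = list(NULL, NULL, NULL)  # slicing matrices omitted in this mock
)

# Gather the per-set vectors into one table, most significant first.
summary_tab <- data.frame(
  res[c("set_name", "set_size", "DS_value", "pvalue", "FDR")],
  stringsAsFactors = FALSE
)
summary_tab <- summary_tab[order(summary_tab$pvalue), ]

# Keep gene sets below an FDR threshold (0.05 is an arbitrary choice here).
significant <- summary_tab[summary_tab$FDR < 0.05, ]
```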

References

Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 2015, 110(510): 642-653.

Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43): 15545-15550.

Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 1995, 57(1): 289-300.

See Also

ds_k.

Examples

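##  Loading data from files
## Not run: 
# Paths to the expression (.gct), phenotype (.cls) and gene set (.gmt) files
gctpath <- "P53.gct"
clspath <- "P53.cls"
gmtpath <- "C2.gmt"
expdat <- load_gct(gctpath)
label <- load_cls(clspath)
geneset <- load_gmt(gmtpath)
# Ranking function passed to ds_gsa by name: orders genes by the ratio of
# mean expression in samples labelled 1 to that in samples labelled 0
fc <- function(x, label)
{
  d0 <- apply(x[,which(label == 0)], 1, mean)
  d1 <- apply(x[,which(label == 1)], 1, mean)
  d <- d1 / d0
  return(order(d))
}
# Permute by column (shuffle phenotypes) using 100 permutation rounds
ds_gsa_obj <- ds_gsa(expdat, geneset, label, "fc", lambda = 1.2, bycol = TRUE,
                     minsize = 15, maxsize = 500, randseed = 11235, rounds = 100)

## End(Not run)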

[Package dslice version 1.2.2 Index]