ds_gsa {dslice} | R Documentation |
Gene set analysis via dynamic slicing
Description
Gene set analysis via dynamic slicing.
Usage
ds_gsa(expdat, geneset, label, generank, ..., lambda = 1, bycol = FALSE,
minsize = 15, maxsize = 500, randseed = 11235, rounds = 1000)
Arguments
expdat |
Either a character string giving the name of a gene expression file (.gct format), or an expression matrix with row names, in which each row is a gene and each column is a sample. |
geneset |
Either a character string giving the name of a gene set file (.gmt format), or a list containing a vector of gene set names, a vector of gene set descriptions, and a list of the gene symbols in each gene set. |
label |
Either a character string giving the name of a phenotype file (.cls format), or a list containing a vector of phenotype classes and a vector of the encoded phenotypes of the samples. It must match the samples (columns) of the gene expression matrix. |
generank |
Either an integer vector giving the rank of each gene according to some statistic, or a character string naming a function that takes the gene expression matrix as input and returns a vector of gene ranks without ties. |
... |
Additional arguments passed to the ranking function named (as a character string) by generank. |
lambda |
Penalty for introducing an additional slice in the dynamic slicing procedure; it keeps the procedure from creating too many slices. It corresponds to the type I error rate under the scenario in which the two variables are independent. |
bycol |
Type of permutation. If FALSE (default), permutation is by row: the gene ranks are shuffled. If TRUE, permutation is by column: the phenotype labels are shuffled and the gene ranks are then recomputed. |
minsize |
Minimum number of genes a gene set must contain to be included in the analysis. |
maxsize |
Maximum number of genes a gene set may contain to be included in the analysis. |
randseed |
Optional initial seed for random number generator (integer). |
rounds |
Number of permutations used to estimate the significance (p-value) of the results. |
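The two permutation schemes selected by bycol, and the role of rounds, can be illustrated with a small base-R sketch. Everything here is hypothetical and not the dslice implementation: diff_rank, perm_pvalue and set_stat are illustrative names, the gene-set statistic is a stand-in for the DS statistic, and the (1 + count) / (1 + rounds) estimate is a common convention that ds_gsa may or may not use.

```r
# Hypothetical toy ranking function: difference in group means,
# returned as tie-free ranks (1 = largest increase in phenotype 1).
diff_rank <- function(x, label) {
  d <- rowMeans(x[, label == 1, drop = FALSE]) -
    rowMeans(x[, label == 0, drop = FALSE])
  rank(-d, ties.method = "first")
}

# Permutation p-value for a gene-set statistic computed from gene ranks.
# bycol = FALSE: shuffle the observed ranks (ranking function called once).
# bycol = TRUE: shuffle phenotype labels and re-rank genes every round.
perm_pvalue <- function(x, label, rank_fun, stat_fun,
                        bycol = FALSE, rounds = 1000) {
  r <- rank_fun(x, label)
  observed <- stat_fun(r)
  null <- replicate(rounds, {
    r_perm <- if (bycol) rank_fun(x, sample(label)) else sample(r)
    stat_fun(r_perm)
  })
  # (1 + #{null >= observed}) / (1 + rounds) never returns exactly 0
  (1 + sum(null >= observed)) / (1 + rounds)
}

# Toy data: genes 1-5 (the "gene set") are shifted up in phenotype 1
set.seed(3)
x <- matrix(rnorm(30 * 6), nrow = 30)
x[1:5, 4:6] <- x[1:5, 4:6] + 5
lab <- c(0, 0, 0, 1, 1, 1)
set_stat <- function(r) -mean(r[1:5])   # higher = set ranked nearer the top
p_row <- perm_pvalue(x, lab, diff_rank, set_stat, bycol = FALSE, rounds = 200)
p_col <- perm_pvalue(x, lab, diff_rank, set_stat, bycol = TRUE, rounds = 200)
```

With six samples split 3/3 there are only choose(6, 3) = 20 distinct label assignments, so under bycol = TRUE the observed labelling recurs in about 1 of every 20 rounds, which puts a floor of roughly 0.05 on the expected p_col however large rounds is. Shuffling ranks has no such floor, but it ignores the gene-gene correlation that phenotype permutation preserves, which is the rationale Subramanian et al. give for permuting phenotypes.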
Details
ds_gsa performs gene set analysis via dynamic slicing. It returns the DS statistic and the slicing strategy of each gene set.
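The idea behind the DS statistic can be sketched as follows: observations are sorted by a ranking variable, split into contiguous slices, and each slicing is scored by the log-likelihood ratio of the class frequencies within slices against the overall frequencies, minus lambda * log(n) for every extra slice; the DS statistic is the maximum score, found by dynamic programming. This is a simplified sketch after Jiang, Ye and Liu (2015), not the dslice source. The exact penalty form, any normalization by n, and the idea that ds_gsa encodes gene-set membership (for example 2 = in set, 1 = not in set) along the ranked gene list are assumptions; ds_stat is a hypothetical name.

```r
# Simplified dynamic-slicing (DS) statistic, after Jiang, Ye and Liu (2015).
# Sketch only: the penalty form lambda * log(n) per extra slice is assumed.
# y: integer class labels 1..K, already sorted by the ranking variable.
ds_stat <- function(y, lambda = 1) {
  n <- length(y)
  K <- max(y)
  p <- tabulate(y, K) / n                  # overall class proportions
  pen <- lambda * log(n)                   # penalty charged per slice
  # cs[i + 1, k] = number of class-k labels among y[1..i]
  cs <- rbind(0, apply(outer(y, seq_len(K), "=="), 2, cumsum))
  # Log-likelihood ratio of slice y[j..i] against the overall proportions
  slice_score <- function(j, i) {
    cnt <- cs[i + 1, ] - cs[j, ]
    nz <- cnt > 0                          # treat 0 * log(0) as 0
    sum(cnt[nz] * log(cnt[nz] / (sum(cnt) * p[nz])))
  }
  best <- c(0, rep(-Inf, n))               # best[i + 1]: optimum for y[1..i]
  from <- integer(n)                       # start of the last slice ending at i
  for (i in seq_len(n)) {
    for (j in seq_len(i)) {
      s <- best[j] + slice_score(j, i) - pen
      if (s > best[i + 1]) {
        best[i + 1] <- s
        from[i] <- j
      }
    }
  }
  # Trace back the optimal slice boundaries
  slices <- NULL
  i <- n
  while (i > 0) {
    slices <- rbind(c(from = from[i], to = i), slices)
    i <- from[i] - 1
  }
  # Every slice paid pen, but only |S| - 1 extra slices are penalized
  list(stat = best[n + 1] + pen, slices = slices)
}

# Labels perfectly separated along the ranking: two slices are chosen
res <- ds_stat(c(rep(1L, 20), rep(2L, 20)), lambda = 1)
res$stat    # 40 * log(2) - log(40), about 24.04
res$slices  # rows (1, 20) and (21, 40)
```

A single slice always scores 0, so the statistic is non-negative; a larger lambda makes extra slices more expensive and pushes the statistic toward 0 when the labels carry little signal.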
ds_gsa does not build in a ranking method. Instead, it takes either a ranking function or precomputed gene ranks as a parameter, so users can apply any ranking method they prefer.
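As a sketch of a user-supplied ranking method, the hypothetical t_rank below uses the same (x, label) signature as fc in the Examples and returns tie-free integer ranks from a per-gene Welch t-statistic. That rank 1 marks the most up-regulated gene in phenotype 1 is an assumption; note that fc in the Examples returns order(d), an ordering rather than ranks, so which of the two ds_gsa expects should be checked against the package.

```r
# Hypothetical ranking function with the same (x, label) signature as fc:
# Welch t-statistic per gene, returned as tie-free ranks
# (rank 1 = most up-regulated in phenotype 1; the direction is an assumption).
t_rank <- function(x, label) {
  g0 <- x[, label == 0, drop = FALSE]
  g1 <- x[, label == 1, drop = FALSE]
  se <- sqrt(apply(g1, 1, var) / ncol(g1) + apply(g0, 1, var) / ncol(g0))
  tstat <- (rowMeans(g1) - rowMeans(g0)) / se
  rank(-tstat, ties.method = "first")     # "first" breaks ties, so no ties
}

set.seed(1)
x <- matrix(rnorm(10 * 4), nrow = 10,
            dimnames = list(paste0("gene", 1:10), NULL))
lab <- c(0, 0, 1, 1)
r <- t_rank(x, lab)
```

Either the integer vector r or the character string "t_rank" could then be supplied as generank.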
Value
A list describing the gene sets whose sizes lie within the minimum and maximum size thresholds. It contains the following components:
set_name |
A vector of gene set names. |
set_size |
A vector of gene set sizes. |
DS_value |
A vector of dynamic slicing (DS) statistics, one for each gene set. |
pvalue |
A vector of p-values, one for each gene set. |
FDR |
A vector of FDR values, one for each gene set. |
slices |
A list giving the slicing strategy of each gene set. Each component is a matrix of slices. |
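Since FDR is returned alongside pvalue and the Benjamini-Hochberg paper is cited below, one plausible reading (an assumption, not confirmed by this page) is that FDR holds BH-adjusted p-values. The step-up adjustment can be sketched in base R and checked against stats::p.adjust; bh_adjust and the p-values are made up for illustration.

```r
# Benjamini-Hochberg step-up adjustment, written out for illustration.
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  adj <- cummin(p[o] * m / (m:1))           # p_(i) * m / i, made monotone
  pmin(1, adj)[order(o)]                    # back to the input order
}

pvalue <- c(0.001, 0.01, 0.02, 0.04, 0.5)   # made-up p-values
fdr <- bh_adjust(pvalue)                    # about 0.005 0.025 0.0333 0.05 0.5
all.equal(fdr, p.adjust(pvalue, method = "BH"))  # TRUE
```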
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
Subramanian, A., Tamayo, P., Mootha, V.K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43): 15545-15550, 2005.
Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1): 289-300, 1995.
See Also
ds_k.
Examples
## Loading data from files
## Not run:
gctpath <- "P53.gct"
clspath <- "P53.cls"
gmtpath <- "C2.gmt"
expdat <- load_gct(gctpath)
label <- load_cls(clspath)
geneset <- load_gmt(gmtpath)
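fc <- function(x, label)
{
  # mean expression of each gene within each phenotype group
  d0 <- rowMeans(x[, label == 0, drop = FALSE])
  d1 <- rowMeans(x[, label == 1, drop = FALSE])
  # fold change of group 1 over group 0, then genes ordered by it
  d <- d1 / d0
  return(order(d))
}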
ds_gsa_obj <- ds_gsa(expdat, geneset, label, "fc", lambda = 1.2, bycol = TRUE,
minsize = 15, maxsize = 500, randseed = 11235, rounds = 100)
## End(Not run)