R: Gene set analysis without permutations

GSA.func {GSA}

R Documentation

Gene set analysis without permutations

Description

Determines the significance of pre-defined sets of genes with respect to an outcome variable, such as a group indicator, quantitative variable or survival time. This is the basic function called by GSA.

Usage

GSA.func(x, y, genesets, genenames, geneset.names = NULL,
         method = c("maxmean", "mean", "absmean"),
         resp.type = c("Quantitative", "Two class unpaired", "Survival",
                       "Multiclass", "Two class paired", "tCorr", "taCorr"),
         censoring.status = NULL,
         first.time = TRUE, return.gene.ind = TRUE,
         ngenes = NULL, gs.mat = NULL, gs.ind = NULL,
         catalog = NULL, catalog.unique = NULL,
         s0 = NULL, s0.perc = NULL, minsize = 15, maxsize = 500,
         restand = TRUE, restand.basis = c("catalog", "data"))

Arguments

`x`	Data x: p by n matrix of features, one observation per column (missing values allowed)
`y`	Vector of response values: 1,2 for two class problem, or 1,2,3 ... for multiclass problem, or real numbers for quantitative or survival problems
`genesets`	Gene set collection (a list)
`genenames`	Vector of genenames in expression dataset
`geneset.names`	Optional vector of gene set names
`method`	Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean"
`resp.type`	Problem type: "Quantitative" for a continuous outcome; "Two class unpaired"; "Survival" for a censored survival outcome; "Multiclass" for more than 2 groups; "Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc.
`censoring.status`	Vector of censoring status values for survival problems: 1 means death or failure, 0 means censored
`first.time`	internal use
`return.gene.ind`	internal use
`ngenes`	internal use
`gs.mat`	internal use
`gs.ind`	internal use
`catalog`	internal use
`catalog.unique`	internal use
`s0`	Exchangeability factor for denominator of test statistic; Default is automatic choice
`s0.perc`	Percentile of standard deviation values to use for s0; default is automatic choice. -1 means s0=0, which differs from s0.perc=0, meaning s0 is the zeroth percentile of the standard deviation values (the minimum of the sd values)
`minsize`	Minimum number of genes in genesets to be considered
`maxsize`	Maximum number of genes in genesets to be considered
`restand`	Should restandardization be done? Default TRUE
`restand.basis`	What should be used to do the restandardization? The set of genes in the genesets ("catalog", the default) or the genes in the data set ("data")
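
The relationship between s0.perc and s0 described above can be illustrated with a small base-R sketch. The helper `s0_from_perc_sketch` is hypothetical and is not part of GSA. Treating non-negative values as percentiles on a 0 to 100 scale is an assumption, and the automatic choice made when s0.perc is NULL is not reproduced here.

```r
# Hypothetical helper for illustration only; not part of the GSA package.
s0_from_perc_sketch <- function(sd.values, s0.perc) {
  if (s0.perc == -1) {
    return(0)  # -1 is documented as meaning s0 = 0
  }
  # Assumption: s0.perc is expressed on a 0-100 percentile scale.
  stats::quantile(sd.values, probs = s0.perc / 100, names = FALSE)
}

sd.values <- c(0.5, 1, 2, 4)                   # made-up per-gene sd values
s0.zero <- s0_from_perc_sketch(sd.values, -1)  # no offset in the denominator
s0.min  <- s0_from_perc_sketch(sd.values, 0)   # zeroth percentile = min(sd.values)
```

The two calls show why -1 and 0 are not interchangeable: the first removes the offset entirely, while the second still adds the smallest observed standard deviation.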

Details

Carries out a gene set analysis, computing the gene set scores. This function does not perform any permutations for estimating false discovery rates; GSA calls this function to estimate FDRs.
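
To make the three `method` options more concrete, here is a minimal base-R sketch of how a vector of individual gene scores for one gene set might be summarized. The helper `summarize_set_sketch` is hypothetical, and the maxmean rule follows the general description in the Efron and Tibshirani report (average the positive parts and the negative parts separately, then keep whichever is larger in magnitude). It ignores restandardization and is not the package's own code.

```r
# Hypothetical helper for illustration only; not part of the GSA package.
summarize_set_sketch <- function(z, method = c("maxmean", "mean", "absmean")) {
  method <- match.arg(method)
  switch(method,
    mean = mean(z),
    absmean = mean(abs(z)),
    maxmean = {
      pos <- mean(pmax(z, 0))   # average of positive parts, zeros elsewhere
      neg <- mean(pmax(-z, 0))  # average of negative parts, as magnitudes
      if (pos >= neg) pos else -neg
    }
  )
}

z <- c(2, 1, -0.5, -0.5)  # made-up gene scores for one gene set
set.mean    <- summarize_set_sketch(z, "mean")
set.absmean <- summarize_set_sketch(z, "absmean")
set.maxmean <- summarize_set_sketch(z, "maxmean")
```

Unlike the plain mean, the maxmean summary keeps positive and negative scores from cancelling each other, and unlike absmean it retains a sign.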

Value

A list with components

`scores`	Gene set scores for each gene set
`norm.scores`	Gene set scores transformed by the inverse Gaussian cdf
`mean`	Means of gene expression values for each sample
`sd`	Standard deviation of gene expression values for each sample
`gene.ind`	List indicating which genes in each positive gene set had positive individual scores, and similarly for negative gene sets
`geneset.names`	Names of the gene sets
`nperms`	Number of permutations used
`gene.scores`	Individual gene scores (eg t-statistics for two class problem)
`s0`	Computed exchangeability factor
`s0.perc`	Computed percentile of standard deviation values
`stand.info`	Information computed and used in the restandardization process
`method`	Method used (from call to GSA.func)
`call`	The call to GSA
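
The components listed above can be inspected like any other R list. The sketch below assumes the GSA package is installed and uses simulated data in the same spirit as the Examples section; the object name `fit` and the ranking by absolute score are illustrative choices, not something the package prescribes.

```r
# Runs only if the GSA package is available.
if (requireNamespace("GSA", quietly = TRUE)) {
  set.seed(100)
  x <- matrix(rnorm(1000 * 20), ncol = 20)  # 1000 genes, 20 samples
  y <- c(rep(1, 10), rep(2, 10))            # two classes coded 1 and 2
  genenames <- paste("g", 1:1000, sep = "")
  genesets <- lapply(1:50, function(i) {
    paste("g", sample(1:1000, size = 30), sep = "")
  })
  geneset.names <- paste("set", 1:50, sep = "")

  fit <- GSA::GSA.func(x, y, genenames = genenames, genesets = genesets,
                       geneset.names = geneset.names,
                       resp.type = "Two class unpaired")

  print(names(fit))  # components of the returned list
  # Illustrative ranking: gene sets ordered by absolute score.
  top <- order(abs(fit$scores), decreasing = TRUE)[1:5]
  print(data.frame(set = geneset.names[top], score = fit$scores[top]))
}
```

Because no permutations are run here, the scores carry no FDR estimates; they are raw summaries only.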

Author(s)

Robert Tibshirani

References

Efron, B. and Tibshirani, R. On testing the significance of sets of genes. Stanford technical report, 2006. http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf

Examples


######### two class unpaired comparison
# y must take values 1,2

set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)

u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))


genenames=paste("g",1:1000,sep="")

#create some random gene sets
genesets=vector("list",50)
for(i in 1:50){
 genesets[[i]]=paste("g",sample(1:1000,size=30),sep="")
}
geneset.names=paste("set",as.character(1:50),sep="")

GSA.func.obj<-GSA.func(x,y, genenames=genenames, genesets=genesets,  resp.type="Two class unpaired")




# to use a "real" gene set collection, read it in from a gmt file:
#
# geneset.obj<- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from the GSEA collection,
# the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then

#   GSA.func.obj<-GSA.func(x,y, genenames=genenames,
#                          genesets=geneset.obj$genesets,
#                          resp.type="Two class unpaired")
#
#

[Package GSA version 1.03.3 Index]