GSA.func {GSA} | R Documentation |
Gene set analysis without permutations
Description
Determines the significance of pre-defined sets of genes with respect to an outcome variable, such as a group indicator, quantitative variable or survival time. This is the basic function called by GSA.
Usage
GSA.func(x,y, genesets, genenames,geneset.names=NULL,
method=c("maxmean","mean","absmean"),
resp.type=c("Quantitative","Two class unpaired","Survival","Multiclass",
"Two class paired", "tCorr", "taCorr" ),
censoring.status=NULL,
first.time = TRUE, return.gene.ind = TRUE,
ngenes = NULL, gs.mat =NULL, gs.ind = NULL,
catalog = NULL, catalog.unique =NULL,
s0 = NULL, s0.perc = NULL, minsize = 15, maxsize= 500, restand = TRUE,
restand.basis=c("catalog","data"))
Arguments
x |
Data x: p by n matrix of features, one observation per column (missing values allowed) |
y |
Vector of response values: 1,2 for two class problem, or 1,2,3 ... for multiclass problem, or real numbers for quantitative or survival problems |
genesets |
Gene set collection (a list) |
genenames |
Vector of genenames in expression dataset |
geneset.names |
Optional vector of gene set names |
method |
Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean" |
resp.type |
Problem type: "quantitative" for a continuous parameter; "Two class unpaired" ; "Survival" for censored survival outcome; "Multiclass" : more than 2 groups; "Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc |
censoring.status |
Vector of censoring status values for survival problems, 1 mean death or failure, 0 means censored) |
first.time |
internal use |
return.gene.ind |
internal use |
ngenes |
internal use |
gs.mat |
internal use |
gs.ind |
internal use |
catalog |
internal use |
catalog.unique |
internal use |
s0 |
Exchangeability factor for denominator of test statistic; Default is automatic choice |
s0.perc |
Percentile of standard deviation values to use for s0; default is automatic choice; -1 means s0=0 (different from s0.perc=0, meaning s0=zeroeth percentile of standard deviation values= min of sd values |
minsize |
Minimum number of genes in genesets to be considered |
maxsize |
Maximum number of genes in genesets to be considered |
restand |
Should restandardization be done? Default TRUE |
restand.basis |
What should be used to do the restandardization? The set of genes in the genesets ("catalog", the default) or the genes in the data set ("data") |
Details
Carries out a Gene set analysis, computing the gene set scores. This function does not do any permutations for estimation of false discovery rates. GSA calls this function to estimate FDRs.
Value
A list with components
scores |
Gene set scores for each gene set |
,
norm.scores |
Gene set scores transformed by the inverse Gaussian cdf |
,
mean |
Means of gene expression values for each sample |
sd |
Standard deviation of gene expression values for each sample |
gene.ind |
List indicating whch genes in each positive gene set had positive individual scores, and similarly for negative gene sets |
geneset.names |
Names of the gene sets |
nperms |
Number of permutations used |
gene.scores |
Individual gene scores (eg t-statistics for two class problem) |
s0 |
Computed exchangeability factor |
s0.perc |
Computed percentile of standard deviation values |
stand.info |
Information computed used in the restandardization process |
method |
Method used (from call to GSA.func) |
call |
The call to GSA |
Author(s)
Robert Tibshirani
References
Efron, B. and Tibshirani, R. On testing the significance of sets of genes. Stanford tech report rep 2006. http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf
Examples
######### two class unpaired comparison
# y must take values 1,2
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)
u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))
genenames=paste("g",1:1000,sep="")
#create some random gene sets
genesets=vector("list",50)
for(i in 1:50){
genesets[[i]]=paste("g",sample(1:1000,size=30),sep="")
}
geneset.names=paste("set",as.character(1:50),sep="")
GSA.func.obj<-GSA.func(x,y, genenames=genenames, genesets=genesets, resp.type="Two class unpaired")
#to use "real" gene set collection, we read it in from a gmt file:
#
# geneset.obj<- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from GSEA collection or
# or the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then
# GSA.func.obj<-GSA.func(x,y, genenames=genenames,
# genesets=geneset.obj$genesets,
# resp.type="Two class unpaired")
#
#