R: Tests for annotation differences among sample clusters

hca.test {swamp}

R Documentation

Tests for annotation differences among sample clusters

Description

The main clusters of a dendrogram are tested for different patient annotations.

Usage

hca.test(g, o, dendcut = 2, method = "correlation", link = "ward", 
         test = "chisq", workspace = 2e+07)

Arguments

`g`	the input data in form of a matrix with features as rows and samples as columns.
`o`	the corresponding sample annotations in the form of a data.frame. Sample annotation as a single vector is allowed and will be transformed to a data.frame. rownames (o) must be identical to colnames (g). o can contain factors and numeric variables. No character variables are allowed. NAs are allowed.
`dendcut`	the number of clusters to cut the dendrogram tree (using cutree()). default=2.
`method`	the distance method for the clustering. default="correlation". hcluster from the package amap is used and method must be one of "euclidean", "maximum", "manhattan", "canberra" "binary" "pearson", "correlation", "spearman" or "kendall".
`link`	the agglomeration principle for the clustering. default="ward". hcluster from the package amap is used and link must be one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid".
`test`	the test to use for the annotations that are factors. this can be either "fisher" or "chisq" to use fisher.test() or chisq.test(), respectively. default = "chisq". However fisher.test is preferable as it is an exact test. Note that fisher.test() is computationally expensive and can cause R to crash.
`workspace`	workspace to use if test="fisher"

Details

The function clusters the samples using amap and then cuts the dendrogram into a specified number of clusters. The obtained sample clusters are tested for differences in sample annotations. fisher.test() or chisq.test() is used if the annotation is a factor, lm(annotation~clusters) is used for numeric annotations. The p-values for the dependence of sample annotation and sample clusters are returned.

Value

a list with components

`p.values`	a numeric vector containing the p.values for the annotation variable.
`classes`	the classes of the annotation variables in o.

Note

requires the package amap

Author(s)

Martin Lauss

Examples

## data as a matrix
set.seed(100)
g<-matrix(nrow=1000,ncol=50,rnorm(1000*50),dimnames=list(paste("Feature",1:1000),
          paste("Sample",1:50)))
g[1:100,26:50]<-g[1:100,26:50]+1 # the first 100 features show
# higher values in the samples 26:50
## patient annotations as a data.frame, annotations should be numbers and factor
# but not characters.
## rownames have to be the same as colnames of the data matrix 
set.seed(200)
o<-data.frame(Factor1=factor(c(rep("A",25),rep("B",25))),
              Factor2=factor(rep(c("A","B"),25)),
              Numeric1=rnorm(50),row.names=colnames(g))

# perform the test for the main 2 clusters
res3<-hca.test(g,o,dendcut=2,test="fisher") 
        # use test="chisq" for large ncol(g) to avoid crash of R
res3$p.values

[Package swamp version 1.5.1 Index]