swamp-package {swamp} | R Documentation |
Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations
Description
The package contains functions to connect the structure of the data with sample information. Three types of analyses are covered: (1) linear models of principal components, (2) hierarchical clustering analysis, and (3) the distribution of associations between individual features and sample annotations. Additionally, the inter-relation between sample annotations can be analyzed. Methods for batch adjustment are included: (a) median-centering, (b) linear models, and (c) empirical Bayes (ComBat). A method to remove principal components from the data is also included.
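The third type of analysis, comparing feature-annotation associations in observed and permuted data, can be illustrated with a minimal base-R sketch. It is not the code of feature.assoc() or dense.plot(); the helper feature_cor() and the simulated objects g and y are hypothetical names chosen for this illustration only.

```r
# Illustrative sketch only; not the implementation used by the package.
set.seed(1)
g <- matrix(rnorm(500 * 30), nrow = 500, ncol = 30)  # features in rows, samples in columns
y <- rep(c(0, 1), each = 15)                         # hypothetical two-group annotation
g[1:50, y == 1] <- g[1:50, y == 1] + 1               # the first 50 features carry a real signal

# Pearson correlation of every feature (row) with the annotation
feature_cor <- function(x, y) as.vector(cor(t(x), y))

observed <- feature_cor(g, y)
permuted <- feature_cor(g, sample(y))                # same statistic after shuffling the labels

# Count features whose observed association exceeds the 95% quantile
# of the absolute associations seen in the permuted data
cutoff <- quantile(abs(permuted), 0.95)
n_above <- sum(abs(observed) > cutoff)
print(n_above)
```

If the annotation is reflected in the data, the observed distribution should show clearly more strong associations than the permuted one.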
Details
The DESCRIPTION file:
Package: swamp
Type: Package
Title: Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations
Version: 1.5.1
biocViews:
Depends: impute, amap, gplots, MASS
Imports: methods
Author: Martin Lauss
Maintainer: Martin Lauss <martin.lauss@med.lu.se>
Description: Collection of functions to connect the structure of the data with the information on the samples. Three types of associations are covered: 1. linear model of principal components. 2. hierarchical clustering analysis. 3. distribution of features-sample annotation associations. Additionally, the inter-relation between sample annotations can be analyzed. Simple methods are provided for the correction of batch effects and removal of principal components.
License: GPL (>= 2)
LazyLoad: yes
Packaged: Wed Dec 06 09:58:46 2019; lauss
RoxygenNote: 7.0.2
Index of help topics:
adjust.linearmodel      Batch adjustment using a linear model
combat                  ComBat algorithm to combine batches
confounding             Heatmap of interrelation of sample annotations
corrected.p             Correction of p-values for associations between features and sample annotation
dense.plot              Density plots of feature associations in observed and permuted data
feature.assoc           Associations of the features to a sample annotation in observed and reshuffled data
hca.plot                Dendrogram with according sample annotations
hca.test                Tests for annotation differences among sample clusters
kill.pc                 Removes principal components from a data matrix
prince                  Linear models of principal components dependent on sample annotations
prince.plot             Heatmap of the associations between principal components and sample annotations
prince.var.plot         Scree plot of the data variation covered by the principal components
quickadjust.ref         Batch adjustment by median-scaling to a reference batch
quickadjust.zero        Batch adjustment by median-centering
swamp-package           Visualization, Analysis and Adjustment of High-Dimensional Data in Respect to Sample Annotations
This package aims to find the associations between a high-dimensional data matrix, typically obtained from high-throughput analysis, and the sample annotations, be they technical (e.g. batch surrogates) or biological information. High-dimensional data usually has a much larger number of features than samples. This requires specific analysis to see what the data looks like and in which way the technical and biological information on the samples is reflected in the dataset.
Sample annotations can be analyzed for their association with principal components (prince(), prince.plot(), prince.var.plot()) as well as with clusters from HCA (hca.plot(), hca.test()). The distribution of associations between sample annotations and the features can be plotted and analysed (feature.assoc(), dense.plot(), corrected.p()).
Batch surrogate variables might be associated with biological annotations, which makes batch adjustment of the data problematic. The associations between the sample annotations can be calculated and plotted (confounding()).
If unwanted batch effects have been identified in the previous steps, two simple methods are provided to adjust the data (quickadjust.zero(), quickadjust.ref()). Another batch adjustment uses linear models, with the advantage that numerical variables, and several variables at once, can be removed (adjust.linearmodel()). In addition, the popular ComBat batch adjustment has been adopted for the package to combine batches of small sample size (combat()). Principal components confounded by batch variables can be removed from the data (kill.pc()).
To use the functions of this package, the data matrix has to be a numeric matrix, and the sample annotations have to be a data.frame whose columns are numeric or factors with 2 or more levels. NAs are allowed in both the data matrix and the annotation data.frame, except for the functions adjust.linearmodel() and combat().
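The input requirements and the idea of relating principal components to sample annotations can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation of prince(): the helper lm_pvalue(), the annotation columns Batch and Age, and the choice of the overall F-test p-value of lm(PC ~ annotation) are assumptions made for this example.

```r
# Illustrative sketch only; not the implementation of prince().
set.seed(1)
# Features in rows, samples in columns (same layout as in the Examples)
g <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20,
            dimnames = list(paste("Feature", 1:200), paste("Sample", 1:20)))
g[1:50, 11:20] <- g[1:50, 11:20] + 2          # shift half of the samples
o <- data.frame(Batch = factor(rep(c("A", "B"), each = 10)),
                Age = rnorm(20),
                row.names = colnames(g))

# Basic checks of the input format: numeric matrix, and annotation columns
# that are numeric or factors with at least two levels
stopifnot(is.matrix(g), is.numeric(g),
          identical(rownames(o), colnames(g)),
          all(vapply(o, function(x) is.numeric(x) ||
                       (is.factor(x) && nlevels(x) >= 2), logical(1))))

# Principal components of the samples (prcomp expects samples in rows)
pca <- prcomp(t(g), center = TRUE, scale. = FALSE)

# p-value of the overall F-test of lm(PC ~ annotation)
lm_pvalue <- function(pc, annotation) {
  f <- summary(lm(pc ~ annotation))$fstatistic
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Matrix of p-values: annotations in rows, first 5 principal components in columns
pvals <- sapply(1:5, function(i) sapply(o, function(a) lm_pvalue(pca$x[, i], a)))
colnames(pvals) <- paste0("PC", 1:5)
print(signif(pvals, 3))
```

A small p-value in a cell suggests that the corresponding principal component varies with that annotation; in this simulated example the shift was introduced along Batch.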
Author(s)
Martin Lauss
Maintainer: Martin Lauss <martin.lauss@med.lu.se>
Examples
##### first you need a dataset (matrix)
set.seed(100)
g<-matrix(nrow=1000,ncol=50,rnorm(1000*50),dimnames=list(paste("Feature",1:1000),
paste("Sample",1:50)))
g[1:100,26:50]<-g[1:100,26:50]+1 ## let's put in some larger values
set.seed(200)
##### second you need sample annotations (data.frame)
o<-data.frame(Factor1=factor(c(rep("A",25),rep("B",25))),
Factor2=factor(rep(c("A","B"),25)),
Numeric1=rnorm(50),row.names=colnames(g))
# Level "B" of Factor1 marks the samples with the larger values
##### perform the functions
## principal components
res1<-prince(g,o,top=10,permute=TRUE)
prince.plot(prince=res1)
# to plot p values of linear models: lm(principal components ~ sample annotations).
# to see if the variation in the data is associated with sample annotations.
res2<-prince.var.plot(g,show.top=50,npermute=10)
# to see how many principal components carry informative variation
## hierarchical clustering analysis
hca.plot(g,o)
# to show a dendrogram with sample annotations below
res3<-hca.test(g,o,dendcut=2,test="fisher")
# to test if the major clusters show differences in sample annotations
## feature associations
res4a<-feature.assoc(g,o$Factor1,method="correlation")
# to calculate correlation between one sample annotation and each feature
res4b<-feature.assoc(g,o$Factor1,method="t.test",g1=res4a$permuted.data)
res4c<-feature.assoc(g,o$Factor1,method="AUC",g1=res4a$permuted.data)
dense.plot(res4a)
# to plot the distribution of correlations in the observed data
# in comparison to permuted data
dense.plot(res4b)
dense.plot(res4c)
res5<-corrected.p(res4a)
# to correct for multiple testing and find out how many features are
# significantly associated to the sample annotation
names(which(res5$padjust<0.05))
names(which(res5$adjust.permute<0.05))
names(which(res5$adjust.rank<0.05))
## associations between sample annotations
res4<-confounding(o,method="fisher")
# to see how biological and technical annotations are inter-related
## adjustment for batch effects
gadj1<-quickadjust.zero(g,o$Factor1)
# to adjust for batches (o$Factor1)
# using median centering of the features for each batch
prince.plot(prince(gadj1,o,top=10))
gadj2<-quickadjust.ref(g,o$Factor1,"B")
# to adjust for batches (o$Factor1)
# by adjusting the median of the features to the median of a reference batch.
prince.plot(prince(gadj2$adjusted.data,o,top=10))
gadj3<-kill.pc(g,pc=1)
# to remove one or more principal components (here pc1) from the data
prince.plot(prince(gadj3,o,top=10))
lin1<-adjust.linearmodel(g,o$Numeric1)
# to adjust for numerical variables using a linear model
prince.plot(prince(lin1,o,top=10))
com1<-combat(g,o$Factor1,batchcolumn=1)
# to use the empirical Bayes framework of ComBat, popular for small-sample sizes
prince.plot(prince(com1,o,top=10))
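To make two of the adjustment ideas used above more concrete, the following base-R sketch shows median-centering per batch and the removal of a principal component via the singular value decomposition. It is a simplified illustration under stated assumptions, not the code of quickadjust.zero() or kill.pc(); the helpers median_center() and remove_pc() are hypothetical names, and centering each feature before the SVD is an assumption of this sketch.

```r
# Illustrative sketch only; not the implementation used by the package.
set.seed(1)
g <- matrix(rnorm(100 * 12), nrow = 100, ncol = 12)  # features in rows, samples in columns
batch <- factor(rep(c("A", "B"), each = 6))
g[, batch == "B"] <- g[, batch == "B"] + 1           # artificial batch shift

# (a) Median-centering: subtract each feature's median within each batch
median_center <- function(x, batch) {
  for (b in levels(batch)) {
    idx <- batch == b
    med <- apply(x[, idx, drop = FALSE], 1, median, na.rm = TRUE)
    x[, idx] <- x[, idx] - med                       # med is recycled down the rows
  }
  x
}
g_centered <- median_center(g, batch)

# (b) Removing a principal component: set its singular value to zero and rebuild
remove_pc <- function(x, pc = 1) {
  row_means <- rowMeans(x)
  s <- svd(x - row_means)                            # centre each feature before the SVD
  s$d[pc] <- 0
  s$u %*% diag(s$d) %*% t(s$v) + row_means           # add the feature means back
}
g_nopc1 <- remove_pc(g, pc = 1)
```

After step (a), every feature has a median of zero within each batch. After step (b), the feature means are unchanged and the centered data has lost one dimension of variation.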