mahalanobisQC {ClassDiscovery} | R Documentation
Using Mahalanobis Distance and PCA for Quality Control
Description
Compute the Mahalanobis distance of each sample from the center of an N-dimensional principal component space.
Usage
mahalanobisQC(spca, N)
Arguments
spca | object of class SamplePCA, such as the result of the SamplePCA call in the Examples.
N | integer scalar specifying the number of components to use when assessing QC.
Details
The theory says that, under the null hypothesis that all samples arise from the same multivariate normal distribution, the squared Mahalanobis distance from the center of a D-dimensional principal component space should approximately follow a chi-squared distribution with D degrees of freedom, where D is the number of components N passed to the function. This lets us compute a p-value for the Mahalanobis distance of each sample; a small p-value marks a sample that lies unusually far from the center. The method can be used for quality control or outlier identification.
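The computation described above can be sketched in base R. This is only an illustration under stated assumptions, not the ClassDiscovery implementation: it substitutes prcomp for SamplePCA, uses simulated data, and the column name statistic is a hypothetical choice (p.value matches the name used in the Examples).

```r
# Sketch only: a base R stand-in for the idea, not the ClassDiscovery code.
set.seed(1)

# Simulated data: 50 features (rows) by 30 samples (columns), matching the
# orientation in which samples are the columns of the data set.
x <- matrix(rnorm(50 * 30), nrow = 50, ncol = 30)
colnames(x) <- paste0("S", 1:30)

N <- 2  # number of principal components to use

# prcomp() expects samples in rows, so transpose first.
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:N, drop = FALSE]

# mahalanobis() returns the squared distance of each sample from the
# center of the N-dimensional score space.
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Upper-tail chi-squared p-value with N degrees of freedom.
# "statistic" is a hypothetical column name chosen for this sketch.
qc <- data.frame(statistic = d2,
                 p.value = pchisq(d2, df = N, lower.tail = FALSE))
head(qc)
```

Because the center and covariance are estimated from the same samples, the chi-squared reference distribution is an approximation, which is one reason to treat the p-values as a screening tool rather than an exact test.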
Value
Returns a data frame containing two columns, with the rows
corresponding to the columns of the original data set on which PCA was
performed. The first column is the chi-squared statistic, with N
degrees of freedom. The second column is the associated p-value.
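As a rough illustration of working with a result of this shape, the sketch below builds a small stand-in data frame by hand and pulls out the flagged samples. The sample names, the statistic values, and the column name statistic are made up for illustration; only p.value is taken from the Examples.

```r
# Hypothetical stand-in for the returned data frame; the values are made up
# for illustration and do not come from any real data set.
N <- 2
stat <- c(S1 = 0.5, S2 = 1.8, S3 = 12.4, S4 = 3.1)
mc <- data.frame(statistic = stat,
                 p.value = pchisq(stat, df = N, lower.tail = FALSE))

# Rows are samples; sort so the most extreme samples come first.
mc <- mc[order(mc$p.value), ]

# Names of the samples flagged at the 1% level.
flagged <- rownames(mc)[mc$p.value < 0.01]
flagged
```

Keeping the sample names as row names means the flagged rows can be matched directly against the columns of the original data set.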
Author(s)
Kevin R. Coombes krc@silicovore.com
References
Coombes KR, et al.
Quality control and peak finding for proteomics data collected from
nipple aspirate fluid by surface-enhanced laser desorption and ionization.
Clin Chem 2003; 49:1615-23.
Examples
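library(ClassDiscovery)  # provides SamplePCA and mahalanobisQC
library(oompaData)       # provides the example data
data(lungData)
# PCA on the samples, after dropping rows with missing values
spca <- SamplePCA(na.omit(lung.dataset))
# QC based on the first 2 principal components
mc <- mahalanobisQC(spca, 2)
# samples flagged as potential outliers at the 1% level
mc[mc$p.value < 0.01,]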