mahalanobisQC {ClassDiscovery} | R Documentation |
Compute the Mahalanobis distance of each sample from the center of an N-dimensional principal component space.
mahalanobisQC(spca, N)
spca |
object of class |
N |
integer scalar specifying the number of components to use when assessing QC. |
Under the null hypothesis that all samples arise from the same multivariate normal distribution, the squared Mahalanobis distance of a sample from the center of a D-dimensional principal component space should follow a chi-squared distribution with D degrees of freedom (here D = N, the number of components used). This lets us compute a p-value for each sample from its Mahalanobis distance. The method can be used for quality control or outlier identification.
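The chi-squared reasoning above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside mahalanobisQC: it substitutes prcomp for SamplePCA, uses simulated data with samples in rows (the package works with samples in columns), and the column name statistic is a hypothetical label (only p.value appears in the example on this page).

```r
set.seed(1)

# Simulated data (an assumption for illustration): 50 samples in rows,
# 10 features in columns, all drawn from the same normal distribution.
x <- matrix(rnorm(50 * 10), nrow = 50)
N <- 2  # number of principal components to use

# Base-R PCA; the scores are centered and mutually uncorrelated.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:N, drop = FALSE]

# Because the scores are uncorrelated, the squared Mahalanobis distance
# is the sum of squared scores, each divided by its component's variance.
d2 <- rowSums(sweep(scores^2, 2, pca$sdev[1:N]^2, "/"))

# Upper-tail chi-squared probability with N degrees of freedom.
p <- pchisq(d2, df = N, lower.tail = FALSE)

qc <- data.frame(statistic = d2, p.value = p)
head(qc)
```

Small p-values mark samples that lie unusually far from the center of the component space. Estimating the variances from the same samples being tested is a simplification; a leave-one-out estimate would be less influenced by the outliers themselves.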
Returns a data frame containing two columns, with the rows
corresponding to the columns of the original data set on which PCA was
performed. The first column is the chi-squared statistic, with N
degrees of freedom. The second column is the associated p-value.
Kevin R. Coombes krc@silicovore.com
Coombes KR, et al.
Quality control and peak finding for proteomics data collected from
nipple aspirate fluid by surface-enhanced laser desorption and ionization.
Clin Chem 2003; 49:1615-23.
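library(ClassDiscovery)  # provides SamplePCA and mahalanobisQC
library(oompaData)
data(lungData)
# PCA on the samples, after dropping rows with missing values
spca <- SamplePCA(na.omit(lung.dataset))
# distances in the space of the first two principal components
mc <- mahalanobisQC(spca, 2)
# samples flagged as potential outliers at the 1% level
mc[mc$p.value < 0.01, ]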