| pcadapt {pcadapt} | R Documentation |
Principal Component Analysis for outlier detection
Description
pcadapt performs principal component analysis and computes p-values to
test for outliers. The test for outliers is based on the correlations between
genetic variation and the first K principal components. pcadapt
also handles Pool-seq data, for which the statistical analysis is performed on
the genetic marker frequencies. Returns an object of class pcadapt.
Usage
pcadapt(
input,
K = 2,
method = "mahalanobis",
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_matrix'
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_bed'
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_pool'
pcadapt(
input,
K = (nrow(input) - 1),
method = "mahalanobis",
min.maf = 0.05,
ploidy = NULL,
LD.clumping = NULL,
pca.only = FALSE,
tol
)
Arguments
input |
The output of function |
K |
an integer specifying the number of principal components to retain. |
method |
a character string specifying the method to be used to compute
the p-values. Two statistics are currently available, |
min.maf |
Threshold of minor allele frequencies above which p-values are
computed. Default is |
ploidy |
Number of trials, parameter of the binomial distribution. Default is 2, which corresponds to diploidy, such as for the human genome. |
LD.clumping |
Default is |
pca.only |
a logical value indicating whether PCA results should be returned (before computing any statistic). |
tol |
Convergence criterion of |
Details
First, a principal component analysis is performed on the scaled and
centered genotype data. Depending on the specified method, different
test statistics can be used.
mahalanobis (default): the robust Mahalanobis distance is computed for
each genetic marker using a robust estimate of both mean and covariance
matrix between the K vectors of z-scores.
communality: the communality statistic measures the proportion of
variance explained by the first K PCs. Deprecated in version 4.0.0.
componentwise: returns a matrix of z-scores.
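The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the genotypes are simulated, the z-scores are obtained by regressing each standardized marker on the first K principal components, and the classical (non-robust) mean and covariance are used in place of the robust estimates described above. All variable names are hypothetical.

```r
set.seed(1)
n <- 50   # individuals (hypothetical)
m <- 200  # genetic markers (hypothetical)
K <- 2    # number of principal components retained

# Simulated diploid genotypes: individuals in rows, markers in columns
G <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
G <- G[, apply(G, 2, sd) > 0, drop = FALSE]  # drop monomorphic markers

# PCA on the scaled and centered genotype data
Gs <- scale(G)
U <- svd(Gs, nu = K, nv = 0)$u  # n x K matrix of principal components

# Regress each marker on the K components; the columns of U are orthonormal,
# so the least-squares coefficients are simply t(U) %*% Gs
beta <- crossprod(U, Gs)                 # K x (number of markers)
resid <- Gs - U %*% beta
sigma2 <- colSums(resid^2) / (n - K)     # residual variance per marker
z <- t(beta) / sqrt(sigma2)              # one row of K z-scores per marker

# Classical Mahalanobis distance of each marker's z-score vector
# (assumption: the robust estimates are swapped for colMeans() and cov())
d2 <- mahalanobis(z, center = colMeans(z), cov = cov(z))
```

Here `z` plays the role of the matrix of z-scores returned by the componentwise method, and `d2` holds one squared distance per marker.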
To compute p-values when method="mahalanobis", the test statistics
(stat) are divided by a genomic inflation factor (gif). The scaled
statistics (chi2_stat) should then follow a chi-squared distribution
with K degrees of freedom. When using method="componentwise", the
squared z-scores should follow a chi-squared distribution with 1 degree
of freedom. For Pool-seq data, pcadapt provides p-values based on the
Mahalanobis distance for each SNP.
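A minimal sketch of this p-value computation in base R follows. It assumes that the genomic inflation factor is the common median-based estimate (the median of the statistics divided by the median of the reference chi-squared distribution); this page does not say how gif is computed, so treat that formula as an assumption. The statistics and z-scores are simulated.

```r
set.seed(42)
K <- 2

# Hypothetical test statistics, deliberately inflated by a factor of 1.5
stat <- 1.5 * rchisq(1000, df = K)

# Assumed median-based genomic inflation factor
gif <- median(stat) / qchisq(0.5, df = K)

# Scaled statistics, compared to a chi-squared distribution with K df
chi2_stat <- stat / gif
pvalues <- pchisq(chi2_stat, df = K, lower.tail = FALSE)

# Componentwise case: each squared z-score is compared to a
# chi-squared distribution with 1 degree of freedom
z <- matrix(rnorm(1000 * K), ncol = K)
pvalues_cw <- pchisq(z^2, df = 1, lower.tail = FALSE)
```

With this estimate, the median of `chi2_stat` matches the median of the reference distribution by construction, which is the purpose of the rescaling.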
Value
The returned value is an object of class pcadapt.
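A typical call might look like the sketch below. It assumes that the input is prepared with read.pcadapt and that the example file geno3pops.bed ships with the package; neither is stated on this page. The `pvalues` element of the result is also an assumption. The guard skips the call when the package is not installed.

```r
res <- NULL
if (requireNamespace("pcadapt", quietly = TRUE)) {
  # Assumed example data set bundled with the package
  path <- system.file("extdata", "geno3pops.bed", package = "pcadapt")
  input <- pcadapt::read.pcadapt(path, type = "bed")

  # Arguments as listed in the Usage section
  res <- pcadapt::pcadapt(input, K = 2, method = "mahalanobis", min.maf = 0.05)

  print(class(res))          # class of the returned object
  print(head(res$pvalues))   # assumed element holding the p-values
}
```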