pcadapt {pcadapt}    R Documentation
Principal Component Analysis for outlier detection
Description
pcadapt performs principal component analysis and computes p-values to
test for outliers. The test for outliers is based on the correlations between
genetic variation and the first K principal components. pcadapt also handles
Pool-seq data, for which the statistical analysis is performed on the genetic
marker frequencies. Returns an object of class pcadapt.
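A typical two-pass workflow might look like the sketch below: fit with a generous K, inspect the scree plot, then refit with the chosen K. It assumes the pcadapt package is installed and uses read.pcadapt() to convert the example .bed file shipped with the package; the value K = 2 in the second pass is only an illustrative choice, not a recommendation for your data.

```r
# Runs only if the pcadapt package is installed.
if (requireNamespace("pcadapt", quietly = TRUE)) {
  library(pcadapt)

  # Example .bed file distributed with the package.
  path_to_file <- system.file("extdata", "geno3pops.bed", package = "pcadapt")
  genotypes <- read.pcadapt(path_to_file, type = "bed")

  # First pass: retain many PCs and inspect the scree plot to choose K.
  x <- pcadapt(input = genotypes, K = 20)
  plot(x, option = "screeplot")

  # Second pass with the chosen K (illustrative value).
  x <- pcadapt(input = genotypes, K = 2)
  summary(x)
  head(x$pvalues)  # one p-value per SNP
}
```

Refitting after reading the scree plot keeps K tied to the population structure actually present, since the test statistics depend on the number of retained PCs.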
Usage
pcadapt(
input,
K = 2,
method = "mahalanobis",
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_matrix'
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_bed'
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
## S3 method for class 'pcadapt_pool'
pcadapt(
input,
K = (nrow(input) - 1),
method = "mahalanobis",
min.maf = 0.05,
ploidy = NULL,
LD.clumping = NULL,
pca.only = FALSE,
tol
)
Arguments
input: The output of function
K: an integer specifying the number of principal components to retain.
method: a character string specifying the method to be used to compute
the p-values. Two statistics are currently available, "mahalanobis" and
"componentwise".
min.maf: Threshold of minor allele frequencies above which p-values are
computed. Default is 0.05.
ploidy: Number of trials, parameter of the binomial distribution. Default is 2, which corresponds to diploidy, such as for the human genome.
LD.clumping: Default is NULL.
pca.only: a logical value indicating whether PCA results should be returned (before computing any statistic).
tol: Convergence criterion of
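To make the roles of ploidy and min.maf concrete, the base-R sketch below computes allele frequencies as mean allele counts divided by ploidy, derives minor allele frequencies, and flags which SNPs clear the threshold. The toy matrix (SNPs in rows, individuals in columns), the variable names and the use of `>=` for the comparison are assumptions for illustration, not pcadapt internals.

```r
ploidy <- 2      # number of binomial trials per genotype (diploid)
min.maf <- 0.05  # same default as in the Usage section
n <- 25          # individuals

# Hypothetical genotype matrix: SNPs in rows, individuals in columns,
# entries are alternate-allele counts between 0 and ploidy.
geno <- rbind(
  rare   = c(1, rep(0, n - 1)),              # 1 allele out of 50: af = 0.02
  common = rep(c(0, 1, 2), length.out = n),  # 24 alleles out of 50: af = 0.48
  fixed  = rep(ploidy, n)                    # all alleles alternate: af = 1
)

af <- rowMeans(geno, na.rm = TRUE) / ploidy  # allele frequency per SNP
maf <- pmin(af, 1 - af)                      # minor allele frequency
pass <- maf >= min.maf                       # comparison operator is assumed
pass
```

Only the `common` SNP passes: the rare SNP has a minor allele frequency of 0.02 and the fixed SNP has none at all.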
Details
First, a principal component analysis is performed on the scaled and
centered genotype data. Depending on the specified method, different
test statistics can be used.
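The PCA step, and the link between SNPs and the first K PCs mentioned in the Description, can be sketched in base R as below. Everything here is an assumption for illustration: the simulated three-population data, scaling with scale() (empirical standard deviations), and z-scores taken as regression t-statistics of each SNP on the K PC scores. pcadapt's own scaling and z-score computation may differ.

```r
set.seed(42)
n_ind <- 60   # individuals
n_snp <- 200  # SNPs
K <- 2        # number of PCs to retain

# Hypothetical genotypes: three populations whose allele frequencies are
# perturbed around a shared baseline; the matrix is individuals x SNPs.
pop <- rep(1:3, each = n_ind / 3)
base_af <- runif(n_snp, 0.1, 0.9)
geno <- sapply(seq_len(n_snp), function(j) {
  af_pop <- pmin(pmax(base_af[j] + rnorm(3, sd = 0.1), 0.01), 0.99)
  rbinom(n_ind, size = 2, prob = af_pop[pop])
})

# Drop monomorphic SNPs, then center and scale each SNP (column).
geno <- geno[, apply(geno, 2, sd) > 0, drop = FALSE]
G <- scale(geno)

# Keep the first K left singular vectors as the PC scores.
sv <- svd(G, nu = K, nv = K)
scores <- sv$u

# Illustrative z-scores: t-statistics from regressing each SNP on the K scores.
zscores <- t(apply(G, 2, function(y) {
  coef(summary(lm(y ~ scores)))[-1, "t value"]
}))
dim(zscores)  # one row per SNP, one column per PC
```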
mahalanobis (default): the robust Mahalanobis distance is computed for
each genetic marker, using robust estimates of both the mean and the
covariance matrix of the K vectors of z-scores.
communality: the communality statistic measures the proportion of
variance explained by the first K PCs. Deprecated in version 4.0.0.
componentwise: returns a matrix of z-scores.
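The difference between the two available statistics can be seen on a matrix of z-scores. In the sketch below the z-scores are simulated, with SNPs 1 to 5 set as hypothetical outliers. For brevity it uses the classical mean and covariance with base R's mahalanobis() (which returns squared distances); pcadapt uses robust estimates instead, so treat this as a simplified stand-in.

```r
set.seed(7)
K <- 2
n_snp <- 500

# Hypothetical z-scores: null SNPs are standard normal; SNPs 1 to 5 are
# outliers strongly associated with both PCs.
zscores <- matrix(rnorm(n_snp * K), ncol = K)
zscores[1:5, ] <- 8

# "mahalanobis": one squared distance per SNP, combining all K PCs.
# Classical (non-robust) estimates are swapped in here for simplicity.
stat_mahal <- mahalanobis(zscores, center = colMeans(zscores), cov = cov(zscores))

# "componentwise": one value per SNP and per PC; squaring the z-scores
# gives quantities comparable to a chi-squared distribution with 1 df.
stat_comp <- zscores^2

head(order(stat_mahal, decreasing = TRUE), 5)  # the outlier SNPs rank first
```

The Mahalanobis statistic collapses the K columns into a single score per SNP, whereas the componentwise output keeps one column per PC, which is useful for seeing which PC drives an outlier.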
When method="mahalanobis", p-values are computed by dividing the test
statistics (stat) by a genomic inflation factor (gif); the resulting scaled
statistics (chi2_stat) should follow a chi-squared distribution with K
degrees of freedom. When using method="componentwise", the squared z-scores
should follow a chi-squared distribution with 1 degree of freedom. For
Pool-seq data, pcadapt provides p-values based on the Mahalanobis distance
for each SNP.
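The rescaling step for method="mahalanobis" can be sketched as follows. The statistics are simulated with a 30% inflation, and the gif is assumed to be the usual median-based genomic-control estimator (observed median divided by the median of a chi-squared distribution with K degrees of freedom); the exact estimator used inside pcadapt is not stated in this page.

```r
set.seed(3)
K <- 2

# Hypothetical Mahalanobis statistics, inflated by 30% relative to chi2(K).
stat <- 1.3 * rchisq(1000, df = K)

# Median-based genomic inflation factor (assumed estimator).
gif <- median(stat) / qchisq(0.5, df = K)

# Rescale, then take upper-tail chi-squared probabilities as p-values.
chi2_stat <- stat / gif
pvalues <- pchisq(chi2_stat, df = K, lower.tail = FALSE)

gif  # close to 1.3 for this simulated inflation
```

By construction the rescaled statistics have exactly the median of a chi-squared distribution with K degrees of freedom, which is what removes the global inflation before p-values are computed.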
Value
The returned value is an object of class pcadapt.