pcadapt {pcadapt}    R Documentation

Principal Component Analysis for outlier detection

Description

pcadapt performs principal component analysis and computes p-values to test for outliers. The test for outliers is based on the correlations between genetic variation and the first K principal components. pcadapt also handles Pool-seq data, for which the statistical analysis is performed on the genetic marker frequencies. Returns an object of class pcadapt.

Usage

pcadapt(
  input,
  K = 2,
  method = "mahalanobis",
  min.maf = 0.05,
  ploidy = 2,
  LD.clumping = NULL,
  pca.only = FALSE,
  tol = 1e-04
)

## S3 method for class 'pcadapt_matrix'
pcadapt(
  input,
  K = 2,
  method = c("mahalanobis", "componentwise"),
  min.maf = 0.05,
  ploidy = 2,
  LD.clumping = NULL,
  pca.only = FALSE,
  tol = 1e-04
)

## S3 method for class 'pcadapt_bed'
pcadapt(
  input,
  K = 2,
  method = c("mahalanobis", "componentwise"),
  min.maf = 0.05,
  ploidy = 2,
  LD.clumping = NULL,
  pca.only = FALSE,
  tol = 1e-04
)

## S3 method for class 'pcadapt_pool'
pcadapt(
  input,
  K = (nrow(input) - 1),
  method = "mahalanobis",
  min.maf = 0.05,
  ploidy = NULL,
  LD.clumping = NULL,
  pca.only = FALSE,
  tol
)

Arguments

input

The output of function read.pcadapt.

K

an integer specifying the number of principal components to retain.

method

a character string specifying the method to be used to compute the p-values. Two statistics are currently available, "mahalanobis", and "componentwise".

min.maf

Minor allele frequency threshold. P-values are computed only for markers whose minor allele frequency is above this value. Default is 0.05.

ploidy

Number of trials (the size parameter of the binomial distribution). Default is 2, which corresponds to diploid organisms such as humans.

LD.clumping

Default is NULL, in which case no SNP thinning is performed. To use SNP thinning, provide a named list with elements $size and $thr, which correspond respectively to the window radius and the squared correlation threshold. A good default value would be list(size = 500, thr = 0.1).

pca.only

a logical value indicating whether PCA results should be returned (before computing any statistic).

tol

Convergence criterion of RSpectra::svds(). Default is 1e-4.
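
As a rough illustration of how these arguments fit together, the sketch below runs pcadapt on the example file geno3pops.bed, which is assumed here to ship with the package under extdata. The file name and the choice K = 5 are illustrative assumptions, not recommendations.

```r
library(pcadapt)

# Assumption: the package ships an example bed file under extdata/.
path <- system.file("extdata", "geno3pops.bed", package = "pcadapt")
input <- read.pcadapt(path, type = "bed")

# Default call: Mahalanobis statistic, no SNP thinning.
res <- pcadapt(input, K = 5)

# Same analysis with SNP thinning, using the values suggested above.
res_thin <- pcadapt(input, K = 5,
                    LD.clumping = list(size = 500, thr = 0.1))

# Stop after the PCA step, before any test statistic is computed.
pca <- pcadapt(input, K = 5, pca.only = TRUE)
```

Choosing K usually involves inspecting the PCA output first, which is one reason to run with pca.only = TRUE before computing statistics.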

Details

First, a principal component analysis is performed on the scaled and centered genotype data. Depending on the specified method, different test statistics can be used.

mahalanobis (default): the robust Mahalanobis distance is computed for each genetic marker, using robust estimates of the mean and covariance matrix of the K vectors of z-scores.

communality: the communality statistic measures the proportion of variance explained by the first K PCs. Deprecated in version 4.0.0.

componentwise: returns a matrix of z-scores.
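
To make the mahalanobis and componentwise statistics concrete, here is a minimal base-R sketch on simulated z-scores. It uses the classical mean and covariance through stats::mahalanobis() as a stand-in, whereas pcadapt relies on robust estimates, so it illustrates the idea only. The matrix z and its dimensions are made up for the example.

```r
set.seed(1)

# Hypothetical z-score matrix: 1000 markers (rows) by K = 3 PCs (columns).
K <- 3
z <- matrix(rnorm(1000 * K), ncol = K)

# componentwise: the statistic is the z-score matrix itself, one column per PC.
stat_componentwise <- z

# mahalanobis: one squared distance per marker.
# NOTE: classical (non-robust) mean and covariance are used here for brevity;
# this is a swapped-in simplification, not the estimator pcadapt uses.
stat_mahalanobis <- mahalanobis(z, center = colMeans(z), cov = cov(z))

dim(stat_componentwise)   # markers x components
length(stat_mahalanobis)  # one value per marker
```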

To compute p-values, test statistics (stat) are divided by a genomic inflation factor (gif) when method="mahalanobis". The scaled statistics (chi2_stat) should then follow a chi-squared distribution with K degrees of freedom. When using method="componentwise", the squared z-scores should follow a chi-squared distribution with 1 degree of freedom. For Pool-seq data, pcadapt provides p-values based on the Mahalanobis distance for each SNP.
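
The scaling step described above can be sketched in base R as follows. The genomic inflation factor is taken here as the median of the statistics divided by the median of the reference chi-squared distribution; this is a common definition and an assumption about what pcadapt does internally. The simulated stat vector is hypothetical.

```r
set.seed(42)

K <- 2
# Hypothetical test statistics: chi-squared with K df, inflated by a factor of 1.5.
stat <- 1.5 * rchisq(5000, df = K)

# Genomic inflation factor (assumed definition: ratio of medians).
gif <- median(stat, na.rm = TRUE) / qchisq(0.5, df = K)

# Scaled statistics and upper-tail p-values.
chi2_stat <- stat / gif
pvalues <- pchisq(chi2_stat, df = K, lower.tail = FALSE)

# For componentwise z-scores, the squared values are compared with 1 df
# (assumption: a standard normal z has a chi-squared(1) square).
z <- rnorm(5000)
pvalues_z <- pchisq(z^2, df = 1, lower.tail = FALSE)
```

After scaling, the median of chi2_stat matches the median of the reference distribution by construction, which is what keeps the bulk of the p-values close to uniform.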

Value

The returned value is an object of class pcadapt.


[Package pcadapt version 4.3.5 Index]