snp.pca {ASRgenomics} | R Documentation |
Generates a PCA and summary statistics from a given molecular matrix
for assessing population structure. The matrix provided must be of full form
(n × p), with n individuals and p markers. Individual and marker names are
assigned to rownames and colnames, respectively.
SNP data is coded as 0, 1, 2 (integers or decimal numbers). Missing values are
not accepted and need to be imputed beforehand (see function qc.filtering()
for implementing mean imputation). Additional output, such as plots and
other data frames, is returned for use in downstream analyses (such as GWAS).
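The input requirements above (0/1/2 coding, no missing values, named rows and columns) can be checked before calling snp.pca(). The following is a minimal base R sketch on a hypothetical toy matrix. The helper impute_mean() is an illustrative name and not part of ASRgenomics; it only mimics the idea of mean imputation and is not the code used by qc.filtering().

```r
# Toy genotype matrix (hypothetical data): 4 individuals x 3 markers, coded 0/1/2.
M <- matrix(c(0,  1,  2,
              1, NA,  2,
              2,  1,  0,
              0,  1, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3)))

# Hypothetical helper: replace each missing call with the mean of the
# observed calls for that marker (column).
impute_mean <- function(M) {
  col_means <- colMeans(M, na.rm = TRUE)
  missing <- which(is.na(M), arr.ind = TRUE)
  M[missing] <- col_means[missing[, "col"]]
  M
}

M_imp <- impute_mean(M)

# Checks mirroring the stated input requirements.
stopifnot(!anyNA(M_imp),
          all(M_imp >= 0 & M_imp <= 2),
          !is.null(rownames(M_imp)),
          !is.null(colnames(M_imp)))
```

Imputed entries are decimal values between 0 and 2, which is consistent with the statement that decimal numbers are accepted.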
snp.pca(M = NULL, label = FALSE, ncp = 10, groups = NULL, ellipses = FALSE)
M |
A matrix with SNP data of full form (n × p), with n individuals and p markers. |
label |
If |
ncp |
The number of PC dimensions to be shown in the screeplot, and to provide
in the output data frame (default = 10). |
groups |
A vector of class factor used to define different colors for individuals
in the PCA plot. It must be presented in the same order as the individuals
in the molecular matrix. |
ellipses |
If |
It calls function prcomp() to generate the PCA and the factoextra R package
to extract and visualize results.
The methodology uses normalized allele frequencies as proposed by Patterson et al. (2006).
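As an illustration of the normalization in Patterson et al. (2006), each marker column can be centered by its mean and scaled by sqrt(p(1 - p)), where p is the estimated allele frequency (half the column mean), before running prcomp(). The sketch below uses simulated genotypes and base R only. It is an assumption about how such a normalization could be done, not the internal code of snp.pca(); all object names are hypothetical.

```r
set.seed(1)
n <- 20  # individuals
p <- 50  # markers

# Simulated 0/1/2 genotypes (hypothetical data), one allele frequency per marker.
freq <- runif(p, 0.2, 0.8)
M <- sapply(freq, function(f) rbinom(n, size = 2, prob = f))
dimnames(M) <- list(paste0("ind", 1:n), paste0("snp", 1:p))

# Estimated allele frequency per marker; drop monomorphic markers,
# for which the scaling factor would be zero.
p_hat <- colMeans(M) / 2
keep <- p_hat > 0 & p_hat < 1
M <- M[, keep, drop = FALSE]
p_hat <- p_hat[keep]

# Center each marker by 2 * p_hat (its mean) and scale by sqrt(p_hat * (1 - p_hat)).
Z <- sweep(M, 2, 2 * p_hat, "-")
Z <- sweep(Z, 2, sqrt(p_hat * (1 - p_hat)), "/")

# The columns are already centered and scaled, so prcomp() is told to do neither.
pca <- prcomp(Z, center = FALSE, scale. = FALSE)
scores <- pca$x[, 1:2]  # scores on PC1 and PC2
head(scores)
```

Scaling by sqrt(p(1 - p)) gives markers with rare alleles more weight than they would have on the raw 0/1/2 scale.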
A list with the following four elements:
eigenvalues: a data frame with the eigenvalues and their associated variances
for each dimension, including only the first ncp dimensions.
pca.scores: a data frame with scores (rotated observations on the new components)
including only the first ncp dimensions.
plot.pca: a scatterplot of the scores on the first two dimensions (PC1 and PC2).
plot.scree: a bar chart with the percentage of variance explained by the first ncp
dimensions.
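To make the contents of the eigenvalues and pca.scores elements more concrete, the sketch below derives comparable quantities directly from a prcomp() fit: eigenvalues are the squared standard deviations of the components, and the percentage of variance is each eigenvalue over their sum. The data are random placeholders, and the column names are illustrative assumptions that need not match those returned by snp.pca().

```r
set.seed(42)
X <- matrix(rnorm(30 * 8), nrow = 30)  # hypothetical stand-in for a normalized marker matrix
ncp <- 5                               # number of dimensions to keep

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Eigenvalues are the squared standard deviations of the components.
eig <- pca$sdev^2
eig_tab <- data.frame(
  dimension          = seq_along(eig),
  eigenvalue         = eig,
  variance.percent   = 100 * eig / sum(eig),
  cumulative.percent = 100 * cumsum(eig) / sum(eig)
)[seq_len(ncp), ]

# Scores: rotated observations, restricted to the first ncp components.
scores <- as.data.frame(pca$x[, seq_len(ncp)])

eig_tab
head(scores)
```

The variance.percent column holds the kind of values a scree plot displays as bar heights.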
Patterson, N., Price, A.L., and Reich, D. 2006. Population structure and eigenanalysis. PLoS Genet 2(12):e190. doi:10.1371/journal.pgen.0020190
# Perform the PCA.
SNP_pca <- snp.pca(M = geno.apple, ncp = 10)
ls(SNP_pca)
SNP_pca$eigenvalues
head(SNP_pca$pca.scores)
SNP_pca$plot.pca
SNP_pca$plot.scree
# PCA plot by family (17 groups).
grp <- as.factor(pheno.apple$Family)
SNP_pca_grp <- snp.pca(M = geno.apple, groups = grp, label = FALSE)
SNP_pca_grp$plot.pca