eigenstrat {AssocTests}    R Documentation

EIGENSTRAT for correcting for population stratification

Description

Finds the eigenvectors of the similarity matrix among the subjects. These eigenvectors are used to correct for population stratification in population-based genetic association studies.

Usage

eigenstrat(
  genoFile,
  outFile.Robj = "out.list",
  outFile.txt = "out.txt",
  rm.marker.index = NULL,
  rm.subject.index = NULL,
  miss.val = 9,
  num.splits = 10,
  topK = NULL,
  signt.eigen.level = 0.01,
  signal.outlier = FALSE,
  iter.outlier = 5,
  sigma.thresh = 6
)

Arguments

genoFile

a txt file containing the genotypes (0, 1, 2, or 9). The element of the file in Row i and Column j represents the genotype at the ith marker of the jth subject. 0, 1, and 2 denote the number of risk alleles, and 9 (default) is for the missing genotype.

outFile.Robj

the name of the file in which the list of results is saved as an R object. The list is the same as the return value of this function. The default is "out.list".

outFile.txt

a txt file for saving the eigenvectors corresponding to the top significant eigenvalues.

rm.marker.index

a numeric vector for the indices of the removed markers. The default is NULL.

rm.subject.index

a numeric vector for the indices of the removed subjects. The default is NULL.

miss.val

the number representing missing data in the input data. The default is 9. If a different value is supplied, the missing genotypes in genoFile must be coded with that value instead of 9.

num.splits

the number of groups into which the markers are split. The default is 10.

topK

the number of eigenvectors to return. If NULL, it is calculated by the Tracy-Widom test. The default is NULL.

signt.eigen.level

a numeric value which is the significance level of the Tracy-Widom test. It should be 0.05, 0.01, 0.005, or 0.001. The default is 0.01.

signal.outlier

logical. If TRUE, outlying subjects are searched for and removed; otherwise, no outlier search is performed. The default is FALSE.

iter.outlier

a numeric value giving the number of iterations used to search for outlying subjects. The default is 5.

sigma.thresh

a numeric value that is the lower limit for eliminating the outliers. The default is 6.
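
As an illustration of the genoFile layout and the miss.val coding described above, the following minimal sketch recodes NA genotypes before writing the file. It assumes, as in the Examples section, that genotypes are written without any separator; the object names geno, tmp, and lines are hypothetical and not part of the package.

```r
# Toy genotype matrix: 2 markers (rows) x 3 subjects (columns), one missing call.
geno <- matrix(c(0, 1, NA, 2, 1, 0), nrow = 2)

# Recode missing genotypes with the value that will be passed as miss.val.
miss.val <- 9
geno[is.na(geno)] <- miss.val

# Write one line per marker, with no separator between subjects
# (the same write.table settings as in the Examples section).
tmp <- tempfile(fileext = ".txt")
write.table(geno, file = tmp, quote = FALSE,
            sep = "", row.names = FALSE, col.names = FALSE)

# Read the file back to inspect the layout, then clean up.
lines <- readLines(tmp)
unlink(tmp)
```

Each line of the resulting file holds one marker, and each character on that line is the genotype of one subject.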

Details

Suppose that a total of n cases and controls are randomly enrolled from the source population and a panel of m single-nucleotide polymorphisms is genotyped. The genotype at a marker locus is coded as 0, 1, or 2, the value being the number of copies of the risk allele. All the genotypes are given in the form of an m × n matrix, in which the element in the ith row and the jth column represents the genotype of the jth subject at the ith marker. To infer the potential population structure, this function calculates either the top eigenvectors of the similarity matrix among the subjects or the eigenvectors whose eigenvalues are significant. See also tw.
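
The computation described above can be sketched in base R. This is only an illustration that follows the marker normalization of Price et al. (2006); it is not the package's internal code. It assumes complete data (no missing genotypes), ignores num.splits, outlier removal, and the Tracy-Widom test, and all object names (geno, x, sim, eig, top2) are hypothetical.

```r
# Simulated genotypes: m markers (rows) by n subjects (columns).
set.seed(1)
m <- 100
n <- 30
geno <- matrix(rbinom(m * n, 2, 0.5), nrow = m, ncol = n)

# Drop monomorphic markers, which carry no information on structure.
geno <- geno[apply(geno, 1, var) > 0, , drop = FALSE]

# Centre each marker by its mean and scale by sqrt(p * (1 - p)),
# where p is the estimated allele frequency at that marker.
p <- (1 + rowSums(geno)) / (2 + 2 * n)
x <- (geno - rowMeans(geno)) / sqrt(p * (1 - p))

# n x n similarity matrix among the subjects.
sim <- crossprod(x) / nrow(x)

# Eigen-decomposition; eigenvalues are returned in decreasing order.
eig <- eigen(sim, symmetric = TRUE)

# Top two eigenvectors, one row per subject.
top2 <- eig$vectors[, 1:2]
```

In an association analysis, columns such as those of top2 would typically be included as covariates to adjust for ancestry.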

Value

eigenstrat returns a list, which contains the following components:

num.markers          the number of markers excluding the removed markers.
num.subjects         the number of subjects excluding the outliers.
rm.marker.index      the indices of the removed markers.
rm.subject.index     the indices of the removed subjects.
TW.level             the significance level of the Tracy-Widom test.
signal.outlier       whether outlying subjects were searched for and removed.
iter.outlier         the number of iterations used to search for the outliers.
sigma.thresh         the lower limit for eliminating the outliers.
num.outliers         the number of outliers.
outliers.index       the indices of the outliers.
num.used.subjects    the number of the used subjects.
used.subjects.index  the indices of the used subjects.
similarity.matrix    the similarity matrix among the subjects.
eigenvalues          the eigenvalues of the similarity matrix.
eigenvectors         the eigenvectors corresponding to the eigenvalues.
topK                 the number of significant eigenvalues.
TW.stat              the observed values of the Tracy-Widom statistics.
topK.eigenvalues     the top eigenvalues.
topK.eigenvectors    the eigenvectors corresponding to the top eigenvalues.
runtime              the running time of this function.

Author(s)

Lin Wang, Wei Zhang, and Qizhai Li.

References

Lin Wang, Wei Zhang, and Qizhai Li. AssocTests: An R Package for Genetic Association Studies. Journal of Statistical Software. 2020; 94(5): 1-26.

AL Price, NJ Patterson, RM Plenge, ME Weinblatt, NA Shadick, and D Reich. Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies. Nature Genetics. 2006; 38(8): 904-909.

N Patterson, AL Price, and D Reich. Population Structure and Eigenanalysis. PLoS Genetics. 2006; 2(12): 2074-2093.

CA Tracy and H Widom. Level-Spacing Distributions and the Airy Kernel. Communications in Mathematical Physics. 1994; 159(1): 151-174.

Examples

# Simulate genotypes for 100 markers (rows) and 30 subjects (columns)
eigenstratG.eg <- matrix(rbinom(3000, 2, 0.5), ncol = 30)
# Write one line per marker, with no separator between genotypes
write.table(eigenstratG.eg, file = "eigenstratG.eg.txt", quote = FALSE,
            sep = "", row.names = FALSE, col.names = FALSE)
# Run eigenstrat with the default settings, stated explicitly
eigenstrat(genoFile = "eigenstratG.eg.txt", outFile.Robj = "eigenstrat.result.list",
           outFile.txt = "eigenstrat.result.txt", rm.marker.index = NULL,
           rm.subject.index = NULL, miss.val = 9, num.splits = 10,
           topK = NULL, signt.eigen.level = 0.01, signal.outlier = FALSE,
           iter.outlier = 5, sigma.thresh = 6)
# Remove the input and output files created by this example
file.remove("eigenstratG.eg.txt", "eigenstrat.result.list", "eigenstrat.result.txt")

[Package AssocTests version 1.0-1 Index]