popkin {popkin} | R Documentation |
Estimate kinship from a genotype matrix and subpopulation assignments
Description
Given the biallelic genotypes of n
individuals, this function returns the n
-by-n
kinship matrix such that the kinship estimate between the most distant subpopulations is zero on average (this sets the ancestral population to the most recent common ancestor population).
Usage
popkin(
X,
subpops = NULL,
n = NA,
loci_on_cols = FALSE,
mean_of_ratios = FALSE,
mem_factor = 0.7,
mem_lim = NA,
want_M = FALSE,
m_chunk_max = 1000
)
Arguments
X |
Genotype matrix, |
subpops |
The length- |
n |
Number of individuals (required only when |
loci_on_cols |
If |
mean_of_ratios |
Chose how to weigh loci.
If |
mem_factor |
Proportion of available memory to use loading and processing data.
Ignored if |
mem_lim |
Memory limit in GB, used to break up data into chunks for very large datasets.
Note memory usage is somewhat underestimated and is not controlled strictly.
Default in Linux is |
want_M |
If |
m_chunk_max |
Sets the maximum number of loci to process at the time. Actual number of loci loaded may be lower if memory is limiting. |
Details
The subpopulation assignments are only used to estimate the baseline kinship (the zero value).
If the user wants to re-estimate the kinship matrix using different subpopulation labels,
it suffices to rescale it using rescale_popkin()
(as opposed to starting from the genotypes again, which gives the same answer but more slowly).
Value
If want_M = FALSE
, returns the estimated n
-by-n
kinship matrix only.
If X
has names for the individuals, they will be copied to the rows and columns of this kinship matrix.
If want_M = TRUE
, a named list is returned, containing:
-
kinship
: the estimatedn
-by-n
kinship matrix -
M
: then
-by-n
matrix of non-missing pair counts (seewant_M
option).
See Also
popkin_af()
for coancestry estimation from allele frequency matrices.
Examples
# Construct toy data
X <- matrix(
c(0, 1, 2,
1, 0, 1,
1, 0, 2),
nrow = 3,
byrow = TRUE
) # genotype matrix
subpops <- c(1,1,2) # subpopulation assignments for individuals
# NOTE: for BED-formatted input, use BEDMatrix!
# "file" is path to BED file (excluding .bed extension)
## library(BEDMatrix)
## X <- BEDMatrix(file) # load genotype matrix object
kinship <- popkin(X, subpops) # calculate kinship from genotypes and subpopulation labels