pedigree_loglikelihood {clipp} | R Documentation |
Calculate the log-likelihoods of pedigrees
Description
For one or more pedigrees, this function calculates the natural logarithm of the pedigree likelihood given on page 117 of Lange (2002), using inputs that correspond to the terms of that formula.
Usage
pedigree_loglikelihood(
  dat,
  geno_freq,
  trans,
  penet,
  monozyg = NULL,
  sum_loglik = TRUE,
  ncores = 1,
  load_balancing = TRUE
)
Arguments
dat |
A data frame with rows corresponding to people and columns
corresponding to the following variables (other variables can be included
but will be ignored), which will be coerced to
|
geno_freq |
A vector of strictly positive numbers that sum to |
trans |
An |
penet |
An |
monozyg |
An optional list that can be used to specify genetically
identical persons, such as monozygotic twins, monozygotic triplets,
a monozygotic pair within a set of dizygotic triplets, etc.
Each element of the list should be a vector containing the individual
identifiers of a group of genetically identical persons, e.g. if |
sum_loglik |
A logical flag. Return a named vector giving the
log-likelihood of each family if |
ncores |
The number of cores to be used, with |
load_balancing |
A logical flag. When |
Details
This function provides a fast and general implementation of the Elston-Stewart algorithm to calculate the log-likelihoods of potentially large and complex pedigrees. General references for the Elston-Stewart algorithm are (Elston & Stewart, 1971), (Lange & Elston, 1975) and (Cannings et al., 1978).
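To make concrete what a pedigree likelihood is, the following minimal sketch evaluates one by brute force for a single father-mother-child trio, summing over every joint genotype assignment. It assumes the standard sum-over-genotypes form (founder frequencies times Mendelian transmission times penetrances); the formula on page 117 of Lange (2002) is not reproduced in this text. It does not call pedigree_loglikelihood, and it is not how the Elston-Stewart algorithm organises the calculation. The helper trio_loglik, the gamete vector and the 3 x 3 penetrance layout are all hypothetical names chosen for illustration.

```r
# Biallelic locus in Hardy-Weinberg equilibrium, minor allele frequency 10%.
# Genotypes 1, 2, 3 carry 0, 1, 2 copies of the minor allele (an assumption).
p <- 0.9
q <- 0.1
geno_freq <- c(p^2, 2 * p * q, q^2)

# Probability that a parent with genotype 1, 2 or 3 transmits the minor allele
gamete <- c(0, 0.5, 1)

# Hypothetical helper: penet is a 3 x 3 matrix with rows (father, mother,
# child) and columns indexed by genotype
trio_loglik <- function(penet) {
  lik <- 0
  for (gf in 1:3) for (gm in 1:3) for (gc in 1:3) {
    a <- gamete[gf]
    b <- gamete[gm]
    # Mendelian probability of the child's genotype given both parents
    tr <- c((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[gc]
    lik <- lik + geno_freq[gf] * geno_freq[gm] * tr *
      penet[1, gf] * penet[2, gm] * penet[3, gc]
  }
  log(lik)
}

# No information on anyone: every penetrance is 1
penet_none <- matrix(1, nrow = 3, ncol = 3)
trio_loglik(penet_none)

# Child observed to carry two copies of the minor allele
penet_child <- penet_none
penet_child[3, ] <- c(0, 0, 1)
trio_loglik(penet_child)  # log(q^2) under these assumptions
```

The triple loop grows exponentially with the number of family members, which is why a peeling approach such as Elston-Stewart is needed for pedigrees of realistic size.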
Each family within dat should be a complete pedigree, meaning that each person should either have both parental identifiers missing (if a founder) or both non-missing (if a non-founder), and each (non-missing) mother or father should have a corresponding row of dat.
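The completeness condition can be checked before calling the function. The sketch below is one possible base R check, not part of the package: the column names indiv, mother and father are assumptions made for illustration, and is_complete_pedigree is a hypothetical helper.

```r
# Hypothetical toy pedigree: two founders and their two children.
# The column names used here are assumptions, not taken from this page.
ped <- data.frame(
  indiv  = c("f1", "m1", "c1", "c2"),
  mother = c(NA, NA, "m1", "m1"),
  father = c(NA, NA, "f1", "f1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if every person has both or neither parent
# recorded, and every recorded parent has a row of their own
is_complete_pedigree <- function(ped) {
  founder    <- is.na(ped$mother) & is.na(ped$father)
  nonfounder <- !is.na(ped$mother) & !is.na(ped$father)
  parents    <- c(ped$mother, ped$father)
  parents    <- parents[!is.na(parents)]
  all(founder | nonfounder) && all(parents %in% ped$indiv)
}

is_complete_pedigree(ped)
```

For data with several families, the same check would be applied to each family separately.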
Observed genotypes should be incorporated into penet, as described above.
The function can handle pedigree loops, such as those caused by inbreeding or by two sisters having children with two brothers from an unrelated family (see Totir et al., 2009, for a precise definition), though pedigrees with more than a few loops can greatly slow the calculation.
In geno_freq, trans and penet, the order of the possible genotypes must match, in the sense that the genotype that corresponds to element j of geno_freq must also correspond to column j of trans and penet, for each j in 1:length(geno_freq).
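One way to guard against a mismatched genotype order is to label the genotypes yourself and compare the labels before calling the function. The sketch below uses only base R; the genotype labels, the placeholder penetrance matrix and the helper same_order are hypothetical, and nothing here implies that pedigree_loglikelihood itself inspects names.

```r
# Hypothetical genotype labels for a biallelic locus
genotypes <- c("AA", "Aa", "aa")

# Hardy-Weinberg frequencies computed by hand, labelled in the chosen order
p <- 0.9
q <- 0.1
geno_freq <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Placeholder penetrance matrix: one row per person, one column per genotype
penet <- matrix(1, nrow = 4, ncol = length(genotypes),
                dimnames = list(NULL, genotypes))

# Hypothetical helper: TRUE if every matrix has its columns labelled in
# the same order as geno_freq
same_order <- function(geno_freq, ...) {
  mats <- list(...)
  all(vapply(mats,
             function(m) identical(colnames(m), names(geno_freq)),
             logical(1)))
}

same_order(geno_freq, penet)
```

The same helper accepts several matrices at once, so a labelled trans could be passed alongside penet.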
Sex-specific genetics, such as X-linked genes or genetic loci with sex-specific recombination fractions, can be modelled by letting genotypes 1:nm be the possible male genotypes and letting (nm+1):(nm+nf) be the possible female genotypes, where nm and nf are the number of possible genotypes for males and females, respectively. Then, for example, penet[i,j] will be 0 if j %in% 1:nm and row i of dat corresponds to a female, and penet[i,j] will be 0 if j %in% (nm+1):(nm+nf) and row i of dat corresponds to a male.
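The zero pattern described above can be imposed with two indexed assignments. This is a minimal sketch in base R: the sex vector, the values of nm and nf, and the starting matrix of ones are hypothetical placeholders; in practice the non-zero entries would hold the actual penetrances.

```r
# Hypothetical sizes: 2 possible male genotypes and 3 possible female
# genotypes, as might arise for an X-linked biallelic locus
nm <- 2
nf <- 3

# Hypothetical sex of the person in each row of dat
sex <- c("M", "F", "F", "M")

# Placeholder penetrances, one row per person, one column per genotype
penet <- matrix(1, nrow = length(sex), ncol = nm + nf)

male_cols   <- 1:nm
female_cols <- (nm + 1):(nm + nf)

# Females cannot carry male genotypes, and males cannot carry female ones
penet[sex == "F", male_cols]   <- 0
penet[sex == "M", female_cols] <- 0

penet
```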
Value
Either a named vector giving the log-likelihood of each family or the sum of these log-likelihoods, depending on sum_loglik (see above).
References
Cannings C, Thompson E, Skolnick M. Probability functions on complex pedigrees. Advances in Applied Probability, 1978;10(1):26-61.
Elston RC, Stewart J. A general model for the genetic analysis of pedigree data. Hum Hered. 1971;21(6):523-542.
Lange K. Mathematical and Statistical Methods for Genetic Analysis (second edition). Springer, New York. 2002.
Lange K, Elston RC. Extensions to pedigree analysis I. Likelihood calculations for simple and complex pedigrees. Hum Hered. 1975;25(2):95-105.
Totir LR, Fernando RL, Abraham J. An efficient algorithm to compute marginal posterior genotype probabilities for every member of a pedigree with loops. Genet Sel Evol. 2009;41(1):52.
Examples
# Load pedigree files and penetrance matrices
data("dat_small", "penet_small", "dat_large", "penet_large")
# Settings for a single biallelic locus in Hardy-Weinberg equilibrium
# and with a minor allele frequency of 10%
geno_freq <- geno_freq_monogenic(c(0.9, 0.1))
trans <- trans_monogenic(2)
# In dat_small, ora024 and ora027 are identical twins, and so are aey063 and aey064
monozyg_small <- list(c("ora024", "ora027"), c("aey063", "aey064"))
# Calculate the log-likelihoods for 10 families, each with approximately
# 100 family members
pedigree_loglikelihood(
  dat_small, geno_freq, trans, penet_small, monozyg_small,
  sum_loglik = FALSE, ncores = 2
)
# Calculate the log-likelihood for one family with approximately 10,000 family members
# Note: this calculation should take less than a minute on a standard desktop computer
# Note: parallelization would achieve nothing here because there is only one family
str(dat_large)
system.time(
  ll <- pedigree_loglikelihood(dat_large, geno_freq, trans, penet_large)
)
ll