ldfast {ldsep} | R Documentation |
Fast bias-correction for LD Estimation
Description
Estimates the reliability ratios from posterior marginal moments and uses these to correct the biases in linkage disequilibrium estimation caused by genotype uncertainty. These methods are described in Gerard (2021).
Usage
ldfast(
gp,
type = c("r", "r2", "z", "D", "Dprime"),
shrinkrr = TRUE,
se = TRUE,
thresh = TRUE,
upper = 10,
mode = c("zero", "estimate"),
win = NULL
)
Arguments
gp |
A three-way array with dimensions SNPs by individuals by dosage.
That is, |
type |
What LD measure should we estimate?
Note that these are all composite measures of LD (see
the description in |
shrinkrr |
A logical. Should we use adaptive shrinkage
(Stephens, 2016) to shrink the reliability ratios ( |
se |
Should we also return a matrix of standard errors ( |
thresh |
A logical. Should we apply an upper bound on the reliability
ratios ( |
upper |
The upper bound on the reliability ratios if
|
mode |
A character. Only applies if |
win |
A positive integer. The window size. This will constrain the correlations calculated to those +/- the window size. This will only improve speed if the window size is much less than the number of SNPs. |
Value
A list with some or all of the following elements:
ldmat
The bias-corrected LD matrix.
rr
The estimated reliability ratio for each SNP. This is the multiplicative factor applied to the naive LD estimate for each SNP.
rr_raw
The raw reliability ratios (for the covariance, not the correlation). Only returned if
shrinkrr = TRUE
.rr_se
The standard errors for the log-raw reliability ratios for each SNP. That is, we have sd(log(rr_raw)) ~ rr_se. Only returned if
shrinkrr = TRUE
.semat
A matrix of standard errors of the corresponding estimators of LD.
Details
Returns consistent and bias-corrected estimates of linkage disequilibrium.
The usual measures of LD are implemented: D, D', r, r2, and z
(Fisher-z of r). These are all composite measures of LD, not
haplotypic measures of LD (see the description in ldest()
).
They are always appropriate measures of association
between loci, but only correspond to haplotypic measures of LD when
Hardy-Weinberg equilibrium is fulfilled in autopolyploids.
In order for these estimates to perform well, you need to use
posterior genotype probabilities that have been calculated using
adaptive priors, i.e. empirical/hierarchical Bayes approaches. There
are many approaches that do this, such as
updog
,
polyRAD
,
fitPoly
, or
SuperMASSA
.
Note that GATK uses a uniform prior, so would be inappropriate for
use in ldfast()
.
Calculating standard errors and performing hierarchical shrinkage of the
reliability ratios are both rather slow operations compared to just
raw method-of-moments based estimation for LD. If you don't need
standard errors, you can double your speed by setting
se = FALSE
. It is not recommended that you disable the
hierarchical shrinkage.
Due to sampling variability, the estimates sometime lie outside of the
theoretical boundaries of the parameters being estimated. In such cases,
we truncate the estimates at the boundary and return NA
for the
standard errors.
Mathematical formulation
Let
r
be the sample correlation of posterior mean genotypes between loci 1 and 2,a1
be the sample variance of posterior means at locus 1,a2
be the sample variance of posterior means at locus 2,b1
be the sample mean of posterior variances at locus 1, andb2
be the sample mean of posterior variances at locus 2.
Then the estimated Pearson correlation between the genotypes at loci 1 and 2 is
\sqrt{(a1 + b1)/a1}\sqrt{(a2 + b2)/a2}r.
All other LD calculations are based on this equation. In particular,
the estimated genotype variances at loci 1 and 2 are
a1 + b1
and a2 + b2
, respectively, which can be
used to calculate D and D'.
The reliability ratio for SNP i is defined by (ai + bi)/ai
.
By default, we apply ash()
(Stephens, 2016)
to the log of these reliability ratios before adjusting the
Pearson correlation. Standard errors are required before using
ash()
, but these are easily obtained
using the central limit theorem and the delta-method.
Author(s)
David Gerard
References
Gerard, David. Scalable Bias-corrected Linkage Disequilibrium Estimation Under Genotype Uncertainty. Heredity, 127(4), 357–362, 2021. doi:10.1038/s41437-021-00462-5.
T. Robertson and J. D. Cryer. An iterative procedure for estimating the mode. Journal of the American Statistical Association, 69(348):1012–1016, 1974. doi:10.1080/01621459.1974.10480246.
M. Stephens. False discovery rates: a new deal. Biostatistics, 18(2):275–294, 10 2016. doi:10.1093/biostatistics/kxw041.
See Also
ash()
Function used to perform hierarchical shrinkage on the log of the reliability ratios.
ldest()
,mldest()
,sldest()
Maximum likelihood estimation of linkage disequilibrium.
Examples
data("gp")
ldout <- ldfast(gp, "r")
ldout$ldmat
ldout$rr
ldout$semat
ldout <- ldfast(gp, "D")
ldout$ldmat
ldout$rr
ldout$semat
ldout <- ldfast(gp, "Dprime")
ldout$ldmat
ldout$rr
ldout$semat