R: Undifferentiate an allele distribution

undiff_af {bnpsd}

R Documentation

Undifferentiate an allele distribution

Description

This function takes a vector of allele frequencies and a mean kinship value, and returns a new distribution of allele frequencies that is consistent with reversing differentiation with the given kinship, in the sense that the new distribution is more concentrated around the middle (0.5) than the input/original by an amount predicted from theory. The new distribution is created by weighing the input distribution with a random mixing distribution with a lower variance. An automatic method is provided that always selects a Beta distribution with just the right concentration to work for the given data and kinship. Explicit methods are also provided for more control, but are more likely to result in errors when mixing variances are not small enough (see details below).

Usage

undiff_af(
  p,
  kinship_mean,
  distr = c("auto", "uniform", "beta", "point"),
  alpha = 1,
  eps = 10 * .Machine$double.eps
)

Arguments

`p`	A vector of observed allele frequencies.
`kinship_mean`	The mean kinship value of the differentiation to reverse.
`distr`	Name of the mixing distribution to use. "auto" picks a symmetric Beta distribution with parameters that ensure a small enough variance to succeed. "beta" is a symmetric Beta distribution with parameter `alpha` as provided below. "uniform" is a uniform distribution (same as "beta" with `alpha = 1`). "point" is a distribution fully concentrated/fixed at 0.5 (same as the limit of "beta" with `alpha = Inf`, which has zero variance).
`alpha`	Shape parameter for `distr = "beta"`, ignored otherwise.
`eps`	If `distr = "auto"`, this small value is added to the calculated `alpha` to avoid roundoff errors and ensuring that the mixing variance is smaller than the maximum allowed.

Details

Model: Suppose we started from an allele frequency p0 with expectation 0.5 and variance V0. Differentiation creates a new (sample) allele frequency p1 without changing its mean (E(p1|p0) = p0) and with a conditional variance given by the mean kinship phi: Var(p1|p0) = p0*(1-p0)*phi. The total variance of the new distribution (calculated using the law of total variance) equals V1 = Var(p1) = phi/4 + (1-phi)*V0. (Also E(p1) = 0.5). So the new variance is greater for phi>0 (note V0 <= 1/4 for any distribution bounded in (0,1)). Thus, given V1 and phi, the goal is to construct a new distribution with the original, lower variance of V0 = (V1-phi/4)/(1-phi). An error is thrown if V1 < phi/4 in input data, which is inconsistent with this assumed model.

Construction of "undifferentiated" allele frequencies: p_out = w*p_in + (1-w)*p_mix, where p_in is the input with sample variance V_in (V1 in above model) and p_mix is a random draw from the mixing distribution distr with expectation 0.5 and known variance V_mix. The output variance is V_out = w^2*V_in + (1-w)^2*V_mix, which we set to the desired V_out = (V_in-phi/4)/(1-phi) (V0 in above model) and solve for w (the largest of the two quadratic roots is used). An error is thrown if V_mix > V_out (the output variance must be larger than the mixing variance). This error is avoided by manually adjusting choice of distr and alpha (for distr = "beta"), or preferably with distr = "auto" (default), which selects a Beta distribution with alpha = (1/(4*V_out)-1)/2 + eps that is guaranteed to work for any valid V_out (assuming V_in < phi/4).

Value

A list with the new distribution and several other informative statistics, which are named elements:

p: A new vector of allele frequencies with the same length as input p, with the desired variance (see details) obtained by weighing the input p with new random data from distribution distr.
w: The weight used for the input data (1-w for the mixing distribution).
kinship_mean_max: The maximum mean kinship possible for undifferentiating this data (equals four times the input variance (see details), which results in zero output variance).
V_in: sample variance of input p, assuming its expectation is 0.5.
V_out: target variance of output p.
V_mix: variance of mixing distribution.
alpha: the value of alpha used for symmetric Beta mixing distribution, informative if distr = "auto".

Examples

# create random uniform data for these many loci
m <- 100
p <- runif( m )
# differentiate the distribution using Balding-Nichols model
F <- 0.1
nu <- 1 / F - 1
p2 <- rbeta( m, p * nu, (1 - p) * nu )

# now undifferentiate with this function
# NOTE in this particular case `F` is also the mean kinship
# default "automatic" distribution recommended
# (avoids possible errors for specific distributions)
p3 <- undiff_af( p2, F )$p

# note p3 does not equal p exactly (original is unrecoverable)
# but variances (assuming expectation is 0.5 for all) should be close to each other,
# and both be lower than p2's variance:
V1 <- mean( ( p - 0.5 )^2 )
V2 <- mean( ( p2 - 0.5 )^2 )
V3 <- mean( ( p3 - 0.5 )^2 )
# so p3 is stochastically consistent with p as far as the variance is concerned

[Package bnpsd version 1.3.13 Index]