undiff_af {bnpsd} R Documentation

## Undifferentiate an allele distribution

### Description

This function takes a vector of allele frequencies and a mean kinship value, and returns a new distribution of allele frequencies that is consistent with reversing differentiation with the given kinship, in the sense that the new distribution is more concentrated around the middle (0.5) than the input/original by an amount predicted from theory. The new distribution is created by weighing the input distribution with a random mixing distribution with a lower variance. An automatic method is provided that always selects a Beta distribution with just the right concentration to work for the given data and kinship. Explicit methods are also provided for more control, but are more likely to result in errors when mixing variances are not small enough (see details below).

### Usage

```undiff_af(
p,
kinship_mean,
distr = c("auto", "uniform", "beta", "point"),
alpha = 1,
eps = 10 * .Machine\$double.eps
)
```

### Arguments

 `p` A vector of observed allele frequencies. `kinship_mean` The mean kinship value of the differentiation to reverse. `distr` Name of the mixing distribution to use. "auto" picks a symmetric Beta distribution with parameters that ensure a small enough variance to succeed. "beta" is a symmetric Beta distribution with parameter `alpha` as provided below. "uniform" is a uniform distribution (same as "beta" with `alpha = 1`). "point" is a distribution fully concentrated/fixed at 0.5 (same as the limit of "beta" with `alpha = Inf`, which has zero variance). `alpha` Shape parameter for `distr = "beta"`, ignored otherwise. `eps` If `distr = "auto"`, this small value is added to the calculated `alpha` to avoid roundoff errors and ensuring that the mixing variance is smaller than the maximum allowed.

### Details

Model: Suppose we started from an allele frequency `p0` with expectation 0.5 and variance `V0`. Differentiation creates a new (sample) allele frequency `p1` without changing its mean (`E(p1|p0) = p0`) and with a conditional variance given by the mean kinship `phi`: `Var(p1|p0) = p0*(1-p0)*phi`. The total variance of the new distribution (calculated using the law of total variance) equals `V1 = Var(p1) = phi/4 + (1-phi)*V0`. (Also `E(p1) = 0.5`). So the new variance is greater for `phi>0` (note `V0 <= 1/4` for any distribution bounded in (0,1)). Thus, given `V1` and `phi`, the goal is to construct a new distribution with the original, lower variance of `V0 = (V1-phi/4)/(1-phi)`. An error is thrown if `V1 < phi/4` in input data, which is inconsistent with this assumed model.

Construction of "undifferentiated" allele frequencies: `p_out = w*p_in + (1-w)*p_mix`, where `p_in` is the input with sample variance `V_in` (`V1` in above model) and `p_mix` is a random draw from the mixing distribution `distr` with expectation 0.5 and known variance `V_mix`. The output variance is `V_out = w^2*V_in + (1-w)^2*V_mix`, which we set to the desired `V_out = (V_in-phi/4)/(1-phi)` (`V0` in above model) and solve for `w` (the largest of the two quadratic roots is used). An error is thrown if `V_mix > V_out` (the output variance must be larger than the mixing variance). This error is avoided by manually adjusting choice of `distr` and `alpha` (for `distr = "beta"`), or preferably with `distr = "auto"` (default), which selects a Beta distribution with `alpha = (1/(4*V_out)-1)/2 + eps` that is guaranteed to work for any valid `V_out` (assuming `V_in < phi/4`).

### Value

A list with the new distribution and several other informative statistics, which are named elements:

• `p`: A new vector of allele frequencies with the same length as input `p`, with the desired variance (see details) obtained by weighing the input `p` with new random data from distribution `distr`.

• `w`: The weight used for the input data (`1-w` for the mixing distribution).

• `kinship_mean_max`: The maximum mean kinship possible for undifferentiating this data (equals four times the input variance (see details), which results in zero output variance).

• `V_in`: sample variance of input `p`, assuming its expectation is 0.5.

• `V_out`: target variance of output `p`.

• `V_mix`: variance of mixing distribution.

• `alpha`: the value of `alpha` used for symmetric Beta mixing distribution, informative if `distr = "auto"`.

### Examples

```# create random uniform data for these many loci
m <- 100
p <- runif( m )
# differentiate the distribution using Balding-Nichols model
F <- 0.1
nu <- 1 / F - 1
p2 <- rbeta( m, p * nu, (1 - p) * nu )

# now undifferentiate with this function
# NOTE in this particular case `F` is also the mean kinship
# default "automatic" distribution recommended
# (avoids possible errors for specific distributions)
p3 <- undiff_af( p2, F )\$p

# note p3 does not equal p exactly (original is unrecoverable)
# but variances (assuming expectation is 0.5 for all) should be close to each other,
# and both be lower than p2's variance:
V1 <- mean( ( p - 0.5 )^2 )
V2 <- mean( ( p2 - 0.5 )^2 )
V3 <- mean( ( p3 - 0.5 )^2 )
# so p3 is stochastically consistent with p as far as the variance is concerned

```

[Package bnpsd version 1.3.13 Index]