dirmult {dirmult} | R Documentation |
Parameter estimation in Dirichlet-multinomial distribution
Description
Consider allele frequencies from different
subpopulations. The allele counts, X
, (or equivalently
allele frequencies) are expected to vary between subpopulation. This
variability are sometimes refered to as identity-by-decent, but may be
modelled as overdispersion due to intra-class correlation
\theta
. The allele counts within each subpopulation is
assumed to follow a multinomial distribution conditioned on the allele
probabilities, \pi_1,\dots,\pi_{k-1}
. When
\pi
follows a Dirichlet distribution the marginal
distribution of X
is Dirichlet-multinomial with parameters
\pi
and \theta
with density:
%
P(X=x) = {n \choose x}
\frac{\prod_{j=1}^k\prod_{r=1}^{x_j}\{\pi_j(1-\theta) + (r-1)\theta\}}%
{\prod_{r=1}^{n}\{1-\theta + (r-1)\theta\}}.
Using an alternative parameterization the density may be written as:
%
P(X=x) =
{n \choose x}
\frac{\Gamma(\gamma_+)}{\Gamma(n+\gamma_+)}
\prod_{j=1}^k \frac{\Gamma\left(x_j + \gamma_j\right)}%
{\Gamma\left(\gamma_j\right)},
where \gamma_+=(1-\theta)/\theta
and
\gamma_j=\pi_j\theta
.
This formulation second parameterization is used in the iterations
since it converges much faster than the original parameterization.
The function dirmult
estimates the parameters
\gamma
in the Dirichlet-multinomial distribution and
transform these into
\pi_1,\dots,\pi_{k-1}
and
\theta
.
Usage
dirmult(data, init, initscalar, epsilon=10^(-4), trace=TRUE, mode)
Arguments
data |
A matrix or table with counts. Rows represent subpopulations and columns the different categories of the data. Zero rows or columns are automaticly removed. |
init |
Initial values for the |
initscalar |
Initial value for
|
epsilon |
Convergence tolerance. On termination the difference between to succeeding log-likelihoods must be smaller than epsilon. |
trace |
Logical. If TRUE the parameter estimates and log-likelihood value is printed to the screen after each iteration, otherwise no out-put is produces while iterating. |
mode |
Takes values "obs" (default) or "exp" determining whether the observed or expected FIM should be used in the Fisher Scoring. All other arguments produces an error message, but the observed FIM is used in the iterations. |
Value
Returns a list containing:
loglik |
The final log-likelihood value. |
ite |
Number of iterations used. |
gamma |
A vector of |
pi |
A vector of |
theta |
Estimated |
See Also
Examples
data(us)
fit <- dirmult(us[[1]],epsilon=10^(-4),trace=FALSE)
dirmult.summary(us[[1]],fit)