dirmultinomial {VGAM} | R Documentation |
Fitting a Dirichlet-Multinomial Distribution
Description
Fits a Dirichlet-multinomial distribution to a matrix response.
Usage
dirmultinomial(lphi = "logitlink", iphi = 0.10, parallel = FALSE,
zero = "M")
Arguments
lphi |
Link function applied to the |
iphi |
Numeric. Initial value for |
parallel |
A logical (formula not allowed here) indicating whether the
probabilities |
zero |
An integer-valued vector specifying which
linear/additive predictors are modelled as intercepts only.
The values must be from the set |
Details
The Dirichlet-multinomial distribution arises from a multinomial distribution where the probability parameters are not constant but are generated from a multivariate distribution called the Dirichlet distribution. The Dirichlet-multinomial distribution has probability function
where is the over-dispersion parameter
and
. Here,
means “
choose
”
and refers to combinations (see
choose
).
The above formula applies to each row of the matrix response.
In this VGAM family function the first
linear/additive predictors correspond to the first
probabilities via
where is the
th linear/additive
predictor (
by definition for
but not for
)
and
.
The
th linear/additive predictor corresponds to
lphi
applied to .
Note that but
the probabilities (returned as the fitted values)
are bundled together as a
-column
matrix. The quantities
are returned as the prior
weights.
The beta-binomial distribution is a special case of
the Dirichlet-multinomial distribution when ;
see
betabinomial
. It is easy to show that
the first shape parameter of the beta distribution is
and the second shape parameter is
. Also,
, which
is known as the intra-cluster correlation coefficient.
Value
An object of class "vglmff"
(see
vglmff-class
). The object is used by modelling
functions such as vglm
, rrvglm
and vgam
.
If the model is an intercept-only model then @misc
(which is a
list) has a component called shape
which is a vector with the
values
.
Warning
This VGAM family function is prone to numerical problems, especially when there are covariates.
Note
The response can be a matrix of non-negative integers, or
else a matrix of sample proportions and the total number of
counts in each row specified using the weights
argument.
This dual input option is similar to multinomial
.
To fit a ‘parallel’ model with the
parameter being an intercept-only you will need to use the
constraints
argument.
Currently, Fisher scoring is implemented. To compute the
expected information matrix a for
loop is used; this
may be very slow when the counts are large. Additionally,
convergence may be slower than usual due to round-off error
when computing the expected information matrices.
Author(s)
Thomas W. Yee
References
Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.
Tvedebrink, T. (2010).
Overdispersion in allelic counts and -correction in
forensic genetics.
Theoretical Population Biology, 78, 200–210.
Yu, P. and Shaw, C. A. (2014). An Efficient Algorithm for Accurate Computation of the Dirichlet-Multinomial Log-Likelihood Function. Bioinformatics, 30, 1547–54.
See Also
dirmul.old
,
betabinomial
,
betabinomialff
,
dirichlet
,
multinomial
.
Examples
nn <- 5; M <- 4; set.seed(1)
ydata <- data.frame(round(matrix(runif(nn * M, max = 100), nn, M)))
colnames(ydata) <- paste("y", 1:M, sep = "") # Integer counts
fit <- vglm(cbind(y1, y2, y3, y4) ~ 1, dirmultinomial,
data = ydata, trace = TRUE)
head(fitted(fit))
depvar(fit) # Sample proportions
weights(fit, type = "prior", matrix = FALSE) # Total counts per row
## Not run:
ydata <- transform(ydata, x2 = runif(nn))
fit <- vglm(cbind(y1, y2, y3, y4) ~ x2, dirmultinomial,
data = ydata, trace = TRUE)
Coef(fit)
coef(fit, matrix = TRUE)
(sfit <- summary(fit))
vcov(sfit)
## End(Not run)