rdirmn {MGLM}    R Documentation

The Dirichlet Multinomial Distribution

Description

ddirmn computes the log of the Dirichlet multinomial probability mass function. rdirmn generates random vectors from the Dirichlet multinomial distribution.

Usage

rdirmn(n, size, alpha)

ddirmn(Y, alpha)

Arguments

n

number of random vectors to generate. When size is a scalar and alpha is a vector, n must be specified. When size is a vector and alpha is a matrix, n is optional and defaults to the length of size; if given, n must equal the length of size.

size

a number or vector specifying the total number of objects that are put into d categories in the Dirichlet multinomial distribution.

alpha

the parameter of the Dirichlet multinomial distribution. Can be a positive numeric vector or matrix. For ddirmn, the dimensions of alpha must match those of Y. If alpha is a vector, it is replicated n times to match the dimensions of Y.

For rdirmn, if alpha is a vector, size must be a scalar, and all the random vectors will be drawn from the same alpha and size. If alpha is a matrix, the number of rows should match the length of size, and each random vector will be drawn from the corresponding row of alpha and the corresponding element in the size vector. See Details below.

Y

The multivariate count matrix with dimensions n \times d, where n = 1,2, \ldots is the number of observations and d=2,3, \ldots is the number of categories.
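The two calling conventions for rdirmn (a shared alpha vector with a scalar size, or one row of alpha per element of size) can be sketched in base R as follows. This is only an illustration of the argument rules, not MGLM's implementation: rdirmn_sketch is a hypothetical helper, and drawing the category probabilities from normalized gamma variates before a multinomial draw is an assumed, standard construction.

```r
# Hypothetical helper, not part of MGLM: draw Dirichlet multinomial vectors
# by first drawing category probabilities from a Dirichlet distribution
# (normalized gamma variates) and then drawing multinomial counts.
rdirmn_sketch <- function(size, alpha, n = NULL) {
  if (is.matrix(alpha)) {
    # One row of alpha per draw; size supplies one total per row.
    stopifnot(length(size) == nrow(alpha))
    if (!is.null(n)) stopifnot(n == length(size))
  } else {
    # Shared alpha and scalar size: n is required.
    stopifnot(length(size) == 1, !is.null(n))
    alpha <- matrix(alpha, nrow = n, ncol = length(alpha), byrow = TRUE)
    size <- rep(size, n)
  }
  stopifnot(all(alpha > 0))
  out <- matrix(0L, nrow = nrow(alpha), ncol = ncol(alpha))
  for (i in seq_len(nrow(alpha))) {
    g <- rgamma(ncol(alpha), shape = alpha[i, ])  # one gamma draw per category
    p <- g / sum(g)                               # Dirichlet(alpha[i, ]) draw
    out[i, ] <- rmultinom(1, size = size[i], prob = p)
  }
  out
}

# Vector alpha, scalar size: n must be given.
Y1 <- rdirmn_sketch(size = 20, alpha = c(1, 2, 3), n = 5)

# Matrix alpha, vector size: n defaults to length(size).
A <- rbind(c(1, 2), c(3, 4), c(0.5, 0.5))
Y2 <- rdirmn_sketch(size = c(5, 10, 15), alpha = A)
```

Each row of the result sums to the corresponding element of size.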

Details

When multivariate count data exhibit over-dispersion, the traditional multinomial model is insufficient. The Dirichlet multinomial distribution models the probabilities of the categories by a Dirichlet distribution. Given the parameter vector \alpha = (\alpha_1, \ldots, \alpha_d), \alpha_j>0, the probability mass of a d-category count vector Y=(y_1, \ldots, y_d), d \ge 2, under the Dirichlet multinomial distribution is

P(y|\alpha) = C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)} \frac{\Gamma(\sum_{j'=1}^d \alpha_{j'})}{\Gamma(\sum_{j'=1}^d \alpha_{j'} + \sum_{j'=1}^d y_{j'})},

where m=\sum_{j=1}^d y_j. Here, C_k^n, often read as "n choose k", refers to the number of k-combinations from a set of n elements; C_{y_1, \ldots, y_d}^{m} = m!/(y_1! \cdots y_d!) is the corresponding multinomial coefficient.
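As a worked illustration of the formula above, the log of the probability mass can be evaluated term by term with lgamma in base R. This is a minimal sketch for a single count vector; ddirmn_sketch is a hypothetical name and is not claimed to match how ddirmn is implemented.

```r
# Hypothetical helper, not part of MGLM: log P(y | alpha) for one count vector.
ddirmn_sketch <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), all(alpha > 0), all(y >= 0))
  m <- sum(y)
  # log of the multinomial coefficient m! / (y_1! ... y_d!)
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef +
    sum(lgamma(alpha + y) - lgamma(alpha)) +
    lgamma(sum(alpha)) - lgamma(sum(alpha) + m)
}

# With d = 2 and alpha = (1, 1), y_1 is uniform on 0, ..., m,
# so every outcome has probability 1 / (m + 1); here m = 10.
lp <- ddirmn_sketch(c(3, 7), c(1, 1))
all.equal(lp, -log(11))
```

Working on the log scale avoids overflow in the gamma functions when the counts or parameters are large.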

The parameter \alpha can be a vector of length d, such as the estimate obtained from distribution fitting. \alpha can also be a matrix with n rows, such as the inverse link exp(X\beta) evaluated at the regression parameter estimate.
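To illustrate the matrix case, the sketch below builds an n-row parameter matrix as exp(X\beta) and evaluates the log probability mass row by row. The design matrix X, the coefficient matrix B, the counts Y and the helper dm_logpmf are all made up for illustration; only the form exp(X\beta) comes from the text above.

```r
# Hypothetical design matrix: n = 3 observations, intercept plus one predictor.
X <- cbind(1, c(-1, 0, 1))

# Hypothetical coefficients: p = 2 rows, one column per category (d = 3).
B <- matrix(c(0.2,  0.1,
              0.0, -0.3,
              0.5,  0.4), nrow = 2)

# Inverse link: an n x d matrix with strictly positive entries.
alpha_mat <- exp(X %*% B)

# Hypothetical helper: Dirichlet multinomial log-pmf for one count vector.
dm_logpmf <- function(y, alpha) {
  m <- sum(y)
  lgamma(m + 1) - sum(lgamma(y + 1)) +
    sum(lgamma(alpha + y) - lgamma(alpha)) +
    lgamma(sum(alpha)) - lgamma(sum(alpha) + m)
}

# Made-up counts, one row per observation.
Y <- rbind(c(2, 3, 5), c(0, 4, 6), c(1, 1, 8))

# Row i of Y is paired with row i of alpha_mat; the result has length n.
loglik <- vapply(seq_len(nrow(Y)),
                 function(i) dm_logpmf(Y[i, ], alpha_mat[i, ]),
                 numeric(1))
```

The exponential guarantees that every entry of the parameter matrix is positive, as the distribution requires.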

Value

For each count vector and each corresponding parameter vector \alpha, the function ddirmn returns the value \log(P(y|\alpha)). When Y is a matrix of n rows, ddirmn returns a vector of length n.

rdirmn returns an n \times d matrix of the generated random observations.

Examples

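library(MGLM)
m <- 20                           # total count per random vector
alpha <- c(0.1, 0.2)              # Dirichlet multinomial parameter, d = 2
dm.Y <- rdirmn(n=10, m, alpha)    # 10 x 2 matrix of random counts
pdfln <- ddirmn(dm.Y, alpha)      # log probability mass, one value per row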

[Package MGLM version 0.2.1 Index]