dist {MGLM} R Documentation

Details of the distributions

Description

An object that specifies the distribution to be fitted by the MGLMfit function, or the regression model to be fitted by the MGLMreg or MGLMsparsereg functions. Can be chosen from "MN", "DM", "NegMN", or "GDM".

Details

"MN": Multinomial distribution

A multinomial distribution models the counts of $d$ possible outcomes. The counts of categories are negatively correlated. The density of a $d$-category count vector $y$ with parameter $p = (p_1, \ldots, p_d)$ is

$$P(y|p) = C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d} p_j^{y_j},$$

where $m = \sum_{j=1}^d y_j$, $0 < p_j < 1$, and $\sum_{j=1}^d p_j = 1$. Here, $C_k^n$, often read as "$n$ choose $k$", refers to the number of $k$-combinations from a set of $n$ elements, and $C_{y_1, \ldots, y_d}^{m} = m!/(y_1! \cdots y_d!)$ is the corresponding multinomial coefficient.
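As an illustration of the formula above, the following base R sketch evaluates the multinomial log-density term by term and cross-checks it against `stats::dmultinom`. The helper name `mn_logpmf` is hypothetical and is not an MGLM function.

```r
# Log-pmf of the multinomial distribution, mirroring the formula in the text.
# `mn_logpmf` is an illustrative helper, not part of the MGLM package.
mn_logpmf <- function(y, p) {
  stopifnot(length(y) == length(p), all(p > 0), abs(sum(p) - 1) < 1e-8)
  m <- sum(y)
  # log multinomial coefficient: log(m!) - sum_j log(y_j!)
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef + sum(y * log(p))
}

y <- c(2, 3, 5)
p <- c(0.2, 0.3, 0.5)
mn_val <- exp(mn_logpmf(y, p))

# Reference value from base R
mn_ref <- dmultinom(y, prob = p)
```

Working on the log scale with `lgamma` avoids overflow in the factorials when the total count $m$ is large.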

The MGLMreg function with dist="MN" calculates the maximum likelihood estimate (MLE) of the regression coefficients $\beta_j$ of the multinomial logit model, which has link function $p_j = \exp(X\beta_j)/(1 + \sum_{j'=1}^{d-1} \exp(X\beta_{j'}))$, $j = 1, \ldots, d-1$. The last category serves as the baseline, so that $p_d = 1/(1 + \sum_{j'=1}^{d-1} \exp(X\beta_{j'}))$. The MGLMsparsereg function with dist="MN" fits a regularized multinomial logit model.
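The link function can be sketched in base R as follows. This is only an illustration of the mapping from covariates to category probabilities, not the package's fitting code; the helper `mn_logit_probs`, the design matrix `X`, and the coefficient matrix `B` are made-up examples.

```r
# Multinomial logit link: map an n x q design matrix X and a q x (d-1)
# coefficient matrix B (columns beta_1, ..., beta_{d-1}) to an n x d
# matrix of category probabilities. Hypothetical helper, not part of MGLM.
mn_logit_probs <- function(X, B) {
  eta <- exp(X %*% B)            # n x (d-1), entries exp(x_i' beta_j)
  denom <- 1 + rowSums(eta)      # 1 + sum_j' exp(x_i' beta_j')
  cbind(eta, 1) / denom          # last column is the baseline p_d
}

X <- cbind(1, c(-1, 0, 1))       # intercept plus one covariate, n = 3
B <- matrix(c(0.5, 1.0,          # beta_1
              -0.2, 0.3),        # beta_2
            nrow = 2)            # q = 2 rows, d - 1 = 2 columns
P <- mn_logit_probs(X, B)        # 3 x 3 matrix, one row per observation
```

Each row of `P` sums to one by construction, which is why only $d-1$ coefficient vectors are needed.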

"DM": Dirichlet multinomial distribution

When multivariate count data exhibit over-dispersion, the traditional multinomial model is insufficient. The Dirichlet multinomial distribution models the probabilities of the categories by a Dirichlet distribution. The density of a $d$-category count vector $y$ with parameter $\alpha = (\alpha_1, \ldots, \alpha_d)$, $\alpha_j > 0$, is

$$P(y|\alpha) = C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)} \frac{\Gamma(\sum_{j'=1}^d \alpha_{j'})}{\Gamma(\sum_{j'=1}^d \alpha_{j'} + \sum_{j'=1}^d y_{j'})},$$

where $m = \sum_{j=1}^d y_j$. Here, $C_k^n$, often read as "$n$ choose $k$", refers to the number of $k$-combinations from a set of $n$ elements.
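A minimal base R sketch of this density, assuming the formula exactly as written above, is given below. The helper `dm_logpmf` is a hypothetical name and is not one of the package's density functions.

```r
# Log-pmf of the Dirichlet multinomial distribution, term by term.
# `dm_logpmf` is an illustrative helper, not part of the MGLM package.
dm_logpmf <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), all(alpha > 0))
  m <- sum(y)
  # log multinomial coefficient
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef +
    sum(lgamma(alpha + y) - lgamma(alpha)) +
    lgamma(sum(alpha)) - lgamma(sum(alpha) + m)
}

# With d = 2, alpha = (1, 1) and m = 3 trials, the four possible
# outcomes (k, 3 - k) are equally likely.
dm_probs <- sapply(0:3, function(k) exp(dm_logpmf(c(k, 3 - k), c(1, 1))))
```

The uniform case is a convenient sanity check because the probabilities can be verified by hand.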

The MGLMfit function with dist="DM" calculates the maximum likelihood estimate (MLE) of $(\alpha_1, \ldots, \alpha_d)$. The MGLMreg function with dist="DM" calculates the MLE of the regression coefficients $\beta_j$ of the Dirichlet multinomial regression model, which has link function $\alpha_j = \exp(X\beta_j)$, $j = 1, \ldots, d$. The MGLMsparsereg function with dist="DM" fits a regularized Dirichlet multinomial regression model.

"GDM": Generalized Dirichlet multinomial distribution

The more flexible generalized Dirichlet multinomial model can be used when the counts of categories have both positive and negative correlations. The probability mass of a count vector $y$ over $m$ trials with parameter $(\alpha, \beta) = (\alpha_1, \ldots, \alpha_{d-1}, \beta_1, \ldots, \beta_{d-1})$, $\alpha_j, \beta_j > 0$, is

$$P(y|\alpha,\beta) = C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^{d-1} \frac{\Gamma(\alpha_j+y_j)}{\Gamma(\alpha_j)} \frac{\Gamma(\beta_j+z_{j+1})}{\Gamma(\beta_j)} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+z_j)},$$

where $z_j = \sum_{k=j}^d y_k$ and $m = \sum_{j=1}^d y_j$. Here, $C_k^n$, often read as "$n$ choose $k$", refers to the number of $k$-combinations from a set of $n$ elements.
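The tail sums $z_j$ are the only new ingredient relative to the Dirichlet multinomial case. The base R sketch below evaluates the probability mass directly from the formula; `gdm_logpmf` is a hypothetical helper, and the parameter values are chosen purely for illustration. They rely on the fact that setting $\beta_j = \alpha_{j+1} + \cdots + \alpha_d$ recovers a Dirichlet multinomial distribution, here the uniform one with $\alpha = (1, 1, 1)$.

```r
# Log-pmf of the generalized Dirichlet multinomial distribution.
# `gdm_logpmf` is an illustrative helper, not part of the MGLM package.
gdm_logpmf <- function(y, alpha, beta) {
  d <- length(y)
  stopifnot(d >= 2, length(alpha) == d - 1, length(beta) == d - 1,
            all(alpha > 0), all(beta > 0))
  m <- sum(y)
  z <- rev(cumsum(rev(y)))       # z[j] = y[j] + ... + y[d]
  j <- seq_len(d - 1)
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef + sum(
    lgamma(alpha + y[j]) - lgamma(alpha) +
      lgamma(beta + z[j + 1]) - lgamma(beta) +
      lgamma(alpha + beta) - lgamma(alpha + beta + z[j])
  )
}

# All count vectors with d = 3 categories and m = 2 trials
ys <- list(c(2, 0, 0), c(0, 2, 0), c(0, 0, 2),
           c(1, 1, 0), c(1, 0, 1), c(0, 1, 1))
gdm_probs <- sapply(ys, function(y) {
  exp(gdm_logpmf(y, alpha = c(1, 1), beta = c(2, 1)))
})
```

Summing `gdm_probs` over the full support is a quick way to confirm that an implementation of the formula is normalized.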

The MGLMfit function with dist="GDM" calculates the MLE of $(\alpha, \beta) = (\alpha_1, \ldots, \alpha_{d-1}, \beta_1, \ldots, \beta_{d-1})$. The MGLMreg function with dist="GDM" calculates the MLE of the regression coefficients $\alpha_j, \beta_j$ of the generalized Dirichlet multinomial regression model, which has link functions $\alpha_j = \exp(X\alpha_j)$ and $\beta_j = \exp(X\beta_j)$, $j = 1, \ldots, d-1$ (on the right-hand sides, $\alpha_j$ and $\beta_j$ denote regression coefficient vectors; on the left-hand sides, the distribution parameters). The MGLMsparsereg function with dist="GDM" fits a regularized generalized Dirichlet multinomial regression model.

"NegMN": Negative multinomial distribution

Both the multinomial distribution and the Dirichlet multinomial distribution are suited to negatively correlated counts. When the counts of categories are positively correlated, the negative multinomial distribution is preferred. The probability mass function of a $d$-category count vector $y$ with parameter $(p_1, \ldots, p_{d+1}, \beta)$, $\sum_{j=1}^{d+1} p_j = 1$, $p_j > 0$, $\beta > 0$, is

$$P(y|p,\beta) = C_{m}^{\beta+m-1} C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^d p_j^{y_j} p_{d+1}^\beta = \frac{(\beta)_m}{m!} C_{y_1, \ldots, y_d}^{m} \prod_{j=1}^d p_j^{y_j} p_{d+1}^\beta,$$

where $m = \sum_{j=1}^d y_j$ and $(\beta)_m = \beta(\beta+1)\cdots(\beta+m-1)$ is the rising factorial. Here, $C_k^n$, often read as "$n$ choose $k$", refers to the number of $k$-combinations from a set of $n$ elements; for non-integer $\beta$ it is understood as $C_{m}^{\beta+m-1} = \Gamma(\beta+m)/(\Gamma(\beta)\, m!)$.
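The following base R sketch evaluates this probability mass function on the log scale, using the gamma-function form of the leading coefficient, $\Gamma(\beta+m)/(\Gamma(\beta)\, m!)$. The helper `negmn_logpmf` is a hypothetical name, not a package function. With a single category ($d = 1$) the distribution reduces to the negative binomial, which gives a check against `stats::dnbinom`.

```r
# Log-pmf of the negative multinomial distribution.
# `p` has length d + 1; its last entry plays the role of p_{d+1}.
# `negmn_logpmf` is an illustrative helper, not part of the MGLM package.
negmn_logpmf <- function(y, p, beta) {
  d <- length(y)
  stopifnot(length(p) == d + 1, all(p > 0),
            abs(sum(p) - 1) < 1e-8, beta > 0)
  m <- sum(y)
  # log of Gamma(beta + m) / (Gamma(beta) * m!)
  log_nb <- lgamma(beta + m) - lgamma(beta) - lgamma(m + 1)
  # log multinomial coefficient
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_nb + log_coef + sum(y * log(p[seq_len(d)])) + beta * log(p[d + 1])
}

# d = 1 sanity check against the negative binomial in base R
nm_val <- exp(negmn_logpmf(4, p = c(0.3, 0.7), beta = 2.5))
nm_ref <- dnbinom(4, size = 2.5, prob = 0.7)

# A d = 2 evaluation
nm_two <- exp(negmn_logpmf(c(1, 2), p = c(0.2, 0.3, 0.5), beta = 1.5))
```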

The MGLMfit function with dist="NegMN" calculates the MLE of $(p_1, \ldots, p_{d+1}, \beta)$. The MGLMreg function with dist="NegMN" and regBeta=FALSE calculates the MLE of the regression coefficients $(\alpha_1, \ldots, \alpha_d, \beta)$ of the negative multinomial regression model, which has link functions $p_{d+1} = 1/(1 + \sum_{j=1}^d \exp(X\alpha_j))$ and $p_j = \exp(X\alpha_j)\, p_{d+1}$, $j = 1, \ldots, d$. When dist="NegMN" and regBeta=TRUE, the overdispersion parameter is linked to covariates via $\beta = \exp(X\alpha_{d+1})$, and the function MGLMreg outputs an estimated matrix of $(\alpha_1, \ldots, \alpha_{d+1})$. The MGLMsparsereg function with dist="NegMN" fits a regularized negative multinomial regression model.
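To make the two parameterizations concrete, here is a base R sketch of the link functions alone. It does not reproduce how MGLMreg estimates the coefficients. The helper `negmn_link`, its `reg_beta` argument, and the matrices `X` and `A` are hypothetical and only loosely mirror the regBeta option described above.

```r
# Negative multinomial link functions. Columns of A are the coefficient
# vectors alpha_1, ..., alpha_d, plus alpha_{d+1} when reg_beta = TRUE.
# Hypothetical helper, not part of the MGLM package.
negmn_link <- function(X, A, reg_beta = FALSE) {
  d <- if (reg_beta) ncol(A) - 1 else ncol(A)
  eta <- exp(X %*% A[, seq_len(d), drop = FALSE])   # n x d
  p_last <- 1 / (1 + rowSums(eta))                  # p_{d+1}
  out <- list(p = unname(cbind(eta * p_last, p_last)))
  if (reg_beta) {
    # beta = exp(X alpha_{d+1}), one value per observation
    out$beta <- as.vector(exp(X %*% A[, d + 1]))
  }
  out
}

X <- cbind(1, c(-1, 0, 1))                 # intercept plus one covariate
A <- matrix(c(0.1, 0.4,                    # alpha_1
              -0.3, 0.2,                   # alpha_2
              0.5, -0.1),                  # alpha_3, used only for beta
            nrow = 2)

fit_fixed <- negmn_link(X, A[, 1:2])             # beta not linked to X
fit_reg <- negmn_link(X, A, reg_beta = TRUE)     # beta = exp(X alpha_3)
```

In both cases the probabilities in each row of `p` sum to one; the only difference is whether $\beta$ is a single free parameter or varies with the covariates.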

Author(s)

Yiwen Zhang and Hua Zhou

See Also

MGLMfit, MGLMreg, MGLMsparsereg, dmn, ddirmn, dgdirmn, dnegmn


[Package MGLM version 0.2.1 Index]