poisson.glm.mix {poisson.glm.mix} | R Documentation |
Estimation of high dimensional Poisson GLMs via EM algorithm.
Description
This package can be used to cluster high dimensional count data under the presence of covariates. A mixture of Poisson Generalized Linear models (GLM's) is proposed. Conditionally to the covariates, Poisson multivariate distribution describing each cluster is a product of independent Poisson distributions. Different parameterizations for the slopes are proposed. Case of partioning the response variables into a set of replicates is considered. Poisson GLM mixture is estimated via Expectation Maximization (EM) algorithm with Newton-Raphson steps. An efficient initialization of EM algorithm is proposed to improve parameter estimation. It is a splitting scheme which is combined with a Small EM strategy. The user is referred to the function pois.glm.mix
for an automatic evaluation of the proposed methodology.
Details
Package: | poisson.glm.mix |
Type: | Package |
Version: | 1.4 |
Date: | 2023-08-19 |
Assume that the observed data can be written as y = (y_{1},\ldots,y_{n})
where y_i=\{y_{ij\ell};j = 1, \ldots,J,\ell = 1, \ldots,L_{j}\}
, y_i\in Z_+^{d}
, i = 1,\ldots,n
, with d = \sum_{j=1}^{J}L_{j}
and L_j \geq 1
, j=1,\ldots,J
. Index i
denotes the observation, while the vector L=(L_1,\ldots,L_J)
defines a partition of the d
variables into J
blocks: the first block consists of the first L_1
variables, the second block consists of the next L_2
variables and so on. We will refer to j
and \ell
using the terms “condition” and “replicate”, respectively. In addition to y
, consider that a vector of V
covariates is observed, denoted by x_{i} := \{x_{iv};v=1,\ldots,V\}
, for all i = 1, \ldots,n
. Assume now that conditional to x_{i}
, a model indicator m
taking values in the discrete set \{1,2,3\}
and a positive integer K
, the response y_{i}
, is a realization of the corresponding random vector
Y_{i}|x_{i}, m\sim \sum_{k = 1}^{K}\pi_{k}\prod_{j=1}^{J}\prod_{\ell=1}^{L_{j}}\mathcal P(\mu_{ij\ell k;m})
where \mathcal P
denotes the Poisson distribution. The following parameterizations for the Poisson means \mu_{ij\ell k;m}
are considered: If m=1
(the “\beta_{jk}
” parameterization), then
\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{jkv}x_i.
If m=2
(the “\beta_{j}
” parameterization), then
\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{jv}x_i.
If m=3
(the “\beta_{k}
” parameterization), then
\mu_{ij\ell k;m}:=\alpha_{jk}+\gamma_{j\ell}+\sum_{v=1}^{V}\beta_{kv}x_i.
For identifiability purposes assume that \sum_{\ell=1}^{L_j}\gamma_{j\ell}=0
, j=1,\ldots,J
.
Author(s)
Papastamoulis Panagiotis Maintainer: Papastamoulis Panagiotis <papapast@yahoo.gr>
References
Papastamoulis, P., Martin-Magniette, M. L., & Maugis-Rabusseau, C. (2016). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics & Data Analysis, 93, 97-106.
Examples
## load a small dataset of 500 observations
data("simulated_data_15_components_bjk")
## in this example there is V = 1 covariates (x)
## and d = 6 response variables (y). The design is
## L = (3,2,1).
V <- 1
x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V))
y <- sim.data[,-1]
## We will run the algorithm using parameterization
## m = 1 and the number of components in the set
## {2,3,4}.
rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1,
max.iter=1000, Kmin=2, Kmax= 4,
m1=3, m2=3, t1=3, t2=3, msplit=4, tsplit=3,mnr = 5)
# note: useR should specify larger values for Kmax, m1, m2, t1,
# t2, msplit and tsplit for a complete analysis.
# retrieve the selected models according to BIC or ICL
rr$sel.mod.icl
rr$sel.mod.bic
# retrieve the estimates according to ICL
# alpha
rr$est.sel.mod.icl$alpha
# beta
rr$est.sel.mod.icl$beta
# gamma
rr$est.sel.mod.icl$gamma
# pi
rr$est.sel.mod.icl$pi
# frequency table with estimated clusters
table(rr$est.sel.mod.icl$clust)
# histogram of the maximum conditional probabilities
hist(apply(rr$est.sel.mod.icl$tau,1,max),30)
##(the full data of 5000 observations can be loaded using
## data("simulated_data_15_components_bjk_full")