poisson.glm.mix {poisson.glm.mix} | R Documentation |
Estimation of high dimensional Poisson GLMs via EM algorithm.
Description
This package clusters high dimensional count data in the presence of covariates, using a mixture of Poisson Generalized Linear Models (GLMs). Conditionally on the covariates, the multivariate Poisson distribution describing each cluster is a product of independent Poisson distributions. Different parameterizations for the slopes are available, and the response variables may be partitioned into sets of replicates. The mixture is estimated via the Expectation-Maximization (EM) algorithm with Newton-Raphson steps. To improve parameter estimation, the EM algorithm is initialized by a splitting scheme combined with a Small EM strategy. See the function pois.glm.mix for an automatic evaluation of the proposed methodology.
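As a rough illustration of the E-step behind the EM estimation mentioned above, the following base R sketch computes posterior cluster membership probabilities for a mixture whose clusters are products of independent Poisson distributions. The helper name e_step and its arguments (y, mu, pi_k) are hypothetical and are not part of the package; the package's own implementation may differ.

```r
# Hypothetical helper (not a package function): one E-step.
# y:    n x d matrix of counts
# mu:   n x d x K array, mu[i, c, k] = Poisson mean of variable c for
#       observation i under cluster k
# pi_k: length-K vector of mixing proportions
e_step <- function(y, mu, pi_k) {
  n <- nrow(y)
  K <- length(pi_k)
  log_w <- matrix(0, n, K)
  for (k in seq_len(K)) {
    # independence within a cluster: log-densities add up over the d variables
    log_dens <- matrix(dpois(y, mu[, , k], log = TRUE), nrow = n)
    log_w[, k] <- log(pi_k[k]) + rowSums(log_dens)
  }
  # log-sum-exp normalisation, which avoids underflow when d is large
  row_max <- apply(log_w, 1, max)
  w <- exp(log_w - row_max)
  w / rowSums(w)
}

# toy usage: counts drawn around 4, two candidate clusters with means 4 and 40
set.seed(1)
y_toy  <- matrix(rpois(5 * 3, 4), 5, 3)
mu_toy <- array(c(rep(4, 15), rep(40, 15)), c(5, 3, 2))
tau_toy <- e_step(y_toy, mu_toy, c(0.5, 0.5))
```

Each row of tau_toy sums to one. Working on the log scale matters here because a product of many Poisson probabilities quickly underflows in high dimensions.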
Details
Package: | poisson.glm.mix |
Type: | Package |
Version: | 1.4 |
Date: | 2023-08-19 |
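Assume that the observed data can be written as $y = (y_1, \ldots, y_n)$, where $y_i = \{y_{ij\ell};\ j = 1, \ldots, J,\ \ell = 1, \ldots, L_j\}$, $y_{ij\ell} \in \mathbb{Z}_+ = \{0, 1, \ldots\}$, $i = 1, \ldots, n$, with $d = \sum_{j=1}^{J} L_j$ and $L_j \geq 1$, $j = 1, \ldots, J$. Index $i$ denotes the observation, while the vector $L = (L_1, \ldots, L_J)$ defines a partition of the $d$ variables into $J$ blocks: the first block consists of the first $L_1$ variables, the second block consists of the next $L_2$ variables and so on. We will refer to $j$ and $\ell$ using the terms “condition” and “replicate”, respectively. In addition to $y$, consider that a vector of $V$ covariates is observed, denoted by $x_i := \{x_{iv};\ v = 1, \ldots, V\}$, for all $i = 1, \ldots, n$. Assume now that conditional on $x_i$, a model indicator $m$ taking values in the discrete set $\{1, 2, 3\}$ and a positive integer $K$, the response $y_i$ is a realization of the corresponding random vector

$$Y_i \mid x_i, m \sim \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \prod_{\ell=1}^{L_j} \mathcal{P}(\mu_{ijk\ell;m}),$$

where $\mathcal{P}$ denotes the Poisson distribution. The following parameterizations for the Poisson means $\mu_{ijk\ell;m}$ are considered:

If $m = 1$ (the “$\beta_{jk}$” parameterization), then
$$\mu_{ijk\ell;m} := \exp\{\alpha_{jk} + \gamma_{j\ell} + \beta_{jk}^{T} x_i\}.$$

If $m = 2$ (the “$\beta_{j}$” parameterization), then
$$\mu_{ijk\ell;m} := \exp\{\alpha_{jk} + \gamma_{j\ell} + \beta_{j}^{T} x_i\}.$$

If $m = 3$ (the “$\beta_{k}$” parameterization), then
$$\mu_{ijk\ell;m} := \exp\{\alpha_{jk} + \gamma_{j\ell} + \beta_{k}^{T} x_i\}.$$

For identifiability purposes assume that $\sum_{\ell=1}^{L_j} \gamma_{j\ell} = 0$, $j = 1, \ldots, J$.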
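To make the notation concrete, the following base R sketch simulates a small data set under the m = 1 parameterization, assuming a log link so that the log of each Poisson mean is alpha[j, k] + gamma[[j]][l] + sum(beta[j, k, ] * x[i, ]). All object names and parameter values here are illustrative assumptions; none of them come from the package or its bundled data.

```r
set.seed(1)
n <- 200             # number of observations
L <- c(3, 2, 1)      # replicates per condition, so J = 3 and d = 6
J <- length(L)
d <- sum(L)
K <- 2               # number of mixture components
V <- 1               # number of covariates

x     <- matrix(runif(n * V, -1, 1), n, V)
pi_k  <- c(0.6, 0.4)                                      # mixing proportions
alpha <- matrix(rnorm(J * K, mean = 2, sd = 0.5), J, K)   # alpha[j, k]
beta  <- array(rnorm(J * K * V, sd = 0.5), c(J, K, V))    # beta[j, k, v]
# gamma[[j]] holds the L_j replicate effects, centred so that they sum to zero
gamma <- lapply(L, function(Lj) { g <- rnorm(Lj, sd = 0.2); g - mean(g) })

cond <- rep(seq_len(J), times = L)   # condition j of each response column
repl <- sequence(L)                  # replicate l of each response column

z <- sample(K, n, replace = TRUE, prob = pi_k)   # latent cluster labels
y <- matrix(0L, n, d)
for (i in seq_len(n)) {
  k <- z[i]
  for (col in seq_len(d)) {
    j <- cond[col]
    l <- repl[col]
    log_mu <- alpha[j, k] + gamma[[j]][l] + sum(beta[j, k, ] * x[i, ])
    y[i, col] <- rpois(1, exp(log_mu))
  }
}
```

The vectors cond and repl show how L = (3, 2, 1) maps the six response columns to (condition, replicate) pairs: columns 1 to 3 belong to condition 1, columns 4 and 5 to condition 2, and column 6 to condition 3. Matrices shaped like x and y are the kind of input that the reference and response arguments in the Examples section receive.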
Author(s)
Papastamoulis Panagiotis
Maintainer: Papastamoulis Panagiotis <papapast@yahoo.gr>
References
Papastamoulis, P., Martin-Magniette, M. L., & Maugis-Rabusseau, C. (2016). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics & Data Analysis, 93, 97-106.
Examples
## load a small dataset of 500 observations
data("simulated_data_15_components_bjk")
## in this example there is V = 1 covariate (x)
## and d = 6 response variables (y). The design is
## L = (3,2,1).
V <- 1
x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V))
y <- sim.data[,-1]
## We will run the algorithm using parameterization
## m = 1 and the number of components in the set
## {2,3,4}.
rr <- pois.glm.mix(reference = x, response = y, L = c(3, 2, 1), m = 1,
                   max.iter = 1000, Kmin = 2, Kmax = 4,
                   m1 = 3, m2 = 3, t1 = 3, t2 = 3, msplit = 4, tsplit = 3, mnr = 5)
# note: for a complete analysis, the user should specify larger values
# for Kmax, m1, m2, t1, t2, msplit and tsplit.
# retrieve the selected models according to BIC or ICL
rr$sel.mod.icl
rr$sel.mod.bic
# retrieve the estimates according to ICL
# alpha
rr$est.sel.mod.icl$alpha
# beta
rr$est.sel.mod.icl$beta
# gamma
rr$est.sel.mod.icl$gamma
# pi
rr$est.sel.mod.icl$pi
# frequency table with estimated clusters
table(rr$est.sel.mod.icl$clust)
# histogram of the maximum conditional probabilities
hist(apply(rr$est.sel.mod.icl$tau,1,max),30)
## (the full data of 5000 observations can be loaded using
## data("simulated_data_15_components_bjk_full"))