pois.glm.mix {poisson.glm.mix} | R Documentation |
Main call function of the package.
Description
This function is the main function of the package. User has only to call it by specifying the data (x
and y
), the vector L
, the parameterization (m\in \{1,2,3\}
), the desirable range for the number of components, the type of initialization and the number of EM runs and iterations for the small-EM strategy. When K=K_{min}
, EM algorithm is initialized according to Initialization scheme 1 (the functions init1.1.jk.j
, init1.2.jk.j
, init1.k
). For consecutive run (K>K_{min}
), EM algorithm is initialized using Initialization 2 (the functions init2.jk.j
or init2.k
).
Usage
pois.glm.mix(reference, response, L, m, max.iter, Kmin, Kmax,
m1, m2, t1, t2, msplit, tsplit,mnr)
Arguments
reference |
a numeric array of dimension |
response |
a numeric array of count data with dimension |
L |
numeric vector of positive integers containing the partition of the |
m |
variable denoting the parameterization of the model: 1 for |
max.iter |
positive integer denoting the maximum number of EM iterations. |
Kmin |
the minimum number of mixture components. |
Kmax |
the maximum number of mixture components. |
m1 |
positive integer denoting the number of iterations for each call of the 1st small EM iterations used by Initialization 1 ( |
m2 |
positive integer denoting the number of iterations for each call of the overall small EM iterations used by Initialization 1 ( |
t1 |
positive integer denoting the number of different runs of the 1st small EM used by Initialization 1 ( |
t2 |
positive integer denoting the number of different runs of the overall small EM used by Initialization 1 ( |
msplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
tsplit |
positive integer denoting the number of different runs for each call of the splitting small EM used by Initialization 2 ( |
mnr |
positive integer denoting the maximum number of Newton-Raphson iterations. |
Details
The output of the function is a list of lists. During the run of the function pois.glm.mix
two R
graphic devices are opened: The first one contains the graph of the information criteria (BIC and ICL). In the second graphe, the resulting fitted clusters per condition are plotted until the ICL criterion no longer selects a better model. Notice that in this graph the L_j
replicates of condition j=1,\ldots,J
are summed.
The EM algorithm is run until the increase to the loglikelihood of the mixture model is less than 10^{-6}
. The Newton - Raphson iterations at the Maximization step of EM algorithm are repeated until the square Euclidean norm of the gradient vector of the component specific parameters is less than 10^{-10}
.
Value
information.criteria |
numeric array of dimension |
runs |
A list containing the output for the estimated mixture for each |
sel.mod.icl |
The selected number of mixture components according to the ICL criterion. |
sel.mod.bic |
The selected number of mixture components according to the BIC. |
est.sel.mod.icl |
The final estimates for the selected number of mixture components according to the ICL criterion. It is a list containing |
est.sel.mod.bic |
The final estimates for the selected number of mixture components according to the BIC. It is a list containing |
Note
In case that an exhaustive search is desired instead of a random selection of the splitted components, use tsplit = -1
.
Author(s)
Panagiotis Papastamoulis
See Also
bjkmodel
, bjmodel
, bkmodel
, init1.1.jk.j
, init1.2.jk.j
, init1.k
, init2.jk.j
, init2.k
, mylogLikePoisMix
Examples
## load a small dataset of 500 observations
data("simulated_data_15_components_bjk")
## in this example there is V = 1 covariates (x)
## and d = 6 response variables (y). The design is
## L = (3,2,1).
V <- 1
x <- array(sim.data[,1],dim=c(dim(sim.data)[1],V))
y <- sim.data[,-1]
## We will run the algorithm using parameterization
## m = 1 and the number of components in the set
## {2,3,4}.
rr<-pois.glm.mix(reference=x, response=y, L=c(3,2,1), m=1,
max.iter=1000, Kmin=2, Kmax= 4,
m1=3, m2=3, t1=3, t2=3, msplit=3, tsplit=3,mnr = 5)
# note: useR should specify larger values for Kmax, m1, m2,
# t1, t2, msplit and tsplit for a complete analysis.
# retrieve the selected models according to BIC or ICL
rr$sel.mod.icl
rr$sel.mod.bic
# retrieve the estimates according to ICL
# alpha
rr$est.sel.mod.icl$alpha
# beta
rr$est.sel.mod.icl$beta
# gamma
rr$est.sel.mod.icl$gamma
# pi
rr$est.sel.mod.icl$pi
# frequency table with estimated clusters
table(rr$est.sel.mod.icl$clust)
# histogram of the maximum conditional probabilities
hist(apply(rr$est.sel.mod.icl$tau,1,max),30)
##(the full data of 5000 observations can be loaded using
## data("simulated_data_15_components_bjk_full")