bkmodel {poisson.glm.mix}          R Documentation

EM algorithm for the \beta_{k} (m=3) Poisson GLM mixture.

Description

This function applies the EM algorithm for estimating a K-component mixture of Poisson GLMs, using parameterization m=3, that is, the \beta_{k} model. Initialization can be done using two different schemes. The first is a one-step small-EM procedure. The second is a random splitting small-EM procedure based on the results of a mixture with fewer components. The function returns the parameter updates at each iteration of the EM algorithm, the estimate of \gamma, the estimated clusters and conditional probabilities of the observations, as well as the values of the BIC, the ICL and the log-likelihood of the model.

Usage

bkmodel(reference, response, L, m, K, nr, maxnr, t2, m2, 
        prev.z, prev.clust, start.type, prev.alpha, prev.beta)

Arguments

reference

a numeric array of dimension n\times V containing the V covariates for each of the n observations.

response

a numeric array of count data with dimension n\times d containing the d response variables for each of the n observations.

L

numeric vector of positive integers containing the partition of the d response variables into J\leq d blocks, with \sum_{j=1}^{J}L_j=d.

m

positive integer denoting the maximum number of EM iterations.

K

positive integer denoting the number of mixture components.

nr

negative number denoting the tolerance for the convergence of the Newton-Raphson iterations.

maxnr

positive integer denoting the maximum number of Newton-Raphson iterations.

t2

positive integer denoting the number of different runs of the small EM used by Initialization 1 (init1.k).

m2

positive integer denoting the number of iterations for each run of the small EM used by Initialization 1 (init1.k).

prev.z

numeric array of dimension n\times(K-1) containing the estimates of the posterior probabilities according to the previous run of EM. This is used when Initialization 2 is adopted.

prev.clust

numeric vector of length n containing the estimated clusters according to the MAP rule obtained by the previous run of EM. This is used when Initialization 2 is adopted.

start.type

integer (1 or 2) indicating the type of initialization: 1 for Initialization 1 and 2 for Initialization 2.

prev.alpha

numeric array of dimension J\times (K-1) containing the ML estimates of the regression constants \alpha_{jk}, j=1,\ldots,J, k=1,\ldots,K-1, based on the previous run of the EM algorithm. This is used in case of Initialization 2.

prev.beta

numeric array of dimension (K-1)\times T containing the ML estimates of the regression coefficients \beta_{k\tau}, k=1,\ldots,K-1, \tau=1,\ldots,T, based on the previous run of the EM algorithm. This is used in case of Initialization 2.

Value

alpha

numeric array of dimension t_{EM}\times J \times K containing the updates of the regression constants \alpha_{jk}^{(t)}, j=1,\ldots,J, k=1,\ldots,K, for each iteration t=1,2,\ldots,t_{EM} of the EM algorithm.

beta

numeric array of dimension t_{EM}\times K \times T containing the updates of the regression coefficients \beta_{k\tau}^{(t)}, k=1,\ldots,K, \tau=1,\ldots,T, for each iteration t=1,2,\ldots,t_{EM} of the EM algorithm.

gamma

numeric array of dimension J \times \max(L) containing the MLE of \gamma_{j\ell}, j=1,\ldots,J, \ell=1,\ldots,L_j.

psim

numeric array of dimension t_{EM}\times K containing the updates of the mixture weights \pi_{k}^{(t)}, k=1,\ldots,K, for each iteration t=1,2,\ldots,t_{EM} of the EM algorithm.

clust

numeric vector of length n containing the estimated cluster for each observation according to the MAP rule.

z

numeric array of dimension n\times K containing the estimated conditional probabilities \tau_{ik}, i=1,\ldots,n, k=1,\ldots,K, according to the last iteration of the EM algorithm.

bic

numeric, the value of the BIC.

icl

numeric, the value of the ICL.

ll

numeric, the value of the log-likelihood, computed according to the mylogLikePoisMix function.

Author(s)

Panagiotis Papastamoulis

See Also

init1.k, init2.k

Examples

############################################################
#1.            Example with Initialization 1               #
############################################################


## load a simulated dataset according to the b_jk model
## number of observations: 500
## design: L=(3,2,1)
data("simulated_data_15_components_bjk")
x <- sim.data[,1]
x <- array(x,dim=c(length(x),1))
y <- sim.data[,-1]
## use Initialization 1 with 2 components
## the number of different small runs equals t2=5, 
##	each one consisting of m2 = 5 iterations
## the maximum number of EM iterations is set to m = 1000.
nc <- 2
run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), 
               maxnr=10, t2=5, m2=5, prev.z, prev.clust, start.type=1, 
               prev.alpha, prev.beta)
## retrieve the iteration at which the EM converged:
tem <- length(run$psim)/nc
## print the estimate of regression constants alpha.
run$alpha[tem,,]
## print the estimate of regression coefficients beta.
run$beta[tem,,]
## print the estimate of gamma.
run$gamma
## print the estimate of mixture weights.
run$psim[tem,]
## frequency table of the resulting clustering of the 
##		500 observations among the 2 components.
table(run$clust)
## print the value of the ICL criterion
run$icl
## print the value of the BIC
run$bic
## print the value of the loglikelihood
run$ll
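
## As an illustrative sanity check (not part of the package output): the MAP
## clustering in run$clust should agree with assigning each observation to the
## component with the largest conditional probability in run$z, as documented
## in the Value section above.
map.clust <- apply(run$z, 1, which.max)
table(map.clust, run$clust)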


############################################################
#2.            Example with Initialization 2               #
############################################################

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Given the estimates of Example 1, estimate a 3-component mixture using
# Initialization 2. The number of different runs is set to t2 = 2, each of
# them using m2 = 5 EM iterations.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
run.previous<-run
## number of blocks (conditions), J = 3 since L = c(3,2,1)
q <- 3
## number of covariates
tau <- 1
## number of components
nc <- 3
## estimated conditional probabilities for K=2
z <- run.previous$z
## iteration at which the previous (K=2) EM converged
ml <- length(run.previous$psim)/(nc - 1) 	
## estimates of alpha when K=2
alpha <- array(run.previous$alpha[ml, , ], dim = c(q, nc - 1)) 
## estimates of beta when K=2
beta <- array(run.previous$beta[ml, , ], dim = c(nc - 1, tau))
clust <- run.previous$clust ##(estimated clusters when K=2)


run <- bkmodel(reference=x, response=y, L=c(3,2,1), m=1000, K=nc, nr=-10*log(10), 
               maxnr=10, t2=2, m2=5, prev.z=z, prev.clust=clust, start.type=2, 
               prev.alpha=alpha, prev.beta=beta)

# retrieve the iteration at which the EM converged
tem <- length(run$psim)/nc
# estimates of the mixture weights
run$psim[tem,]
# estimates of the regression constants alpha_{jk}, j = 1,2,3, k = 1,...,3
run$alpha[tem,,]
# estimates of the regression coefficients beta_{k\tau}, k = 1,...,3, \tau = 1
run$beta[tem,,]
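
## Illustrative only: since both the 2-component fit (run.previous) and the
## 3-component fit (run) are available, their information criteria (the bic
## and icl components described in the Value section) can be compared in
## order to select the number of mixture components:
c(K.2 = run.previous$bic, K.3 = run$bic)
c(K.2 = run.previous$icl, K.3 = run$icl)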

# note: the user should specify larger values for Kmax, m1, m2, t1, t2
#	for a complete analysis.
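
## The same scheme extends incrementally: the 3-component fit stored in `run`
## can initialize a 4-component run in the same way as Example 2. A sketch
## only (not run here; t2 and m2 values are illustrative):
# ml4    <- length(run$psim)/nc   # nc = 3: iteration at which the K=3 EM stopped
# alpha4 <- array(run$alpha[ml4, , ], dim = c(q, nc))
# beta4  <- array(run$beta[ml4, , ], dim = c(nc, tau))
# run4   <- bkmodel(reference = x, response = y, L = c(3,2,1), m = 1000, K = 4,
#                   nr = -10*log(10), maxnr = 10, t2 = 2, m2 = 5,
#                   prev.z = run$z, prev.clust = run$clust, start.type = 2,
#                   prev.alpha = alpha4, prev.beta = beta4)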




[Package poisson.glm.mix version 1.4]