mix_mnm_logistic {multinomialLogitMix}R Documentation

EM algorithm

Description

Estimation of the multinomial logit mixture using the EM algorithm. The algorithm exploits a careful initialization procedure (Papastamoulis et al., 2016) combined with a ridge-stabilized implementation of the Newton-Raphson method (Goldfeld et al., 1966) in the M-step.

Usage

mix_mnm_logistic(y, X, Kmax = 10, maxIter = 100, emthreshold = 1e-08, 
	maxNR = 5, nCores, tsplit = 8, msplit = 5, split = TRUE, 
	shake = TRUE, random = TRUE, criterion = "ICL", 
	plotting = FALSE, R0 = 0.1, method = 5)

Arguments

y

matrix of counts

X

design matrix (including constant term).

Kmax

Maximum number of mixture components.

maxIter

Maximum number of iterations.

emthreshold

Minimum loglikelihood difference between successive iterations in order to terminate.

maxNR

maximum number of Newton Raphson iterations

nCores

number of cores for parallel computations.

tsplit

positive integer denoting the number of different runs for each call of the splitting small EM used by split-small EM initialization procedure.

msplit

positive integer denoting the number of different runs for each call of the splitting small EM.

split

Boolean indicating if the split initialization should be enabled in the small-EM scheme.

shake

Boolean indicating if the shake initialization should be enabled in the small-EM scheme.

random

Boolean indicating if random initializations should be enabled in the small-EM scheme.

criterion

set to "ICL" to select the number of clusters according to the ICL criterion.

plotting

Boolean for displaying intermediate graphical output.

R0

controls the step size of the update: smaller values result to larger step sizes. See description in paper.

method

this should be set to 5.

Value

estimated_K

selected value of the number of clusters.

all_runs

detailed output per run.

BIC_values

values of bayesian information criterion.

ICL_BIC_values

values of ICL-BIC.

estimated_clustering

Single best-clustering of the data, according to the MAP rule.

Author(s)

Panagiotis Papastamoulis

References

Papastamoulis P (2022). Model-based clustering of multinomial count data. arXiv:2207.13984 [stat.ME]

Examples

#	Generate synthetic data

	K <- 2
	p <- 2
	D <- 3
	n <- 2
	set.seed(116)
	simData <- simulate_multinomial_data(K = K, p = p, D = D, n = n, size = 20, prob = 0.025)   

	
	SplitShakeSmallEM <- mix_mnm_logistic(y = simData$count_data, 
		X = simData$design_matrix, Kmax = 2, maxIter = 1, 
		emthreshold = 1e-8, maxNR = 1, nCores = 2, tsplit = 1, 
		msplit = 2, split = TRUE, R0 = 0.1, method = 5, 
		plotting = FALSE)
	#selected number of clusters
	SplitShakeSmallEM$estimated_K
	#estimated single best-clustering, according to MAP rule
	SplitShakeSmallEM$estimated_clustering
	# detailed output for all parameters of the selected number of clusters
	SplitShakeSmallEM$all_runs[[SplitShakeSmallEM$estimated_K]]
	

[Package multinomialLogitMix version 1.1 Index]