main_loop_st {mixture}    R Documentation

STPCM Internal C++ Call

Description

This function is the internal C++ routine called by the stpcm function. Because it is a raw C++ call, it performs no input checking, so invalid inputs may make it fail without an informative error. Please ensure all arguments are valid. main_loop_st is useful for writing parallel implementations of the stpcm function. All arguments are described in terms of their corresponding C++ types.
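Because main_loop_st does no validation of its own, a few cheap checks in R before the call can catch the most common mistakes. The helper below, check_main_loop_st_inputs, is hypothetical (it is not part of the mixture package) and only covers the constraints stated on this page. It assumes X has one observation per column, as in the Examples section (X = t(data_in)); pass obs_in_columns = FALSE for the rows-as-observations layout.

```r
# Hypothetical pre-flight check for main_loop_st (not part of mixture).
# It only tests constraints documented on this page; passing it does not
# guarantee that the C++ routine will succeed.
check_main_loop_st_inputs <- function(X, G, model_type, in_zigs,
                                      obs_in_columns = TRUE, tol = 1e-8) {
  # Assumption: observations are columns of X, as in the Examples section.
  n_obs <- if (obs_in_columns) ncol(X) else nrow(X)
  n_var <- if (obs_in_columns) nrow(X) else ncol(X)
  # stopifnot() evaluates its conditions in order and stops at the first failure.
  stopifnot(
    is.matrix(X), is.numeric(X),
    n_var > 1,                                   # multivariate data only
    is.numeric(G), length(G) == 1, G >= 1, G == round(G),
    length(model_type) == 1, model_type %in% 0:13,
    is.matrix(in_zigs), is.numeric(in_zigs),
    nrow(in_zigs) == n_obs, ncol(in_zigs) == G,  # n x G
    all(in_zigs > 0),                            # strictly positive entries
    all(abs(rowSums(in_zigs) - 1) < tol)         # each row sums to one
  )
  invisible(TRUE)
}
```

Calling it immediately before main_loop_st turns a possible crash into an ordinary R error that names the failed condition.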

Usage

main_loop_st(X, G, model_id, 
        model_type, in_zigs, 
        in_nmax, in_l_tol, in_m_iter_max,
        in_m_tol, anneals,
        latent_step="standard", 
        t_burn = 5L) 

Arguments

X

A matrix or data frame such that rows correspond to observations and columns correspond to variables. Note that this function currently only works with multivariate data (p > 1 variables).

G

A single positive integer giving the number of groups.

model_id

An integer identifier for the model run, useful for keeping track of results when running in parallel. Not to be confused with model_type.

model_type

The covariance model to fit, given as an integer code: 0 = "EII", 1 = "VII", 2 = "EEI", 3 = "EVI", 4 = "VEI", 5 = "VVI", 6 = "EEE", 7 = "VEE", 8 = "EVE", 9 = "EEV", 10 = "VVE", 11 = "EVV", 12 = "VEV", 13 = "VVV".

in_zigs

An n x G matrix of a posteriori probabilities, where entry (i, g) is the probability that observation i belongs to group g. It must have these dimensions, its rows must sum to one, and all entries must be positive.

in_nmax

A positive integer giving the maximum number of EM iterations.

in_l_tol

A likelihood tolerance for convergence.

in_m_iter_max

For certain models, where applicable, the maximum number of iterations for the maximization step.

in_m_tol

For certain models, where applicable, the tolerance for the maximization step.

anneals

A vector of doubles representing the deterministic annealing settings.

t_burn

A positive integer giving the number of burn-in steps used when missing data (NAs) are detected.

latent_step

If "standard", the standard E step is used for the latent variable of the normal variance-mean mixture; if "random", the latent variable is instead drawn at random from a generalized inverse Gaussian (GIG) distribution.
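The two inputs that most often go wrong are the integer model code and the initial in_zigs matrix. The sketch below builds both in base R: model_code maps a model name to the zero-based code listed under model_type, and random_soft_zigs produces a matrix meeting the in_zigs constraints (n x G, positive, rows summing to one). Both helper names are hypothetical. The Examples section uses the package's own z_ig_random_soft for the second task, and this sketch is not that function's implementation.

```r
# Hypothetical helpers for building valid model_type and in_zigs arguments.
# The order of model_names follows the model_type lexicon (0 = "EII", ..., 13 = "VVV").
model_names <- c("EII", "VII", "EEI", "EVI", "VEI", "VVI", "EEE",
                 "VEE", "EVE", "EEV", "VVE", "EVV", "VEV", "VVV")

model_code <- function(name) {
  code <- match(name, model_names) - 1L  # zero-based codes
  if (anyNA(code)) stop("unknown covariance model: ", name)
  code
}

# Random soft initialization: draw positive weights, then divide each row
# by its sum so that every row is a probability vector over the G groups.
random_soft_zigs <- function(n, G) {
  z <- matrix(runif(n * G, min = 0.01, max = 1), nrow = n, ncol = G)
  z / rowSums(z)  # rowSums() recycles down columns, i.e. one value per row
}
```

For example, model_code("VEI") gives 4L, matching the switch statement used in the Examples section.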

Details

Be extremely careful when running this function: because it lacks proper exception handling, it is known to crash systems. Consider using the parallel package to estimate all possible models at the same time, or to run several initializations with different random seeds.
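Following the suggestion above, one way to fit all fourteen covariance models for a fixed G is parallel::mclapply. This is a sketch under assumptions: fit_all_models is a hypothetical wrapper, the remaining main_loop_st arguments (in_nmax, in_l_tol, in_m_iter_max, in_m_tol, anneals and so on) are passed through ..., and fit_fun exists only so the wrapper can be tried with a stand-in function. tryCatch only intercepts R-level errors; it cannot recover from a crash inside the C++ code. mclapply relies on forking, so mc.cores greater than 1 is not available on Windows.

```r
library(parallel)

# Hypothetical wrapper: fit every covariance model (codes 0 to 13) for one G.
# fit_fun defaults to main_loop_st (requires library(mixture)); it is a
# parameter only so the sketch can be exercised with a stand-in function.
fit_all_models <- function(X, G, in_zigs, fit_fun = main_loop_st,
                           n_cores = 1L, ...) {
  model_names <- c("EII", "VII", "EEI", "EVI", "VEI", "VVI", "EEE",
                   "VEE", "EVE", "EEV", "VVE", "EVV", "VEV", "VVV")
  extra_args <- list(...)  # e.g. in_nmax, in_l_tol, in_m_iter_max, in_m_tol, anneals
  fits <- mclapply(seq_along(model_names) - 1L, function(code) {
    # model_id doubles as the model code here, to tag each parallel result.
    args <- c(list(X = X, G = G, model_id = code, model_type = code,
                   in_zigs = in_zigs), extra_args)
    # Keep R-level errors as results instead of aborting the whole batch.
    tryCatch(do.call(fit_fun, args), error = function(e) e)
  }, mc.cores = n_cores)
  names(fits) <- model_names
  fits
}
```

Results that inherit from "error" mark models that failed at the R level; the rest hold whatever main_loop_st returned for that model.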

Value

zigs

The n x G matrix of a posteriori probabilities.

G

An integer representing the number of groups.

sigs

A vector of covariance matrices, one for each group (note: you may have to reshape these).

mus

A vector of location vectors, one for each group.

alphas

A vector of skewness vectors for each group

vgs

Gamma parameters for each group
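The fitted components usually need a little post-processing. The sketch below assumes that sigs is a list with one entry per group, each entry being either a p x p matrix or a length p^2 numeric vector in column-major order; the actual storage may differ, so inspect the returned sigs with str() first. sigs_to_array and hard_labels are hypothetical helpers. hard_labels uses base R max.col to pick the maximum a posteriori group, the same idea as the package's MAP function used in the Examples section, but it is not that function's implementation.

```r
# Hypothetical helper: collect per-group covariance matrices into a p x p x G
# array. Assumes each element of sigs is a p x p matrix or a length p^2
# numeric vector stored in column-major order.
sigs_to_array <- function(sigs, p) {
  G <- length(sigs)
  out <- array(NA_real_, dim = c(p, p, G))
  for (g in seq_len(G)) {
    # as.numeric() flattens column-major; matrix() refills column-major.
    out[, , g] <- matrix(as.numeric(sigs[[g]]), nrow = p, ncol = p)
  }
  out
}

# Hard group labels from an n x G a posteriori matrix: the column with the
# largest probability in each row (ties go to the first such column).
hard_labels <- function(zigs) max.col(zigs, ties.method = "first")
```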

Author(s)

Nik Pocuca, Ryan P. Browne and Paul D. McNicholas.

Maintainer: Paul D. McNicholas <mcnicholas@math.mcmaster.ca>

References

McNicholas, P.D. (2016). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press.

Browne, R.P. and McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification 8(2), 217-226.

Wei, Y., Tang, Y. and McNicholas, P.D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics and Data Analysis 130, 18-41.

Zhou, H. and Lange, K. (2010). On the bumpy road to the dominant mode. Scandinavian Journal of Statistics 37, 612-631.

Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition 28(5), 781-793.

Examples

## Not run: 

data("sx2")
data_in = as.matrix(sx2)
n_iter = 300

in_g = 2
n = dim(data_in)[1]
model_string <- "VEI"
in_model_type <- switch(model_string, "EII" = 0,"VII" = 1,  
              "EEI" = 2,  "EVI" = 3,  "VEI" = 4,  "VVI" = 5,  "EEE" = 6,  
              "VEE" = 7,  "EVE" = 8,  "EEV" = 9,  "VVE" = 10,
              "EVV" = 11,"VEV" = 12,"VVV" = 13)

zigs_in <- z_ig_random_soft(n,in_g)

m2 = main_loop_st(X = t(data_in), # transposed so that each column is an observation
               G = in_g, # number of groups
               model_id = 1, # model id for parallelization later
               model_type = in_model_type,
               in_zigs = zigs_in, # initialization
               in_nmax = n_iter, # maximum number of EM iterations
               in_l_tol = 0.5, # likelihood tolerance
               in_m_iter_max = 20, # maximum iterations for matrices
               anneals = c(1),
               in_m_tol = 1e-8)

plot(sx2,col = MAP(m2$zigs) + 1, cex = 0.5, pch = 20)

## End(Not run)

[Package mixture version 2.1.1 Index]