| vgpcm {mixture} | R Documentation | 
Variance Gamma Parsimonious Clustering Models
Description
Carries out model-based clustering or classification using some or all of the 14 parsimonious Variance Gamma clustering models (VGPCM).
Usage
vgpcm(data=NULL, G=1:3, mnames=NULL,
		start=2, label=NULL, 
		veo=FALSE, da=c(1.0),
		nmax=1000, atol=1e-8, mtol=1e-8, mmax=10, burn=5,
		pprogress=FALSE, pwarning=FALSE, 
		stochastic = FALSE, latent_method="standard", seed=123) 
Arguments
| data | A matrix or data frame in which rows correspond to observations and columns correspond to variables. Note that this function currently works only with multivariate data (p > 1). | 
| G | A sequence of integers giving the number of components to be used. | 
| mnames | The models (i.e., covariance structures) to be used. If  | 
| start | If  | 
| label | If  | 
| veo | Stands for "Variables exceed observations". If  | 
| da | Stands for Deterministic Annealing. A vector of doubles. | 
| nmax | The maximum number of iterations each EM algorithm is allowed to use. | 
| atol | A number specifying the epsilon value for the convergence criteria used in the EM algorithms. For each algorithm, the criterion is based on the difference between the log-likelihood at an iteration and an asymptotic estimate of the log-likelihood at that iteration. This asymptotic estimate is based on the Aitken acceleration and details are given in the References. | 
| mtol | A number specifying the epsilon value for the convergence criteria used in the M-step in the EM algorithms. | 
| mmax | The maximum number of iterations each M-step is allowed in the GEM algorithms. | 
| burn | The burn-in period for imputing data. (Missing observations are removed and a model is estimated separately before placing an imputation step within the EM.) | 
| pprogress | If  | 
| pwarning | If  | 
| stochastic | If  | 
| latent_method | If  | 
| seed | The seed for the run. The default is 123. | 
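The Aitken-based stopping rule described under atol can be sketched in R as follows. This is a minimal illustration of one common form of the criterion, not the package's internal code: the helper name aitken_converged and the choice to compare the asymptotic estimate with the second-to-last log-likelihood value are assumptions made for this example.

```r
# Hypothetical helper (not part of the mixture package).
# loglik: numeric vector of log-likelihood values, one per EM iteration.
# atol:   tolerance, playing the role of the atol argument.
aitken_converged <- function(loglik, atol = 1e-8) {
  k <- length(loglik)
  if (k < 3) return(FALSE)              # three values are needed for the ratio
  l_prev <- loglik[k - 2]
  l_curr <- loglik[k - 1]
  l_next <- loglik[k]
  denom <- l_curr - l_prev
  # A flat log-likelihood: fall back to the raw change.
  if (denom == 0) return(abs(l_next - l_curr) < atol)
  a <- (l_next - l_curr) / denom        # Aitken acceleration
  if (!is.finite(a) || a >= 1) return(FALSE)
  l_inf <- l_curr + (l_next - l_curr) / (1 - a)  # asymptotic estimate
  abs(l_inf - l_curr) < atol
}

# Made-up log-likelihood traces that approach -100 geometrically.
early <- aitken_converged(-100 - 0.5^(0:5))
late  <- aitken_converged(-100 - 0.5^(0:40))
```

A looser tolerance such as atol=1e-2 stops the loop after fewer iterations, which is why it is convenient for quick exploratory runs.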
Details
The data x are either clustered or classified using Variance Gamma mixture models with some or all of the 14 parsimonious covariance structures described in Celeux & Govaert (1995). The algorithms given by Celeux & Govaert (1995) are used for 12 of the 14 models; the "EVE" and "VVE" models use the algorithms given in Browne & McNicholas (2014). The results of these algorithms depend heavily on the starting values, so care must be taken when interpreting them. 
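The covariance structures of Celeux & Govaert (1995) come from constraining the terms of the eigen-decomposition Sigma_g = lambda_g D_g A_g D_g', where lambda_g is the volume, A_g is a diagonal shape matrix with determinant 1, and D_g is the orthogonal matrix of eigenvectors (orientation). The three letters of a model name such as "EVE" refer to volume, shape and orientation in that order. The base R sketch below decomposes a single covariance matrix into these three terms; the matrix values and variable names are invented for illustration and nothing here calls the package itself.

```r
# Illustrative 2 x 2 covariance matrix (made-up values).
Sigma <- matrix(c(4.0, 1.2,
                  1.2, 1.0), nrow = 2, byrow = TRUE)
p <- nrow(Sigma)

eig <- eigen(Sigma, symmetric = TRUE)
lambda <- prod(eig$values)^(1 / p)   # volume: det(Sigma)^(1/p)
A <- diag(eig$values / lambda)       # shape: diagonal with det(A) = 1
D <- eig$vectors                     # orientation: orthonormal eigenvectors

# Recombining the three terms recovers the original matrix.
Sigma_rebuilt <- lambda * D %*% A %*% t(D)
```

Holding a term equal across components (E) or letting it vary (V) is what distinguishes one structure from another.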
Value
An object of class vgpcm is a list with components:
| map | A vector of integers indicating the maximum a posteriori classifications for the best model. | 
| model_objs | A list of all estimated models with parameters returned from the C++ call. | 
| best_model | An object of class vgpcm_best containing the number of groups for the best model, the covariance structure, and the Bayesian Information Criterion (BIC) value. | 
| loglik | The log-likelihood values from fitting the best model. | 
| z | A matrix giving the raw values upon which  | 
| BIC | A three-dimensional array of size G by mnames by 3 with values pertaining to BIC calculations. (legacy) | 
| startobject | The type of object inputted into  | 
| gpar | A list object for each cluster pertaining to parameters. (legacy) | 
| row_tags | If there were NAs in the original dataset, a vector of indices referencing the row of the imputed vectors is given. | 
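The link between a matrix of component weights and a vector of maximum a posteriori labels can be illustrated with a toy example. The sketch assumes one row per observation and one column per component, and takes the label to be the column with the largest value in each row; the numbers are invented and the layout of the actual z component is not confirmed by this page.

```r
# Toy weight matrix: 3 observations, 2 components (made-up values).
z_toy <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.4, 0.6), ncol = 2, byrow = TRUE)

# MAP label for each observation: index of the largest entry in its row.
map_toy <- apply(z_toy, 1, which.max)
```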
Best Model
An object of class vgpcm_best is a list with components:
| model_type | A string containing summarized information about the type of model estimated (covariance structure and number of groups). | 
| model_obj | An internal list containing all parameters returned from the C++ call. | 
| BIC | Bayesian Information Criterion (positive scale, bigger is better). | 
| loglik | Log-likelihood from the estimated model. | 
| nparam | Number of parameters in the model. | 
| startobject | The type of object inputted into  | 
| G | An integer representing the number of groups. | 
| cov_type | A string representing the type of covariance matrix (see 14 models). | 
| status | Convergence status of the EM algorithm according to the Aitken acceleration criterion. | 
| map | A vector of integers indicating the maximum a posteriori classifications for the best model. | 
| row_tags | If there were NAs in the original dataset, a vector of indices referencing the row of the imputed vectors is given. | 
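A BIC on the "bigger is better" scale is commonly written as 2 * loglik - nparam * log(n), where n is the number of observations. The sketch below assumes that convention, which this page does not spell out; bic_positive, its argument names and the two candidate fits are made up for illustration.

```r
# Hypothetical helper (not part of the mixture package).
bic_positive <- function(loglik, nparam, n) {
  2 * loglik - nparam * log(n)
}

# Two invented fits to the same 200 observations.
bic_a <- bic_positive(loglik = -512.3, nparam = 11, n = 200)
bic_b <- bic_positive(loglik = -509.8, nparam = 17, n = 200)

# On this scale the preferred model is the one with the largest BIC.
best <- which.max(c(a = bic_a, b = bic_b))
```

Here the second fit has the higher log-likelihood, but its six extra parameters cost more than the gain, so the first fit is preferred.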
Internal Objects
All classes contain an internal list called model_obj or model_objs with the following components:
| zigs | A posteriori matrix. | 
| G | An integer representing the number of groups. | 
| sigs | A vector of covariance matrices for each group. | 
| mus | A vector of location vectors for each group. | 
| alphas | A vector containing skewness vectors for each group. | 
| gammas | A vector containing estimated gamma parameters for each group. | 
Note
Dedicated print, plot and summary functions are available for objects of class vgpcm.
Author(s)
Nik Pocuca, Ryan P. Browne and Paul D. McNicholas.
Maintainer: Paul D. McNicholas <mcnicholas@math.mcmaster.ca>
References
McNicholas, P.D. (2016). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press.
Browne, R.P. and McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification 8(2), 217-226.
Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition 28(5), 781-793.
Examples
## Not run: 
data("sx2")
### use kmeans to find starting values
ax0 = vgpcm(sx2, G=1:3, mnames=c("VVV", "EVE"), start=2, pprogress=TRUE, atol=1e-2)
summary(ax0)
ax0
### use random soft initializations
ax6 = vgpcm(sx2, G=1:3, mnames=c("VVV", "EVE"), start=0)
summary(ax6)
ax6
### use deterministic annealing for starting values
axDA = vgpcm(sx2, G=1:3, mnames=c("VVV", "EVE"), start=0, da=c(0.3, 0.5, 0.8, 1.0))
summary(axDA)
axDA
### estimate all 14 covariance structures 
ax = vgpcm(sx2, G=1:3, mnames=NULL, start=0)
summary(ax)
ax
### model based classification
sx2.label = c(rep(1,1000),rep(2,1000))
plot(sx2, col=sx2.label)
axl = vgpcm(sx2, G=2, mnames=c("VVV", "EVE"), label=sx2.label)
summary(axl)
## End(Not run)