R: Mixtures of Multiple Scaled Contaminated Normal...

mscn {MSclust}

R Documentation

Mixtures of Multiple Scaled Contaminated Normal Distributions.

Description

Fits a mixture of multiple scaled contaminated normal distributions to the given data.

Usage

mscn(X,k,ini="km",sz=NULL,al=c(0.5,0.99),eta.min=1.01,m="BFGS",stop=c(10^-5,200),VB=FALSE)

Arguments

`X`	A matrix or data frame such that rows correspond to observations and columns correspond to variables.
`k`	The number of clusters.
`ini`	Initialization method: `"km"` (k-means, the default), `"pam"` (partitioning around medoids), `"mclust"` (Gaussian mixture models), `"random.soft"` or `"random.hard"` (random initialization), or `"manual"`; if `"manual"`, a partition (`sz`) must be provided.
`sz`	If initialization is `"manual"`, this matrix contains the starting values for `z`.
`al`	Vector of length 2 containing the minimum and maximum proportion of good points in each group for the contaminated normal distribution.
`eta.min`	Minimum value for inflation parameter for the covariance matrix for the bad points.
`m`	Method for the optimization of the eigenvector matrix; see `optim` for other options.
`stop`	Vector of length 2 containing the tolerance for the Aitken criterion stopping rule and the maximum number of iterations.
`VB`	If `TRUE`, tracing information on the progress of the optimization is produced (see `optim` for details), together with a plot of the log-likelihood versus iterations.
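
The two entries of `stop` can be read as a tolerance and an iteration cap. The sketch below illustrates one common variant of an Aitken-acceleration stopping rule in base R. It is not the package's internal code: the helper name `aitken_converged`, the exact comparison used, and the artificial log-likelihood sequence are all assumptions made for illustration.

```r
# One common variant of the Aitken stopping rule (assumed; the package may differ).
# `loglik` holds the log-likelihood of every iteration so far,
# `tol` plays the role of stop[1].
aitken_converged <- function(loglik, tol = 1e-5) {
  n <- length(loglik)
  if (n < 3) return(FALSE)                # three consecutive values are needed
  l_prev <- loglik[n - 2]
  l_cur  <- loglik[n - 1]
  l_next <- loglik[n]
  denom <- l_cur - l_prev
  if (denom == 0) return(TRUE)            # no change at all: treat as converged
  a <- (l_next - l_cur) / denom           # Aitken acceleration factor
  if (a >= 1) return(FALSE)               # asymptotic estimate undefined
  l_inf <- l_cur + (l_next - l_cur) / (1 - a)  # estimated limiting log-likelihood
  abs(l_inf - l_next) < tol
}

stop_rule <- c(1e-5, 200)                 # same layout as the `stop` argument
loglik <- numeric(0)
for (it in seq_len(stop_rule[2])) {
  # Stand-in for one EM iteration: an artificial sequence converging to -100.
  loglik <- c(loglik, -100 - 10 * 0.5^it)
  if (aitken_converged(loglik, stop_rule[1])) break
}
it                                        # iteration at which the loop stopped
```

The rule compares the current log-likelihood with an extrapolated limit rather than with the previous value, so a slowly converging sequence is not stopped merely because successive increments are small.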

Value

`X`	Data used for clustering.
`n`	The number of observations in the data.
`d`	The number of features in the data.
`k`	Value corresponding to the number of components.
`cluster`	Vector of group membership as determined by the model.
`detect`	Indicates whether each point is good or bad with respect to each principal component, given the cluster membership.
`npar`	The number of parameters.
`mu`	Either a vector of length `d`, representing the mean value, or (except for `rmscn`) a matrix whose rows represent different mean vectors; if it is a matrix, its dimensions must match those of `x`.
`Lambda`	Orthogonal matrix whose columns are the normalized eigenvectors of Sigma.
`Gamma`	Diagonal matrix of the eigenvalues of Sigma.
`Sigma`	A symmetric positive-definite matrix representing the scale matrix of the distribution.
`alpha`	Proportion of good observations.
`eta`	Degree of contamination.
`z`	The component membership of each observation.
`v`	The indicator if an observation is good or bad with respect to each dimension; 1 is good, and 0 means bad.
`weight`	The matrix of the expected values of the characteristic weights; corresponds to the value of `v+(1-v)/eta`.
`iter.stop`	The number of iterations until convergence for the model.
`loglik`	The log-likelihood corresponding to the model.
`AIC`	Akaike's Information Criterion of the model.
`BIC`	The Bayesian Information Criterion of the model.
`ICL`	The Integrated Completed Likelihood of the model.
`KIC`	The Kullback Information Criterion of the model.
`KICc`	The bias-corrected Kullback Information Criterion of the model.
`AWE`	The Approximate Weight of Evidence of the model.
`AIC3`	Another version of Akaike's Information Criterion of the model.
`CAIC`	The Consistent Akaike Information Criterion of the model.
`AICc`	The corrected AIC, used when the sample size `n` is small relative to `d`.
`CLC`	The Classification Likelihood Criterion of the model.
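
The relationships among `Lambda`, `Gamma`, `Sigma`, `v`, `eta` and `weight` described above can be checked with a small base R sketch. All numbers are hypothetical and do not come from a fitted model; treating `eta` as one value per dimension is an assumption made here for illustration.

```r
# Hypothetical 2 x 2 scale matrix (not taken from any fitted model).
Sigma <- matrix(c(2.0, 0.6,
                  0.6, 1.0), nrow = 2, byrow = TRUE)

eig    <- eigen(Sigma, symmetric = TRUE)
Lambda <- eig$vectors              # columns: normalized eigenvectors of Sigma
Gamma  <- diag(eig$values)         # diagonal matrix of the eigenvalues

# Sigma is recovered as Lambda %*% Gamma %*% t(Lambda).
Sigma_rebuilt <- Lambda %*% Gamma %*% t(Lambda)
max(abs(Sigma - Sigma_rebuilt))    # numerically zero

# Good/bad indicators for three observations in d = 2 dimensions
# (1 = good, 0 = bad), and an assumed inflation value per dimension.
v   <- matrix(c(1, 1,
                1, 0,
                0, 0), ncol = 2, byrow = TRUE)
eta <- c(4, 10)

# weight = v + (1 - v) / eta, dividing column j by eta[j].
weight <- v + sweep(1 - v, 2, eta, "/")
weight
```

A good coordinate keeps weight 1, while a bad coordinate is down-weighted to `1/eta`, which is why larger `eta` values correspond to stronger contamination.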

Author(s)

Cristina Tortora and Antonio Punzo

References

Punzo, A. & Tortora, C. (2021). Multiple scaled contaminated normal distribution and its application in clustering. Statistical Modelling, 21(4): 332–358.

Examples

## Not run:
data(sim)
result <- mscn(X = sim, k = 2)
plot(result)
summary(result)
## End(Not run)
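
A slightly longer sketch, using only the arguments and returned components documented on this page. It assumes the fitted object can be indexed with `$` like a list, which this page does not state explicitly, and it is skipped when the MSclust package is not installed.

```r
# Runs only when MSclust is available; otherwise the block is skipped.
if (requireNamespace("MSclust", quietly = TRUE)) {
  library(MSclust)
  data(sim)

  # Same call as above, with a few documented arguments set explicitly.
  fit <- mscn(X = sim, k = 2, ini = "pam",
              al = c(0.5, 0.99), eta.min = 1.01,
              stop = c(10^-5, 200))

  # Assumption: components listed under "Value" are accessible with `$`.
  print(table(fit$cluster))   # cluster sizes
  print(fit$iter.stop)        # iterations until convergence
  print(fit$loglik)           # log-likelihood of the fitted model
}
```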

[Package MSclust version 1.0.4 Index]