mscn {MSclust}    R Documentation

Mixtures of Multiple Scaled Contaminated Normal Distributions.

Description

Fits a mixture of multiple scaled contaminated normal distributions to the given data.

Usage

mscn(X,k,ini="km",sz=NULL,al=c(0.5,0.99),eta.min=1.01,m="BFGS",stop=c(10^-5,200),VB=FALSE)

Arguments

X

A matrix or data frame such that rows correspond to observations and columns correspond to variables.

k

The number of clusters.

ini

Initialization method: "km" for k-means (the default), "pam" for partitioning around medoids, "mclust" for Gaussian mixture models, "random.soft" or "random.hard" for random initialization, or "manual"; if "manual", a partition (sz) must be provided.

sz

If initialization is "manual", this matrix contains the starting values for z.
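As an illustration of a manual start, the sketch below builds a hard 0/1 membership matrix from a vector of labels. It assumes that sz is expected as an n x k matrix with one row per observation and one column per cluster, which this page does not state explicitly; the labels vector is hypothetical.

```r
# Hypothetical starting labels for n = 6 observations and k = 2 clusters
labels <- c(1, 1, 2, 2, 1, 2)
k <- 2

# One row per observation, one column per cluster; a 1 marks the assigned
# cluster (assumed layout for sz, not confirmed by this page)
sz <- diag(k)[labels, , drop = FALSE]

# Each row of a hard partition sums to 1
stopifnot(all(rowSums(sz) == 1))

# The matrix could then be passed as, for example:
# result <- mscn(X = X, k = k, ini = "manual", sz = sz)
```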

al

Vector of length 2 containing the minimum and maximum proportion of good points in each group for the contaminated normal distribution.

eta.min

Minimum value of the inflation parameter applied to the covariance matrix of the bad points.
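To make the roles of the proportion of good points and of the inflation parameter concrete, here is a minimal univariate sketch of a contaminated normal density. It is not the package's multivariate implementation; the function name dcn and its defaults are hypothetical.

```r
# Univariate contaminated normal density (illustrative sketch only):
#   alpha     = proportion of good points
#   eta (> 1) = inflation factor applied to the variance of the bad points
dcn <- function(x, mu = 0, sigma2 = 1, alpha = 0.9, eta = 5) {
  stopifnot(alpha > 0, alpha <= 1, eta > 1)
  alpha * dnorm(x, mean = mu, sd = sqrt(sigma2)) +
    (1 - alpha) * dnorm(x, mean = mu, sd = sqrt(eta * sigma2))
}

# The mixture of the two normal densities still integrates to 1
total <- integrate(dcn, lower = -Inf, upper = Inf)$value

# The inflated component gives heavier tails than the plain normal
tail_ratio <- dcn(4) / dnorm(4)
```

Bounding alpha from below and eta from above 1, as al and eta.min do, keeps the bad-point component a minority with a strictly larger spread than the good-point component.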

m

Method for the optimization of the eigenvector matrix; see optim for other options.

stop

Vector of length 2 containing the tolerance for the Aitken stopping criterion and the maximum number of iterations.
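For readers unfamiliar with the Aitken criterion, the sketch below shows one common variant based on Aitken acceleration of the log-likelihood sequence. The exact rule used inside mscn may differ; the function names aitken_estimate and aitken_converged are hypothetical.

```r
# Aitken-accelerated estimate of the limiting log-likelihood, computed from
# the last three values of the sequence (common textbook variant)
aitken_estimate <- function(loglik) {
  m <- length(loglik)
  stopifnot(m >= 3)
  l_prev <- loglik[m - 2]
  l_curr <- loglik[m - 1]
  l_next <- loglik[m]
  if (l_curr == l_prev) return(l_next)        # sequence has already stalled
  a <- (l_next - l_curr) / (l_curr - l_prev)  # Aitken acceleration factor
  l_curr + (l_next - l_curr) / (1 - a)
}

# Stop when the estimated limit is within tol of the current log-likelihood
aitken_converged <- function(loglik, tol = 1e-5) {
  abs(aitken_estimate(loglik) - loglik[length(loglik)]) < tol
}

# Hypothetical log-likelihood trace converging geometrically to -100
ll <- -100 - 50 * 0.5^(0:30)
aitken_converged(ll[1:8])  # early in the run
aitken_converged(ll)       # after 31 values
```

In this reading, the first element of stop would play the role of tol and the second would cap the number of iterations regardless of the criterion.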

VB

If TRUE, tracing information on the progress of the optimization is produced (see optim for details), together with a plot of the log-likelihood versus iterations.

Value

X

Data used for clustering.

n

The number of observations in the data.

d

The number of features in the data.

k

Value corresponding to the number of components.

cluster

Vector of group membership as determined by the model.

detect

Indicates, for each principal component, whether a point is bad, given its cluster membership.

npar

The number of parameters.

mu

Either a vector of length d, representing the mean value, or (except for rmscn) a matrix whose rows represent different mean vectors; if it is a matrix, its dimensions must match those of X.

Lambda

Orthogonal matrix whose columns are the normalized eigenvectors of Sigma.

Gamma

Diagonal matrix of the eigenvalues of Sigma.

Sigma

A symmetric positive-definite matrix representing the scale matrix of the distribution.
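The relationship among Lambda, Gamma and Sigma described above is the usual spectral decomposition, Sigma = Lambda Gamma Lambda'. A minimal base R sketch with a hypothetical 2 x 2 scale matrix (not output from mscn):

```r
# Hypothetical symmetric positive-definite scale matrix
Sigma <- matrix(c(2.0, 0.8,
                  0.8, 1.0), nrow = 2)

# Spectral decomposition with base R
e <- eigen(Sigma, symmetric = TRUE)
Lambda <- e$vectors     # orthogonal matrix of normalized eigenvectors
Gamma  <- diag(e$values) # diagonal matrix of eigenvalues

# Rebuild Sigma from its eigenvectors and eigenvalues
Sigma_rebuilt <- Lambda %*% Gamma %*% t(Lambda)
```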

alpha

Proportion of good observations.

eta

Degree of contamination.

z

The component membership of each observation.

v

Indicator of whether an observation is good or bad with respect to each dimension; 1 means good and 0 means bad.

weight

The matrix of the expected values of the characteristic weights; corresponds to the value of v+(1-v)/eta.
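The formula v+(1-v)/eta can be checked by hand on a small case. The sketch below assumes a single cluster with one eta per dimension; the values of v and eta are hypothetical, not output from mscn.

```r
# Hypothetical indicators for n = 3 observations and d = 2 dimensions
# (matrix() fills by column: column 1 = 1,0,1 and column 2 = 1,0,0)
v <- matrix(c(1, 0, 1,
              1, 0, 0), nrow = 3)

# Hypothetical inflation parameters, one per dimension
eta <- c(2, 4)

# weight = v + (1 - v) / eta, dividing column j of (1 - v) by eta[j]
weight <- v + sweep(1 - v, 2, eta, "/")
```

Good points (v = 1) receive weight 1, while bad points (v = 0) are down-weighted to 1/eta in the corresponding dimension.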

iter.stop

The number of iterations until convergence for the model.

loglik

The log-likelihood corresponding to the model.

AIC

The Akaike Information Criterion of the model.

BIC

The Bayesian Information Criterion of the model.

ICL

The Integrated Completed Likelihood of the model.

KIC

The Kullback Information Criterion of the model.

KICc

The bias-corrected Kullback Information Criterion of the model.

AWE

The Approximate Weight of Evidence of the model.

AIC3

Another version of Akaike's Information Criterion of the model.

CAIC

The Consistent Akaike Information Criterion of the model.

AICc

The AIC version which is used when sample size n is small relative to d.

CLC

The Classification Likelihood Criterion of the model.
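Several of the criteria above are simple functions of loglik, npar and n. The sketch below uses the textbook "smaller is better" definitions; this is an assumption, since the package may report them with a different sign convention or penalty. The helper name ic and the input values are hypothetical.

```r
# Textbook definitions of a few information criteria (assumed conventions;
# the values returned by mscn may follow a different sign or scaling)
ic <- function(loglik, npar, n) {
  aic <- -2 * loglik + 2 * npar
  c(AIC  = aic,
    BIC  = -2 * loglik + npar * log(n),
    AIC3 = -2 * loglik + 3 * npar,
    CAIC = -2 * loglik + npar * (log(n) + 1),
    AICc = aic + 2 * npar * (npar + 1) / (n - npar - 1))
}

# Hypothetical fit: log-likelihood -100, 5 parameters, 50 observations
crit <- ic(loglik = -100, npar = 5, n = 50)
```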

Author(s)

Cristina Tortora and Antonio Punzo

References

Punzo, A. & Tortora, C. (2021). Multiple scaled contaminated normal distribution and its application in clustering. Statistical Modelling, 21(4): 332–358.

Examples

## Not run:
data(sim)
result <- mscn(X = sim, k = 2)
plot(result)
summary(result)
## End(Not run)

[Package MSclust version 1.0.4 Index]