fit.FMMSNC {CensMFM} R Documentation

## Fitting Finite Mixtures of Multivariate Distributions

### Description

Fits a finite mixture of multivariate distributions to censored and/or missing data (FM-MC). The available component distributions are the multivariate skew-normal, normal and Student-t. Maximum likelihood estimates of the parameters are computed iteratively with an EM-type algorithm.
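For reference, the fitted model has the usual finite-mixture density. The skew-normal component is written below in the parameterization of Cabral, Lachos & Prates (2012). We assume CensMFM follows that paper, but this page does not state it explicitly. Here $\pi_j$, $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\lambda}_j$ correspond to `pii[j]`, `mu[[j]]`, `Sigma[[j]]` and `shape[[j]]`, and $\phi_p$ and $\Phi$ are the $p$-variate normal density and the standard normal CDF.

```latex
\begin{aligned}
f(\mathbf{y} \mid \boldsymbol{\theta}) &= \sum_{j=1}^{g} \pi_j \, f_j(\mathbf{y} \mid \boldsymbol{\theta}_j),
  \qquad \pi_j \ge 0, \quad \sum_{j=1}^{g} \pi_j = 1, \\
f_j^{\mathrm{SN}}(\mathbf{y}) &= 2\, \phi_p(\mathbf{y}; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)\,
  \Phi\!\left(\boldsymbol{\lambda}_j^{\top} \boldsymbol{\Sigma}_j^{-1/2} (\mathbf{y} - \boldsymbol{\mu}_j)\right).
\end{aligned}
```

Setting $\boldsymbol{\lambda}_j = \mathbf{0}$ reduces each component to the multivariate normal.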

### Usage

fit.FMMSNC(cc, LI, LS, y, mu = NULL, Sigma = NULL, shape = NULL, pii = NULL,
nu = NULL, g = NULL, get.init = TRUE, criteria = TRUE, family = "SN", error = 1e-05,
iter.max = 350, uni.Gama = FALSE, kmeans.param = NULL, cal.im = FALSE)

### Arguments

- cc: n x p matrix of censoring indicators, with the same dimensions as y. Each entry is 0 if the corresponding observation is not censored and 1 if it is censored or missing.
- LI: n x p matrix of lower limits. For entries with cc = 1 it holds the lower bound of the censoring interval, e.g. -Inf for left-censored or missing values.
- LS: n x p matrix of upper limits. For entries with cc = 1 it holds the upper bound of the censoring interval, e.g. the detection limit for left-censored values or +Inf for missing values.
- y: n x p response matrix.
- mu: list with g entries. Each entry is the location vector (length p) of one group.
- Sigma: list with g entries. Each entry is the p x p scale matrix of one group.
- shape: list with g entries. Each entry is the skewness vector (length p) of one group.
- pii: vector of mixing weights, of length g (the number of clusters). The weights must sum to one.
- nu: degrees of freedom for the Student-t case, a vector of length g.
- g: number of mixture components.
- get.init: logical. If TRUE, the function computes the initial values. If FALSE, the user must supply them through mu, Sigma, shape, pii and, for the Student-t family, nu.
- criteria: logical. If TRUE, the likelihood-based model selection criteria AIC, BIC and EDC are computed for comparison purposes.
- family: distribution family to be used: skew-normal ("SN"), normal ("Normal") or Student-t ("t").
- error: tolerance for the relative-change stopping criterion of the algorithm. See Details.
- iter.max: maximum number of iterations of the EM algorithm.
- uni.Gama: logical. If TRUE, the scale matrices are constrained to be equal across groups. See Note.
- kmeans.param: list of alternative parameters for the kmeans function used to generate initial values. When NULL (the default), list(iter.max = 10, n.start = 1, algorithm = "Hartigan-Wong") is used.
- cal.im: logical. If TRUE, the information matrix is computed and the standard errors are reported.
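The LI/LS convention can be captured in a small helper. The sketch below is a hypothetical function `make_limits`, which is not part of CensMFM. It reproduces the left-censoring and missing-data constructions used in the Examples. The "right" case, with interval [y, Inf), is an assumed extension of the same convention and is not documented on this page.

```r
# Hypothetical helper (not part of CensMFM): builds the LI/LS matrices from
# the response y and the n x p censoring indicators cc.
#   left:    censored entries lie in (-Inf, y]   (as in the Examples)
#   missing: censored entries lie in (-Inf, Inf) (as in the Examples)
#   right:   censored entries lie in [y, Inf)    (assumed extension)
make_limits <- function(y, cc, type = c("left", "right", "missing")) {
  type <- match.arg(type)
  stopifnot(is.matrix(y), identical(dim(y), dim(cc)), all(cc %in% c(0, 1)))
  cens <- cc == 1
  # Uncensored entries are left at 0, as happens in the Examples (LI <- cc)
  LI <- LS <- matrix(0, nrow(y), ncol(y))
  if (type == "left") {
    LI[cens] <- -Inf
    LS[cens] <- y[cens]
  } else if (type == "right") {
    LI[cens] <- y[cens]
    LS[cens] <- Inf
  } else {
    LI[cens] <- -Inf
    LS[cens] <- Inf
  }
  list(LI = LI, LS = LS)
}

# Toy data: the second subject is censored in both coordinates
y   <- matrix(c(1.5, -0.2, 3.1, 0.7), nrow = 2)
cc  <- matrix(c(0, 1, 0, 1), nrow = 2)
lim <- make_limits(y, cc, "left")
lim$LS   # row 2 holds the detection limits -0.2 and 0.7
```

The result can then be passed as `fit.FMMSNC(cc, lim$LI, lim$LS, y, ...)`.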

### Details

The information matrix is the empirical information matrix, computed with respect to the entries of the square root of Sigma. Because this inference is asymptotic, the reported standard errors should only be relied on for reasonably large sample sizes. The algorithm stops when abs(loglik.new/loglik.old - 1) < error, where loglik.old and loglik.new are the log-likelihood values at two consecutive iterations and error is the argument of the same name.
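The two numerical ingredients mentioned above can be sketched in base R. Both `has_converged` and `sqrtm_sym` are hypothetical names. The symmetric square root via an eigendecomposition is an assumed choice, and the package may compute the square root of Sigma differently.

```r
# Relative stopping rule: stop when |loglik_new / loglik_old - 1| < error
has_converged <- function(loglik_old, loglik_new, error = 1e-05) {
  abs(loglik_new / loglik_old - 1) < error
}

# Symmetric square root of a positive-definite matrix,
# S^{1/2} = V diag(sqrt(values)) V' (assumed choice of square root)
sqrtm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow(S)) %*% t(e$vectors)
}

Sigma1 <- matrix(c(3, 1, 1, 4.5), 2, 2)   # Sigma[[1]] from the Examples
R <- sqrtm_sym(Sigma1)
all.equal(R %*% R, Sigma1)                # TRUE up to numerical tolerance
has_converged(-812.40, -812.39)           # relative change ~1.2e-05: FALSE
has_converged(-812.400, -812.395)         # relative change ~6.2e-06: TRUE
```

Because the rule is relative, the same `error` behaves similarly regardless of the scale of the log-likelihood.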

### Value

A list containing, depending on the case, one or more of the following components:

- mu: list with g components. Each is a vector of length p with the estimated location parameter.
- Sigma: list with g components. Each is a p x p matrix with the estimated scale matrix.
- Gamma: list with g components. Each is a p x p matrix with the estimated Gamma scale matrix.
- shape: list with g components. Each is a vector of length p with the estimated skewness parameter.
- nu: vector with one element containing the estimated degrees of freedom nu.
- pii: vector with g elements containing the estimated mixing weights.
- Zij: n x g matrix containing the estimated weights of each subject for each group.
- yest: n x p matrix containing the estimated values of y.
- MI: list with the standard errors of all parameters (when cal.im = TRUE).
- logLik: log-likelihood value at the estimated parameters.
- aic: AIC value for the estimated parameters (when criteria = TRUE).
- bic: BIC value for the estimated parameters (when criteria = TRUE).
- edc: EDC value for the estimated parameters (when criteria = TRUE).
- iter: number of iterations until the EM algorithm converged.
- group: the estimated group membership of each subject.
- time: time, in minutes, until the EM algorithm converged.
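To see what the aic and bic components measure, the sketch below applies the standard formulas AIC = -2 logLik + 2d and BIC = -2 logLik + d log(n). `n_free_params` and `info_criteria` are hypothetical helpers. The parameter count d is an assumption: it uses unrestricted location vectors and scale matrices per group, a skewness vector per group for "SN", and g - 1 free weights. It does not cover uni.Gama = TRUE or the degrees of freedom of the "t" family, and it may not match the count used internally by the package. EDC is omitted because its penalty term is not given on this page.

```r
# Hypothetical helper: free parameters of a g-component, p-dimensional mixture
n_free_params <- function(g, p, family = c("Normal", "SN")) {
  family <- match.arg(family)
  per_group <- p + p * (p + 1) / 2 + (if (family == "SN") p else 0)
  g * per_group + (g - 1)
}

# Standard information criteria; smaller values indicate a preferred model
info_criteria <- function(loglik, d, n) {
  c(aic = -2 * loglik + 2 * d,
    bic = -2 * loglik + log(n) * d)
}

n_free_params(g = 2, p = 2, family = "Normal")   # 2 * 5 + 1 = 11
n_free_params(g = 2, p = 2, family = "SN")       # 2 * 7 + 1 = 15

# Illustrative input values only, not output of fit.FMMSNC
info_criteria(loglik = -800, d = 15, n = 200)
```

When comparing the "Normal" and "SN" fits from the Examples, the component with the smaller bic is the one the criterion favours.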

### Note

For the skew-normal family, uni.Gama constrains the Γ matrices to be equal across groups; for the normal and Student-t families, it constrains the Σ matrices instead.
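The link between Σ, the skewness vector and Γ can be illustrated with a short sketch. It assumes the reparameterization of Cabral, Lachos & Prates (2012), with δ = λ / sqrt(1 + λ'λ), Δ = Σ^{1/2} δ and Γ = Σ - ΔΔ'. The helper name `sigma_to_gamma` and the use of the symmetric square root are assumptions, not the package's code.

```r
# Hypothetical helper: maps (Sigma, shape) to (Delta, Gamma) under the
# assumed reparameterization Gamma = Sigma - Delta Delta'
sigma_to_gamma <- function(Sigma, shape) {
  e <- eigen(Sigma, symmetric = TRUE)
  Sigma_half <- e$vectors %*% diag(sqrt(e$values), nrow(Sigma)) %*% t(e$vectors)
  delta <- shape / sqrt(1 + sum(shape^2))     # |delta| < 1, so Gamma stays PD
  Delta <- drop(Sigma_half %*% delta)
  list(Delta = Delta, Gamma = Sigma - tcrossprod(Delta))
}

Sigma1 <- matrix(c(3, 1, 1, 4.5), 2, 2)       # Sigma[[1]] from the Examples
shape1 <- c(-2, 2)                            # shape[[1]] from the Examples
res <- sigma_to_gamma(Sigma1, shape1)
res$Gamma
# Going back: Sigma = Gamma + Delta Delta'
all.equal(res$Gamma + tcrossprod(res$Delta), Sigma1)
```

With shape = 0 the map is the identity (Γ = Σ), which is consistent with uni.Gama acting on Σ for the normal family.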

### Author(s)

Francisco H. C. de Alencar hildemardealencar@gmail.com, Christian E. Galarza cgalarza88@gmail.com, Victor Hugo Lachos hlachos@uconn.edu and Larissa A. Matos larissam@ime.unicamp.br

Maintainer: Francisco H. C. de Alencar hildemardealencar@gmail.com

### References

Cabral, C. R. B., Lachos, V. H., & Prates, M. O. (2012). Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis, 56(1), 126-142.

Prates, M. O., Lachos, V. H., & Cabral, C. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software, 54(12), 1-20.

Galarza, C. E., Matos, L. A., Dey, D. K., & Lachos, V. H. (2019). On Moments of Folded and Truncated Multivariate Extended Skew-Normal Distributions. Technical report ID 19-14, University of Connecticut.

de Alencar, F. H. C., Galarza, C. E., Matos, L. A., & Lachos, V. H. (2019). Finite Mixture Modeling of Censored and Missing Data Using the Multivariate Skew-Normal Distribution. Technical report ID 19-31, University of Connecticut.

### Examples

mu          <- Sigma <- shape <- list()
mu[[1]]     <- c(-3,-4)
mu[[2]]     <- c(2,2)
Sigma[[1]]  <- matrix(c(3,1,1,4.5), 2,2)
Sigma[[2]]  <- matrix(c(2,1,1,3.5), 2,2)
shape[[1]]  <- c(-2,2)
shape[[2]]  <- c(-3,4)
nu          <- c(0,0)
pii         <- c(0.6,0.4)
percen <- c(0.1,0.2)
n <- 200
g <- 2
seed <- 654678

set.seed(seed)
test = rMMSN(n = n, pii = pii, mu = mu, Sigma = Sigma, shape = shape,
             percen = percen, each = TRUE, family = "SN")

Zij <- test$G
cc  <- test$cc
y   <- test$y

## left censoring ##
LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- y[cc == 1]

# The full analysis may take a few more seconds...

test_fit.cc0 = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "Normal", error = 0.0001,
iter.max = 200, uni.Gama = FALSE, cal.im = FALSE)

test_fit.cc = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "SN", error = 0.00001,
iter.max = 350, uni.Gama = FALSE, cal.im = TRUE)

## missing data ##
p <- ncol(y)  # number of response variables
pctmiss <- 0.2 # 20% of the entries of y are set to missing
missing <- matrix(runif(n * p), nrow = n) < pctmiss
y[missing] <- NA

cc <- matrix(0, nrow = n, ncol = p)
cc[missing] <- 1

LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- +Inf

test_fit.mis = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "SN", error = 0.00001,
iter.max = 350, uni.Gama = FALSE, cal.im = TRUE)

