fit.FMMSNC {CensMFM} R Documentation

## Fitting Finite Mixture of Multivariate Distributions.

### Description

Fits a finite mixture of censored and/or missing multivariate distributions (FM-MC). The available distributions are the multivariate Skew-normal, normal and Student-t. It uses an EM-type algorithm to iteratively compute maximum likelihood estimates of the parameters.

### Usage

```r
fit.FMMSNC(cc, LI, LS, y, mu = NULL, Sigma = NULL, shape = NULL, pii = NULL,
           nu = NULL, g = NULL, get.init = TRUE, criteria = TRUE, family = "SN",
           error = 1e-05, iter.max = 350, uni.Gama = FALSE, kmeans.param = NULL,
           cal.im = FALSE)
```

### Arguments

| Argument | Description |
| --- | --- |
| `cc` | vector of censoring indicators. For each observation it takes 0 if non-censored, 1 if censored. |
| `LI` | the matrix of lower limits of dimension n x p. See the Details section. |
| `LS` | the matrix of upper limits of dimension n x p. See the Details section. |
| `y` | the response matrix with dimension n x p. |
| `mu` | a list with g entries, where each entry is the location parameter of one group, a vector of dimension p. |
| `Sigma` | a list with g entries, where each entry is the scale parameter of one group, a matrix with dimension p x p. |
| `shape` | a list with g entries, where each entry is the skewness parameter of one group, a vector of dimension p. |
| `pii` | a vector of mixture weights of length g (the number of clusters). Must sum to one. |
| `nu` | the degrees of freedom for the Student-t distribution case, a vector with dimension g. |
| `g` | number of mixture components. |
| `get.init` | Logical, `TRUE` or `FALSE`. If `get.init==TRUE`, the function computes the initial values; if `get.init==FALSE`, the user must enter the initial values manually. |
| `criteria` | Logical, `TRUE` or `FALSE`. Indicates whether likelihood-based selection criteria (AIC, BIC and EDC) are computed for comparison purposes. |
| `family` | distribution family to be used. Available distributions are the Skew-normal ("SN"), normal ("Normal") or Student-t ("t") distribution. |
| `error` | relative error for the stopping criterion of the algorithm. See the Details section. |
| `iter.max` | the maximum number of iterations of the EM algorithm. |
| `uni.Gama` | Logical, `TRUE` or `FALSE`. If `uni.Gama==TRUE`, the scale matrices are constrained to be equal across groups. |
| `kmeans.param` | a list with alternative parameters for the kmeans function when generating initial values. The default list is `list(iter.max = 10, n.start = 1, algorithm = "Hartigan-Wong")`. |
| `cal.im` | Logical, `TRUE` or `FALSE`. If `cal.im==TRUE`, the information matrix is calculated and the standard errors are reported. |
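
When `get.init = FALSE`, the initial values have to match the shapes listed above. The following base R sketch shows one way to check them before fitting. The helper `check_init` is hypothetical and not part of CensMFM; it only encodes the dimension rules stated in the argument descriptions.

```r
# Hypothetical helper (not part of CensMFM): checks that user-supplied
# initial values have the shapes described in the Arguments section.
check_init <- function(mu, Sigma, shape, pii, g, p) {
  stopifnot(
    # one entry per mixture component
    length(mu) == g, length(Sigma) == g,
    length(shape) == g, length(pii) == g,
    # location and skewness vectors of dimension p
    all(vapply(mu, length, integer(1)) == p),
    all(vapply(shape, length, integer(1)) == p),
    # scale matrices of dimension p x p
    all(vapply(Sigma,
               function(S) is.matrix(S) && all(dim(S) == c(p, p)),
               logical(1))),
    # mixture weights must sum to one (up to numerical tolerance)
    isTRUE(all.equal(sum(pii), 1))
  )
  invisible(TRUE)
}

# Initial values for g = 2 components in p = 2 dimensions
mu    <- list(c(-3, -4), c(2, 2))
Sigma <- list(matrix(c(3, 1, 1, 4.5), 2, 2), matrix(c(2, 1, 1, 3.5), 2, 2))
shape <- list(c(-2, 2), c(-3, 4))
pii   <- c(0.6, 0.4)

ok <- check_init(mu, Sigma, shape, pii, g = 2, p = 2)
```

If any check fails, `stopifnot` raises an error naming the violated condition, which tends to be easier to read than a failure inside the EM iterations.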

### Details

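The information matrix is calculated with respect to the entries of the square root matrix of Sigma, using the empirical information matrix. Disclaimer: the inference is asymptotic, so the reported standard errors should only be relied on for reasonably large sample sizes. The stopping criterion is `abs((loglik/loglik-1))<epsilon`, that is, the algorithm stops when the relative change in the log-likelihood between two consecutive iterations falls below epsilon, the value of the `error` argument.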
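
As a rough illustration of the stopping rule, the base R sketch below applies a relative-change test to a made-up sequence of log-likelihood values. The function `has_converged` and the numbers in `loglik_path` are assumptions for illustration only; the package's internal implementation may differ in detail.

```r
# Hypothetical helper: relative change of the log-likelihood between two
# consecutive iterations, compared against the tolerance `error`.
has_converged <- function(loglik_new, loglik_old, error = 1e-05) {
  abs(loglik_new / loglik_old - 1) < error
}

# Made-up log-likelihood values from successive EM iterations
loglik_path <- c(-900, -860, -851, -850.1, -850.001, -850.0005)

# Find the first iteration at which the rule would stop the algorithm
stop_at <- NA
for (k in 2:length(loglik_path)) {
  if (has_converged(loglik_path[k], loglik_path[k - 1])) {
    stop_at <- k
    break
  }
}
stop_at
```

Because the test is relative, the same `error` value behaves consistently whether the log-likelihood is of order ten or ten thousand.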

### Value

Returns a list containing, depending on the case, one or more of the following objects:

| Object | Description |
| --- | --- |
| `mu` | a list with g components, where each component is a vector with dimension p containing the estimated values of the location parameter. |
| `Sigma` | a list with g components, where each component is a matrix with dimension p x p containing the estimated values of the scale matrix. |
| `Gamma` | a list with g components, where each component is a matrix with dimension p x p containing the estimated values of the Gamma scale matrix. |
| `shape` | a list with g components, where each component is a vector with dimension p containing the estimated values of the skewness parameter. |
| `nu` | a vector with one element containing the value of the degrees of freedom parameter nu. |
| `pii` | a vector with g elements containing the estimated values of the weights pii. |
| `Zij` | an n x p matrix containing the estimated weights of the subjects for each group. |
| `yest` | an n x p matrix containing the estimated values of y. |
| `MI` | a list with the standard errors for all parameters. |
| `logLik` | the log-likelihood value for the estimated parameters. |
| `aic` | the AIC criterion value for the estimated parameters. |
| `bic` | the BIC criterion value for the estimated parameters. |
| `edc` | the EDC criterion value for the estimated parameters. |
| `iter` | number of iterations until the EM algorithm converges. |
| `group` | an n x p matrix containing the classification of the subjects to each group. |
| `time` | time in minutes until the EM algorithm converges. |
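
To make the relation between per-group weights and a classification concrete, here is a small base R sketch of the usual maximum-weight rule. It assumes a weight matrix with one row per subject and one column per mixture component, and the numbers in `Zij_toy` are made up. Whether the returned `group` object is computed in exactly this way is an assumption, not something this page states.

```r
# Made-up weights for n = 4 subjects (rows) and g = 2 groups (columns);
# each row sums to one.
Zij_toy <- matrix(c(0.90, 0.10,
                    0.20, 0.80,
                    0.60, 0.40,
                    0.05, 0.95),
                  ncol = 2, byrow = TRUE)

# Assign each subject to the group with the largest weight
group_toy <- apply(Zij_toy, 1, which.max)

# Number of subjects assigned to each group
table(group_toy)
```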

### Note

The `uni.Gama` parameter refers to the Γ matrix for the Skew-normal distribution, while for the normal and Student-t distributions it refers to the Σ matrix.

### Author(s)

Francisco H. C. de Alencar hildemardealencar@gmail.com, Christian E. Galarza cgalarza88@gmail.com, Victor Hugo Lachos hlachos@uconn.edu and Larissa A. Matos larissam@ime.unicamp.br

Maintainer: Francisco H. C. de Alencar hildemardealencar@gmail.com

### References

Cabral, C. R. B., Lachos, V. H., & Prates, M. O. (2012). Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis, 56(1), 126-142.

Prates, M. O., Lachos, V. H., & Cabral, C. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software, 54(12), 1-20.

C.E. Galarza, L.A. Matos, D.K. Dey & V.H. Lachos. (2019) On Moments of Folded and Truncated Multivariate Extended Skew-Normal Distributions. Technical report. ID 19-14. University of Connecticut.

F.H.C. de Alencar, C.E. Galarza, L.A. Matos & V.H. Lachos. (2019) Finite Mixture Modeling of Censored and Missing Data Using the Multivariate Skew-Normal Distribution. Technical report. ID 19-31. University of Connecticut.

### See Also

`rMSN`, `rMMSN` and `rMMSN.contour`

### Examples

```r
## true parameters for g = 2 components in p = 2 dimensions
mu <- Sigma <- shape <- list()
mu[[1]]     <- c(-3,-4)
mu[[2]]     <- c(2,2)
Sigma[[1]]  <- matrix(c(3,1,1,4.5), 2,2)
Sigma[[2]]  <- matrix(c(2,1,1,3.5), 2,2)
shape[[1]]  <- c(-2,2)
shape[[2]]  <- c(-3,4)
nu          <- c(0,0)
pii         <- c(0.6,0.4)
percen <- c(0.1,0.2)
n <- 200
g <- 2
seed <- 654678

set.seed(seed)
test = rMMSN(n = n, pii = pii,mu = mu,Sigma = Sigma,shape = shape,
percen = percen, each = TRUE, family = "SN")

Zij <- test$G
cc <- test$cc
y <- test$y

## left censoring ##
LI <-cc
LS <-cc
LI[cc==1]<- -Inf
LS[cc==1]<- y[cc==1]

#full analysis may take a few seconds more...

test_fit.cc0 = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "Normal", error = 0.0001,
iter.max = 200, uni.Gama = FALSE, cal.im = FALSE)

test_fit.cc = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "SN", error = 0.00001,
iter.max = 350, uni.Gama = FALSE, cal.im = TRUE)

## missing data ##
pctmiss <- 0.2 # 20% of missing data in the whole data
missing <- matrix(runif(n*g), nrow = n) < pctmiss
y[missing] <- NA

cc <- matrix(nrow = n,ncol = g)
cc[missing] <- 1
cc[!missing] <- 0

LI <- cc
LS <-cc
LI[cc==1]<- -Inf
LS[cc==1]<- +Inf

test_fit.mis = fit.FMMSNC(cc, LI, LS, y, mu=mu,
Sigma = Sigma, shape=shape, pii = pii, g = 2, get.init = FALSE,
criteria = TRUE, family = "SN", error = 0.00001,
iter.max = 350, uni.Gama = FALSE, cal.im = TRUE)

```
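
The left-censoring and missing-data examples both describe each censored entry by the interval from `LI` to `LS`. By analogy, right censoring could presumably be encoded with the observed limit as the lower bound and `+Inf` as the upper bound. The base R sketch below builds such inputs for toy data. The interval interpretation for right censoring is an assumption drawn from the two cases above, and `y_toy`, `upper_limit` and the other `_toy` objects are hypothetical.

```r
# Toy responses (n = 3, p = 2); values recorded at 5.0 are assumed to have
# hit a hypothetical upper detection limit and are treated as right-censored.
y_toy <- matrix(c(1.2, 5.0,
                  3.4, 2.1,
                  5.0, 0.7),
                ncol = 2, byrow = TRUE)
upper_limit <- 5.0

# Censoring indicators: 1 if censored, 0 otherwise
cc_toy <- (y_toy >= upper_limit) * 1

# Same placeholder pattern as in the examples: start from cc, then fill in
# the limits for the censored entries only.
LI_toy <- cc_toy
LS_toy <- cc_toy
LI_toy[cc_toy == 1] <- y_toy[cc_toy == 1]  # true value is at least the limit
LS_toy[cc_toy == 1] <- +Inf                # and unbounded above
```

These objects would then take the place of `cc`, `LI`, `LS` and `y` in a call to `fit.FMMSNC`, with the remaining arguments as in the examples.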

[Package CensMFM version 2.11 Index]