mult.em_1level {mult.latent.reg}    R Documentation

EM algorithm for the multivariate one-level model with covariates

Description

This function obtains maximum likelihood estimates (MLEs) via the EM algorithm for one-level multivariate data. The estimates enable clustering, ranking, and simultaneous dimension reduction of the multivariate data set. When covariates are included, the function also fits multivariate response models, which extends its use to regression analysis. Details of the model used in this function are given in Zhang and Einbeck (2024).
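To make the model concrete, the following minimal sketch simulates data from the one-level model x_i = \alpha + \beta z_k + \Gamma v_i + \varepsilon_i stated under Value, where each observation i belongs to one of K mixture components k. All parameter values are hypothetical and chosen only for illustration, the errors are assumed normal with a common diagonal variance (the var_fun = 1 case), and the code is not a data generator provided by mult.latent.reg.

```r
# Hypothetical parameter values for illustration only; this is not a
# data generator provided by mult.latent.reg.
set.seed(1)
n <- 200; m <- 2; K <- 2
p     <- c(0.4, 0.6)                    # mixing proportions pi_k
alpha <- c(3, 70)                       # intercept vector (length m)
beta  <- c(1, 10)                       # loading vector (length m)
z     <- c(-1, 1)                       # latent mass points z_k (length K)
Gamma <- matrix(c(0.5, -2), nrow = m)   # m x 1 coefficients for one covariate
sigma <- c(0.2, 25)                     # common diagonal variances (var_fun = 1)

v <- rbinom(n, size = 1, prob = 0.5)                  # binary covariate v_i
k <- sample(seq_len(K), n, replace = TRUE, prob = p)  # latent component of each i

# Mean of observation i: alpha + beta * z_k + Gamma v_i, stored as an m x n matrix
mu <- alpha + outer(beta, z[k]) + Gamma %*% t(v)
# Independent normal errors; column j has variance sigma[j]
eps <- matrix(rnorm(n * m, sd = rep(sqrt(sigma), each = n)), nrow = n)
x <- t(mu) + eps
colnames(x) <- c("y1", "y2")
```

Without the covariate term, every component mean \alpha + \beta z_k lies on the one-dimensional line through \alpha in direction \beta; this is the dimension reduction visualised in the faithful example. Assuming the function accepts a data frame and a covariate vector in the same way as the covariate example, the simulated data could be fitted with mult.em_1level(as.data.frame(x), v = v, K = 2).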

Arguments

data

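A data set object (for example a data frame, as in the examples) containing n observations of the response variables; we denote its dimension (the number of response variables) by m.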

v

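Covariate(s), entering the model through the term \Gamma v_i. This argument can be omitted when the model has no covariates, as in the first example.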

K

Number of mixture components; the default is K = 2. Note that when K = 1, z and beta are returned as 0.

steps

Number of EM iterations; the default is steps = 20.

start

A list containing starting values for the parameters of the model (p, alpha, z, beta, sigma, gamma). Such starting values can be obtained with start_em; see start_em for details.

option

One of four options for choosing the starting values of the model parameters; the default is option = 1. See start_em for details.

var_fun

The variance specification, one of four types: var_fun = 1, the same diagonal variance matrix for all K mixture components; var_fun = 2, a different diagonal variance matrix for each component; var_fun = 3, the same full (unrestricted) variance matrix for all components; var_fun = 4, a different full (unrestricted) variance matrix for each component. The default is var_fun = 2.
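The four specifications differ mainly in how many variance parameters they estimate. The sketch below counts the free variance parameters implied by each setting for an m-dimensional response and K components, using the standard count of m(m + 1)/2 free entries in a symmetric m x m matrix. The helper n_var_params is hypothetical and not part of the package; it covers only the variance part and is not necessarily how number_parameters is computed internally.

```r
# Hypothetical helper (not part of mult.latent.reg): number of free
# variance parameters implied by each var_fun setting.
n_var_params <- function(m, K, var_fun) {
  full <- m * (m + 1) / 2          # free entries of one symmetric m x m matrix
  switch(as.character(var_fun),
         "1" = m,                  # one shared diagonal matrix
         "2" = K * m,              # K separate diagonal matrices
         "3" = full,               # one shared full matrix
         "4" = K * full,           # K separate full matrices
         stop("var_fun must be 1, 2, 3 or 4"))
}

# Bivariate response (m = 2, as for faithful) with K = 2 components
counts <- sapply(1:4, function(vf) n_var_params(m = 2, K = 2, var_fun = vf))
counts   # 2 4 3 6
```

The gap between the diagonal and full specifications grows quadratically in m, so var_fun = 3 or 4 can become expensive for high-dimensional responses.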

Value

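A list containing the estimates of the parameters in the model x_i = \alpha + \beta z_k + \Gamma v_i + \varepsilon_i (for an observation x_i from mixture component k), obtained through the EM algorithm at convergence, together with the following components.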

p

The estimates for the parameter \pi_k, which is a vector of length K.

alpha

The estimates for the parameter \alpha, which is a vector of length m.

z

The estimates for the parameter z_k, which is a vector of length K.

beta

The estimates for the parameter \beta, which is a vector of length m.

gamma

The estimates for the parameter \Gamma, the coefficient matrix of the covariates v_i.

sigma

The estimates for the parameter \Sigma_k. When var_fun = 1, \Sigma_k = \Sigma is a common diagonal matrix, and a single vector of its diagonal elements is returned. When var_fun = 2, each \Sigma_k is a diagonal matrix, and K vectors of diagonal elements are returned. When var_fun = 3, \Sigma_k = \Sigma is a common full variance-covariance matrix, and the matrix \Sigma is returned. When var_fun = 4, each \Sigma_k is a full variance-covariance matrix, and the K matrices \Sigma_k are returned.

W

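The matrix of posterior probabilities, with one row per observation and one column per mixture component; entry (i, k) is the posterior probability that observation i belongs to component k.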

loglikelihood

The approximated log-likelihood of the fitted model.

disparity

The disparity (-2 log L) of the fitted model.

number_parameters

The number of parameters estimated in the EM algorithm.

AIC

The AIC value (-2 log L + 2 * number_parameters).

BIC

The BIC value (-2 log L + number_parameters * log(n)), where n is the number of observations.

starting_values

A list of starting values for parameters used in the EM algorithm.
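The posterior probability matrix W corresponds to the E-step of the EM algorithm. For a standard normal mixture the entries are w_ik = \pi_k f(x_i; \mu_k, \Sigma_k) / \sum_l \pi_l f(x_i; \mu_l, \Sigma_l), with \mu_k = \alpha + \beta z_k when there are no covariates. The following minimal sketch computes them under the assumptions of no covariates and component-specific diagonal variances (the var_fun = 2 case); posterior_W is a hypothetical helper, not a function of the package, and the package's internal computation may differ.

```r
# Hypothetical helper (not exported by mult.latent.reg): E-step posterior
# probabilities for the model without covariates, assuming component-specific
# diagonal variances (var_fun = 2 style). sigma is a K x m matrix whose k-th
# row holds the diagonal of Sigma_k.
posterior_W <- function(x, p, alpha, beta, z, sigma) {
  x <- as.matrix(x)
  n <- nrow(x); m <- ncol(x); K <- length(p)
  logdens <- matrix(NA_real_, n, K)
  for (k in seq_len(K)) {
    mu_k <- alpha + beta * z[k]                      # mean of component k
    ld <- dnorm(x, mean = rep(mu_k, each = n),
                sd = rep(sqrt(sigma[k, ]), each = n), log = TRUE)
    # log(pi_k) plus the sum of univariate log densities (diagonal Sigma_k)
    logdens[, k] <- log(p[k]) + rowSums(matrix(ld, n, m))
  }
  # Normalise each row on the log scale (log-sum-exp) for numerical stability
  logdens <- logdens - apply(logdens, 1, max)
  W <- exp(logdens)
  W / rowSums(W)
}

# Toy check: two well-separated points, one near each component mean
x <- rbind(c(0, 0), c(10, 10))
W <- posterior_W(x, p = c(0.5, 0.5), alpha = c(5, 5), beta = c(5, 5),
                 z = c(-1, 1), sigma = matrix(1, nrow = 2, ncol = 2))
apply(W, 1, which.max)   # MAP rule, as in the faithful example: 1 2
```

The MAP assignment applied to res$W in the faithful example is exactly the row-wise which.max used here.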
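The fit statistics listed above are simple functions of the log-likelihood, the number of estimated parameters and the sample size. The sketch below reproduces those formulas; info_criteria is a hypothetical helper, and the input numbers are illustrative, not output from mult.em_1level.

```r
# Hypothetical helper reproducing the formulas given under Value:
# disparity = -2 log L, AIC = -2 log L + 2 * k, BIC = -2 log L + k * log(n).
info_criteria <- function(loglik, n_par, n) {
  disparity <- -2 * loglik
  c(disparity = disparity,
    AIC = disparity + 2 * n_par,
    BIC = disparity + n_par * log(n))
}

# Illustrative numbers only (not output from mult.em_1level)
ic <- info_criteria(loglik = -100, n_par = 5, n = 50)
ic   # disparity 200, AIC 210, BIC about 219.56
```

Such criteria are typically used to compare fits with different K or var_fun, with smaller values preferred; for example, one could compare res$BIC across calls of mult.em_1level(faithful, K = k) for several values of k.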

References

Zhang, Y. and Einbeck, J. (2024). A Versatile Model for Clustered and Highly Correlated Multivariate Data. J Stat Theory Pract 18(5). doi:10.1007/s42519-023-00357-0

See Also

mult.reg_1level.

Examples

## Example for data without covariates.
data(faithful)
res <- mult.em_1level(faithful, K = 2, steps = 10, var_fun = 1)


## Graph showing the estimated one-dimensional space, with the cluster
## centres in red and alpha in green.
x <- res$alpha[1] + res$beta[1] * res$z
y <- res$alpha[2] + res$beta[2] * res$z
plot(faithful, col = 8)
points(x, y, pch = 17, col = "red")                             # cluster centres
points(res$alpha[1], res$alpha[2], pch = 4, col = "darkgreen")  # alpha
slope <- (y[2] - y[1]) / (x[2] - x[1])
intercept <- y[1] - slope * x[1]
abline(intercept, slope, col = "red")  # line through the cluster centres

## Graph showing the original data points assigned to the clusters
## according to the maximum a posteriori (MAP) rule.
index <- apply(res$W, 1, which.max)
colors <- c("#FDAE61", "#66BD63")
plot(faithful, pch = 1, col = colors[index])


## Example for data with covariates.
data(fetal_covid_data)
set.seed(2)
covid_res <- mult.em_1level(fetal_covid_data[, 1:5], v = fetal_covid_data$status_bi,
                            K = 3, steps = 20, var_fun = 2)
coeffs <- covid_res$gamma
coeffs
## Compare with the coefficients from fitting individual linear models.
summary(lm(UpperFaceMovements ~ status_bi, data = fetal_covid_data))$coefficients[2, 1]
summary(lm(Headmovements ~ status_bi, data = fetal_covid_data))$coefficients[2, 1]


[Package mult.latent.reg version 0.1.7 Index]