R: Reduced-Rank Mixture Models in Multivariate Regression

rrmix {rrMixture}

R Documentation

Reduced-Rank Mixture Models in Multivariate Regression

Description

‘rrmix’ estimates the parameters of reduced-rank mixture models in multivariate linear regression using the full-ranked, rank-penalized, and adaptive nuclear norm penalized estimators proposed by Kang et al. (2022+).
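
To make the model class concrete, the following base R sketch simulates data from a two-component mixture of multivariate linear regressions whose coefficient matrices have low rank. It does not use the package; the sizes, the identity error covariance, and all object names (B1, B2, z, E) are assumptions chosen for illustration only.

```r
set.seed(42)
n <- 200; p <- 4; q <- 3          # hypothetical sample size and dimensions

# Low-rank coefficient matrices built as products of thin factors:
# B1 has rank 1, B2 has rank 2 (both are p by q).
B1 <- matrix(rnorm(p * 1), p, 1) %*% matrix(rnorm(1 * q), 1, q)
B2 <- matrix(rnorm(p * 2), p, 2) %*% matrix(rnorm(2 * q), 2, q)

X <- matrix(rnorm(n * p), n, p)                           # n by p design matrix
z <- sample(1:2, n, replace = TRUE, prob = c(0.5, 0.5))   # latent membership
E <- matrix(rnorm(n * q), n, q)                           # errors, identity covariance (assumed)

# Each observation is generated with the coefficient matrix of its component.
Y <- E
Y[z == 1, ] <- X[z == 1, , drop = FALSE] %*% B1 + E[z == 1, , drop = FALSE]
Y[z == 2, ] <- X[z == 2, , drop = FALSE] %*% B2 + E[z == 2, , drop = FALSE]

# Numerical ranks of the two coefficient matrices.
c(qr(B1)$rank, qr(B2)$rank)
```

A fitting routine for this model class has to recover the membership, the component coefficient matrices, and their ranks from X and Y alone.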

Usage

rrmix(K = 2, X, Y, est = c("FR", "RP", "ANNP"),
      lambda = 0, gamma = 2, ind0 = NULL, para0 = NULL, seed = NULL,
      kmscale = FALSE, km.nstart = 20, n.init = 100, commonvar = FALSE,
      maxiter = 1000, maxiter.int = 100, thres = 1e-05, thres.int = 1e-05,
      visible = FALSE, para.true = NULL, ind.true = NULL)

Arguments

`K`	number of mixture components.
`X`	n by p design matrix where n is the number of observations and p is the number of predictors.
`Y`	n by q response matrix where n is the number of observations and q is the number of responses.
`est`	character, specifying the estimation method. ‘FR’, ‘RP’, and ‘ANNP’ refer to the full-ranked, rank-penalized, and adaptive nuclear norm penalized methods, respectively.
`lambda`	numerical value, specifying the tuning parameter. Only used when the estimation method is ‘RP’ or ‘ANNP’. If 0, all three estimation methods (‘FR’, ‘RP’, and ‘ANNP’) give the same estimation results.
`gamma`	numerical value, specifying additional tuning parameter, only used in the estimation method of ‘ANNP’. It must be nonnegative.
`ind0`	vector of length n, specifying the initial assignment of the mixture membership of n observations when there is prior information on the membership. If ‘NULL’, K-means clustering technique is used to assign the membership for n observations. Default is ‘NULL’.
`para0`	array of length K. It consists of K lists, each of which contains initial values of membership probability, coefficient matrix, and variance-covariance matrix.
`seed`	seed number for the reproducibility of initialization results in the EM algorithm. Default is ‘NULL’.
`kmscale`	logical value, indicating whether Y is scaled prior to K-means clustering for initialization. Default is ‘FALSE’.
`km.nstart`	number of random sets considered to perform K-means clustering for initialization. Default is 20.
`n.init`	number of initializations to try. Two methods for initial clustering are used: K-means and random clustering. Default is 100.
`commonvar`	logical value, indicating the homogeneity assumption of variance-covariance matrices across K mixture components. Default is ‘FALSE’.
`maxiter`	maximum number of iterations for the external EM algorithm, used in all estimation methods. Default is 1000.
`maxiter.int`	maximum number of iterations for internal iterative algorithm, only used in the estimation method of ‘ANNP’.
`thres`	threshold value for external EM algorithm, used in all estimation methods. It controls the termination of the EM algorithm.
`thres.int`	threshold value for internal iterative algorithm, only used in the estimation method of ‘ANNP’. It controls the termination of the internal algorithm.
`visible`	logical value, indicating whether the outputs from each iteration are printed. Useful when the whole algorithm takes long. Default is ‘FALSE’.
`para.true`	array of length K. It consists of K lists, each of which contains a coefficient matrix and its true rank. Only used when true models are known, e.g., in a simulation study.
`ind.true`	vector of length n, specifying the true mixture membership for n observations. Only used when true models are known, e.g., in a simulation study.
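
The K-means initialization described for `ind0`, `kmscale`, and `km.nstart` can be pictured with base R's kmeans(). This is a minimal sketch, not the package's internal code: the toy response matrix is made up, and it assumes that `ind0` takes integer labels 1 to K, which this page does not state explicitly.

```r
set.seed(1)
n <- 60; q <- 3; K <- 2               # hypothetical sizes

# Toy response matrix with two well-separated groups of 30 rows each.
Y <- rbind(matrix(rnorm(30 * q, mean = 0), ncol = q),
           matrix(rnorm(30 * q, mean = 4), ncol = q))

kmscale <- FALSE                      # mirrors the `kmscale` argument
Y.km <- if (kmscale) scale(Y) else Y  # optionally standardize Y before clustering

# nstart plays the role of `km.nstart`: number of random starting sets.
km <- kmeans(Y.km, centers = K, nstart = 20)
ind0 <- km$cluster                    # integer vector of length n with labels in 1..K

table(ind0)                           # cluster sizes
```

Under the stated assumption, a vector like this could be supplied as `ind0` together with the matching X and Y when prior membership information is available.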

Value

An object of class rrmix containing the fitted model, including:

`call`	original function call.
`seed`	seed number which is set for the initialization.
`n.est`	vector of length K, specifying the estimated number of observations in each mixture component.
`para`	array of length K. It consists of K lists, each of which contains final estimates of membership probability, coefficient matrix, and variance-covariance matrix.
`est.rank`	vector of length K, specifying the estimated ranks of coefficient matrices.
`npar`	number of parameters in the model, used to estimate the BIC.
`n.iter`	number of iterations (external EM algorithm).
`lambda`	tuning parameter for the estimation method of 'RP' or 'ANNP'.
`gamma`	tuning parameter for the estimation method of 'ANNP'.
`ind`	vector of length n, specifying the estimated mixture membership for n observations.
`ind.true`	vector of length n, specifying the true mixture membership for n observations. Only returned when the true models are known.
`loglik`	log-likelihood of the final model.
`penloglik`	penalized log-likelihood of the final model.
`penalty`	penalty in the penalized log-likelihood of the final model.
`bic`	BIC of the final model.
`avg.nn.iter`	average number of iterations for internal iterative algorithm, only returned for the estimation method of 'ANNP'.
`resmat`	matrix containing the information for each iteration of the EM algorithm, e.g., iteration number, log-likelihood, penalized log-likelihood, difference between penalized log-likelihood values from two consecutive iterations, and computing time.
`class.err`	soft and hard classification errors for mixture membership. Only returned when the true models are known.
`est.err`	estimation error from the comparison between the estimated and true coefficient matrices. Only returned when the true models are known.
`pred.err`	prediction error. Only returned when the true models are known.
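
As an illustration of what a hard classification error measures when `ind` is compared with `ind.true`, the sketch below computes a misclassification rate that is minimized over relabelings, since component labels are arbitrary. The exact definition used for `class.err` is not given on this page, so the function name hard_class_err, the label vectors, and the formula itself are assumptions; the sketch covers K = 2 only.

```r
# Hypothetical labels for n = 8 observations and K = 2 components.
ind.true <- c(1, 1, 1, 1, 2, 2, 2, 2)
ind.est  <- c(2, 2, 2, 1, 1, 1, 1, 1)   # labels swapped, one observation misplaced

# Misclassification rate, minimized over the two possible relabelings.
hard_class_err <- function(truth, est) {
  perms <- list(1:2, 2:1)               # identity and swap (K = 2 only)
  min(sapply(perms, function(p) mean(p[est] != truth)))
}

err <- hard_class_err(ind.true, ind.est)
err   # 0.125: one of eight observations is misclassified after relabeling
```

Without the minimization over relabelings, the same estimate would score 7/8 errors even though it recovers the grouping almost perfectly.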

Author(s)

Suyeon Kang, University of California, Riverside, skang062@ucr.edu; Weixin Yao, University of California, Riverside, weixin.yao@ucr.edu; Kun Chen, University of Connecticut, kun.chen@uconn.edu.

References

Kang, S., Chen, K., and Yao, W. (2022+). "Reduced rank estimation in mixtures of multivariate linear regression".

Examples

library(rrMixture)

#-----------------------------------------------------------#
# Real Data Example: Tuna Data
#-----------------------------------------------------------#
require(bayesm)
data(tuna)
tunaY <- log(tuna[, c("MOVE1", "MOVE2", "MOVE3", "MOVE4",
                  "MOVE5", "MOVE6", "MOVE7")])
tunaX <- tuna[, c("NSALE1", "NSALE2", "NSALE3", "NSALE4",
              "NSALE5", "NSALE6", "NSALE7",
              "LPRICE1", "LPRICE2", "LPRICE3", "LPRICE4",
              "LPRICE5", "LPRICE6", "LPRICE7")]
tunaX <- cbind(intercept = 1, tunaX)

# Rank-penalized estimation

tuna.rp <- rrmix(K = 2, X = tunaX, Y = tunaY, lambda = 3, est = "RP",
           seed = 100, n.init = 100)
summary(tuna.rp)
plot(tuna.rp)

# Adaptive nuclear norm penalized estimation

tuna.annp <- rrmix(K = 2, X = tunaX, Y = tunaY, lambda = 3, gamma = 2, est = "ANNP",
             seed = 100, n.init = 100)
summary(tuna.annp)
plot(tuna.annp)

#-----------------------------------------------------------#
# Simulation: Two Components Case
#-----------------------------------------------------------#
# Simulation Data
K2mod <- rrmix.sim.norm(K = 2, n = 100, p = 5, q = 5, rho = .5,
         b = 1, shift = 1, r.star = c(1, 3), sigma = c(1, 1),
         pr = c(.5, .5), seed = 1215)
         
# Rank-penalized estimation

K2.rp <- rrmix(K = 2, X = K2mod$X, Y = K2mod$Y, lambda = 1,
         seed = 17, est = "RP", ind.true = K2mod$ind.true,
         para.true = K2mod$para.true, n.init = 100)
summary(K2.rp)
plot(K2.rp)
         
# Adaptive nuclear norm penalized estimation

K2.annp <- rrmix(K = 2, X = K2mod$X, Y = K2mod$Y, lambda = 1,
           seed = 17, est = "ANNP", ind.true = K2mod$ind.true,
           para.true = K2mod$para.true, n.init = 100)
summary(K2.annp)
plot(K2.annp)

[Package rrMixture version 0.1-2 Index]