rrmix {rrMixture}	R Documentation

Reduced-Rank Mixture Models in Multivariate Regression

Description

‘rrmix’ is used to estimate the parameters of reduced-rank mixture models in multivariate linear regression using the full-rank, rank-penalized, and adaptive nuclear norm penalized estimators proposed by Kang et al. (2022+).
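The two penalized estimators differ in how they shrink each component's coefficient matrix. As a rough sketch, the code below computes two generic penalty forms from the reduced-rank regression literature for a single matrix C. The first is lambda times rank(C) for rank penalization. The second is the adaptive nuclear norm, lambda times the sum of w_j d_j(C), where d_j are singular values and the weights are w_j = d_j(C_ref)^(-gamma). The function names and the reference matrix C_ref are hypothetical, and the exact penalty, scaling, and weighting used by rrmix in Kang et al. (2022+) may differ.

```r
# Generic penalty forms for one coefficient matrix C (assumed, illustrative):
#   rank penalty (RP):                    lambda * rank(C)
#   adaptive nuclear norm penalty (ANNP): lambda * sum_j w_j * d_j(C),
#                                         w_j = d_j(C_ref)^(-gamma)
# C_ref is a hypothetical reference estimate that supplies the weights.
# rrmix's exact penalty and scaling may differ.

# Numerical rank: count singular values above a relative tolerance
numeric_rank <- function(C, rel_tol = 1e-10) {
  d <- svd(C)$d
  sum(d > rel_tol * max(d))
}

rank_penalty <- function(C, lambda) {
  lambda * numeric_rank(C)
}

ann_penalty <- function(C, C_ref, lambda, gamma) {
  stopifnot(gamma >= 0, all(dim(C) == dim(C_ref)))
  d_c   <- svd(C)$d
  d_ref <- svd(C_ref)$d
  keep  <- d_c > 1e-10 * max(d_c)   # skip numerically zero singular values
  w     <- d_ref[keep]^(-gamma)     # smaller reference singular values get larger weights
  lambda * sum(w * d_c[keep])
}

C <- diag(c(2, 1))
rank_penalty(C, lambda = 3)                        # 3 * rank 2 = 6
ann_penalty(C, C_ref = C, lambda = 1, gamma = 1)   # 2/2 + 1/1 = 2
ann_penalty(C, C_ref = C, lambda = 0, gamma = 1)   # lambda = 0: no penalty
```

With lambda = 0 both penalties vanish, which is consistent with all three methods coinciding in that case. A larger gamma makes the weights on small singular values grow faster, so those directions are shrunk more aggressively.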

Usage

rrmix(K = 2, X, Y, est = c("FR", "RP", "ANNP"),
      lambda = 0, gamma = 2, ind0 = NULL, para0 = NULL, seed = NULL,
      kmscale = FALSE, km.nstart = 20, n.init = 100, commonvar = FALSE,
      maxiter = 1000, maxiter.int = 100, thres = 1e-05, thres.int = 1e-05,
      visible = FALSE, para.true = NULL, ind.true = NULL)

Arguments

K

number of mixture components.

X

n by p design matrix where n is the number of observations and p is the number of predictors.

Y

n by q response matrix where n is the number of observations and q is the number of responses.

est

character string specifying the estimation method. ‘FR’, ‘RP’, and ‘ANNP’ refer to the full-rank, rank-penalized, and adaptive nuclear norm penalized methods, respectively.

lambda

numeric value specifying the tuning parameter. Only used by the ‘RP’ and ‘ANNP’ methods. If 0, all three methods (‘FR’, ‘RP’, and ‘ANNP’) give the same estimates. Default is 0.

gamma

numeric value specifying an additional tuning parameter, only used by the ‘ANNP’ method. It must be nonnegative. Default is 2.

ind0

vector of length n specifying the initial mixture membership of the n observations, used when there is prior information on the membership. If ‘NULL’, K-means clustering is used to assign the initial membership of the n observations. Default is ‘NULL’.

para0

array of length K consisting of K lists, each of which contains initial values of the membership probability, coefficient matrix, and variance-covariance matrix. Default is ‘NULL’.

seed

seed for reproducible initialization of the EM algorithm. Default is ‘NULL’.

kmscale

logical value, indicating whether Y is scaled prior to K-means clustering for initialization. Default is ‘FALSE’.

km.nstart

number of random sets of initial centers used in K-means clustering for initialization. Default is 20.

n.init

number of initializations to try. Two methods are used for the initial clustering: K-means and random clustering. Default is 100.

commonvar

logical value, indicating the homogeneity assumption of variance-covariance matrices across K mixture components. Default is ‘FALSE’.

maxiter

maximum number of iterations of the external EM algorithm, used in all estimation methods. Default is 1000.

maxiter.int

maximum number of iterations of the internal iterative algorithm, only used by the ‘ANNP’ method. Default is 100.

thres

convergence threshold for the external EM algorithm, used in all estimation methods. It controls when the EM algorithm terminates. Default is 1e-05.

thres.int

convergence threshold for the internal iterative algorithm, only used by the ‘ANNP’ method. It controls when the internal algorithm terminates. Default is 1e-05.

visible

logical value indicating whether the output of each iteration is printed, which is useful for monitoring long runs. Default is ‘FALSE’.

para.true

array of length K. It consists of K lists, each of which contains a coefficient matrix and its true rank. Only used when true models are known, e.g., in a simulation study.

ind.true

vector of length n, specifying the true mixture membership for n observations. Only used when true models are known, e.g., in a simulation study.
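The arguments ind0, kmscale, km.nstart, and n.init describe two kinds of initial clustering: K-means and random clustering. The base-R sketch below builds an initial membership vector in each way. It only illustrates the idea and does not reproduce rrmix's internal initialization; the helper names init_kmeans and init_random are hypothetical. It assumes membership labels are the integers 1 to K, so a vector produced this way has the length-n form described for ind0.

```r
# Hypothetical helpers illustrating the two kinds of initial clustering
# mentioned for ind0, kmscale, km.nstart and n.init. Not rrmix internals.

# K-means on the responses, optionally scaling Y first (cf. kmscale)
init_kmeans <- function(Y, K, kmscale = FALSE, km.nstart = 20) {
  Z <- if (kmscale) scale(Y) else as.matrix(Y)
  stats::kmeans(Z, centers = K, nstart = km.nstart)$cluster
}

# Random clustering that keeps every component non-empty (assumes n >= K)
init_random <- function(n, K) {
  sample(c(seq_len(K), sample.int(K, n - K, replace = TRUE)))
}

# Toy responses with two well-separated groups of 25 rows each
set.seed(1)
Y <- rbind(matrix(rnorm(50, mean = 0), 25, 2),
           matrix(rnorm(50, mean = 5), 25, 2))
ind_km   <- init_kmeans(Y, K = 2)
ind_rand <- init_random(nrow(Y), K = 2)
table(ind_km)
table(ind_rand)
```

Setting kmscale = TRUE in this sketch standardizes each response column before clustering, which keeps responses with large variances from dominating the distance.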

Value

An object of class rrmix containing the fitted model, including:

call

original function call.

seed

seed used for the initialization.

n.est

vector of length K giving the estimated number of observations in each mixture component.

para

array of length K consisting of K lists, each of which contains the final estimates of the membership probability, coefficient matrix, and variance-covariance matrix.

est.rank

vector of length K, specifying the estimated ranks of coefficient matrices.

npar

number of parameters in the model, used to compute the BIC.

n.iter

number of iterations of the external EM algorithm.

lambda

tuning parameter for the ‘RP’ or ‘ANNP’ method.

gamma

tuning parameter for the ‘ANNP’ method.

ind

vector of length n, specifying the estimated mixture membership for n observations.

ind.true

vector of length n, specifying the true mixture membership for n observations. Only returned when the true models are known.

loglik

log-likelihood of the final model.

penloglik

penalized log-likelihood of the final model.

penalty

penalty in the penalized log-likelihood of the final model.

bic

BIC of the final model.

avg.nn.iter

average number of iterations of the internal iterative algorithm; only returned for the ‘ANNP’ method.

resmat

matrix recording information for each iteration of the EM algorithm, e.g., the iteration number, log-likelihood, penalized log-likelihood, difference between the penalized log-likelihood values of two consecutive iterations, and computing time.

class.err

soft and hard classification errors for the mixture membership. Only returned when the true models are known.

est.err

estimation error from the comparison between the estimated and true coefficient matrices. Only returned when the true models are known.

pred.err

prediction error. Only returned when the true models are known.
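The class.err component reports soft and hard classification errors, but this page does not define them. Mixture labels are only identified up to permutation, so one common definition of a hard classification error is the misclassification rate minimized over all relabelings of the K components. The sketch below implements that definition; it is illustrative, not necessarily the definition rrmix uses, and it does not cover the soft error. The helper names are hypothetical.

```r
# Hypothetical helpers: hard classification error up to label switching.

# All permutations of a vector, returned as a list (K! of them)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Share of observations whose relabeled estimate disagrees with the truth,
# minimized over every relabeling perm (estimated label j -> perm[j])
hard_class_error <- function(ind, ind.true, K = max(ind, ind.true)) {
  errs <- vapply(all_perms(seq_len(K)),
                 function(perm) mean(perm[ind] != ind.true),
                 numeric(1))
  min(errs)
}

hard_class_error(c(1, 1, 2, 2, 2), c(2, 2, 1, 1, 2))  # 0.2 after swapping labels
hard_class_error(c(3, 1, 2), c(1, 2, 3))              # 0: a pure relabeling
```

Enumerating all K! relabelings is fine for the small K typical of mixture regression. For larger K, an assignment (Hungarian) algorithm on the confusion table would scale better.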

Author(s)

Suyeon Kang, University of California, Riverside, skang062@ucr.edu; Weixin Yao, University of California, Riverside, weixin.yao@ucr.edu; Kun Chen, University of Connecticut, kun.chen@uconn.edu.

References

Kang, S., Chen, K., and Yao, W. (2022+). "Reduced rank estimation in mixtures of multivariate linear regression".

See Also

rrmix.sim.norm, initialize.para

Examples

library(rrMixture)

#-----------------------------------------------------------#
# Real Data Example: Tuna Data
#-----------------------------------------------------------#
require(bayesm)
data(tuna)
tunaY <- log(tuna[, c("MOVE1", "MOVE2", "MOVE3", "MOVE4",
                      "MOVE5", "MOVE6", "MOVE7")])
tunaX <- tuna[, c("NSALE1", "NSALE2", "NSALE3", "NSALE4",
                  "NSALE5", "NSALE6", "NSALE7",
                  "LPRICE1", "LPRICE2", "LPRICE3", "LPRICE4",
                  "LPRICE5", "LPRICE6", "LPRICE7")]
tunaX <- cbind(intercept = 1, tunaX)

# Rank-penalized estimation

tuna.rp <- rrmix(K = 2, X = tunaX, Y = tunaY, lambda = 3, est = "RP",
           seed = 100, n.init = 100)
summary(tuna.rp)
plot(tuna.rp) 

# Adaptive nuclear norm penalized estimation

tuna.annp <- rrmix(K = 2, X = tunaX, Y = tunaY, lambda = 3, gamma = 2, est = "ANNP",
             seed = 100, n.init = 100)
summary(tuna.annp)
plot(tuna.annp)       

#-----------------------------------------------------------#
# Simulation: Two Components Case
#-----------------------------------------------------------#
# Simulation Data
K2mod <- rrmix.sim.norm(K = 2, n = 100, p = 5, q = 5, rho = .5,
         b = 1, shift = 1, r.star = c(1, 3), sigma = c(1, 1),
         pr = c(.5, .5), seed = 1215)
         
# Rank-penalized estimation

K2.rp <- rrmix(K = 2, X = K2mod$X, Y = K2mod$Y, lambda = 1,
         seed = 17, est = "RP", ind.true = K2mod$ind.true,
         para.true = K2mod$para.true, n.init = 100)
summary(K2.rp)
plot(K2.rp)
         
# Adaptive nuclear norm penalized estimation

K2.annp <- rrmix(K = 2, X = K2mod$X, Y = K2mod$Y, lambda = 1,
           seed = 17, est = "ANNP", ind.true = K2mod$ind.true,
           para.true = K2mod$para.true, n.init = 100)
summary(K2.annp)
plot(K2.annp)
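The examples above each fit a single lambda. One common way to choose lambda is to refit over a grid and keep the fit with the smallest BIC. The sketch below performs that selection on toy stand-ins so it runs without the package. The helpers bic_from_parts and pick_by_bic are hypothetical. The formula BIC = -2 * loglik + log(n) * npar is the conventional one and is assumed here, because this page does not state which convention rrmix uses for its bic component.

```r
# Assumed convention: BIC = -2 * loglik + log(n) * npar.
# rrmix's own convention is not stated on this page.
bic_from_parts <- function(loglik, npar, n) {
  -2 * loglik + log(n) * npar
}

# Return the fit with the smallest bic from a list of fitted objects.
# Works with any list whose elements have a numeric `bic` component.
pick_by_bic <- function(fits) {
  bics <- vapply(fits, function(f) f$bic, numeric(1))
  fits[[which.min(bics)]]
}

# Toy stand-ins for fitted objects (made-up numbers, for illustration only)
toy_fits <- list(
  list(lambda = 0.5, bic = bic_from_parts(loglik = -240, npar = 60, n = 100)),
  list(lambda = 1,   bic = bic_from_parts(loglik = -250, npar = 40, n = 100)),
  list(lambda = 2,   bic = bic_from_parts(loglik = -290, npar = 25, n = 100))
)
best <- pick_by_bic(toy_fits)
best$lambda
```

With real fits, toy_fits could be replaced by something like lapply(c(0.5, 1, 2), function(l) rrmix(K = 2, X = K2mod$X, Y = K2mod$Y, lambda = l, est = "RP", seed = 17)). That call uses only the arguments and the bic component documented on this page.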

[Package rrMixture version 0.1-2 Index]