mixregRM2 {MixSemiRob}    R Documentation

Robust Mixture Regression with Thresholding-Embedded EM Algorithm for Penalized Estimation

Description

A robust mixture regression model that simultaneously conducts outlier detection and robust parameter estimation. It uses a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization (Yu et al., 2017):

f(y_i|\boldsymbol{x}_i,\boldsymbol{\theta},\boldsymbol{\gamma}_i) = \sum_{j=1}^C\pi_j\phi(y_i;\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j+\gamma_{ij}\sigma_j,\sigma_j^2),

i=1,\cdots,n, where C is the number of components in the model, \phi(\cdot;\mu,\sigma^2) is the normal density with mean \mu and variance \sigma^2, \boldsymbol{\theta}=(\pi_1,\boldsymbol{\beta}_1,\sigma_1,\ldots,\pi_{C},\boldsymbol{\beta}_C,\sigma_C)^{\top} is the parameter vector to estimate, and \boldsymbol{\gamma}_i=(\gamma_{i1},\ldots,\gamma_{iC})^{\top} is the vector of mean-shift parameters for the ith observation.
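The density above can be evaluated directly in base R. The following is a minimal sketch for a single observation, not the package's internal code; the function name meanshift_density and all numeric values are hypothetical and chosen only for illustration.

```r
# Mean-shift mixture density for one observation (illustrative sketch).
# y_i      : scalar response
# x_i      : numeric vector of p covariates (an intercept is prepended here)
# mix_prop : length-C vector of mixing proportions (pi in the formula)
# beta     : C by (p + 1) matrix of regression coefficients
# sigma    : length-C vector of component standard deviations
# gamma_i  : length-C vector of mean-shift parameters for this observation
meanshift_density <- function(y_i, x_i, mix_prop, beta, sigma, gamma_i) {
  # Component means, shifted by gamma_ij * sigma_j
  mu <- as.vector(beta %*% c(1, x_i)) + gamma_i * sigma
  sum(mix_prop * dnorm(y_i, mean = mu, sd = sigma))
}

# Hypothetical parameter values for C = 2 components and p = 1 covariate
mix_prop <- c(0.5, 0.5)
beta     <- rbind(c(0, 1), c(2, -1))
sigma    <- c(1, 0.5)

# With gamma_i = 0 this is an ordinary mixture regression density
d0 <- meanshift_density(1, 0.5, mix_prop, beta, sigma, gamma_i = c(0, 0))

# A nonzero gamma_i1 moves the first component mean by gamma_i1 * sigma_1
d1 <- meanshift_density(1, 0.5, mix_prop, beta, sigma, gamma_i = c(0.5, 0))
```

Setting every entry of gamma_i to zero recovers the usual normal mixture regression density, which is why a sparse estimate of the shifts leaves ordinary observations unaffected.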

Usage

mixregRM2(x, y, C = 2, ini = NULL, nstart = 20, tol = 1e-02, maxiter = 50,
          method = c("HARD", "SOFT"), sigma.const = 0.001, lambda = 0.001)

Arguments

x

an n by p data matrix where n is the number of observations and p is the number of explanatory variables. The intercept term will automatically be added to the data.

y

an n-dimensional vector of response values.

C

number of mixture components. Default is 2.

ini

initial values for the parameters. Default is NULL, which obtains the initial values using the mixreg function. It can be a list with the form of list(pi, beta, sigma, gamma), where pi is a vector of C mixing proportions, beta is a C by (p + 1) matrix for regression coefficients of C components, sigma is a vector of C standard deviations, and gamma is a vector of C mean shift values.

nstart

number of initializations to try. Default is 20.

tol

stopping criterion (convergence threshold) for the EM algorithm. Default is 1e-02.

maxiter

maximum number of iterations for the EM algorithm. Default is 50.

method

character, determining which threshold method to use: HARD or SOFT. Default is HARD. See details.

sigma.const

constraint on the ratio of minimum and maximum values of sigma. Default is 0.001.

lambda

tuning parameter in the penalty term. Default is 0.001. It can be selected based on BIC. See Yu et al. (2017) for more details.
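As an illustration of the ini argument described above, the following sketch builds a list with the documented shapes for C = 2 components and p = 1 explanatory variable. All numeric values are hypothetical placeholders, and the check on sigma assumes that sigma.const acts as a lower bound on min(sigma) / max(sigma), which is an interpretation of the argument description rather than something stated on this page.

```r
C <- 2  # number of components
p <- 1  # number of explanatory variables

# Hypothetical initial values with the documented shapes
ini <- list(
  pi    = c(0.5, 0.5),                       # C mixing proportions
  beta  = matrix(c(0, 1, 2, -1), nrow = C,
                 ncol = p + 1, byrow = TRUE), # C by (p + 1): intercept first
  sigma = c(1, 0.5),                         # C standard deviations
  gamma = rep(0, C)                          # C mean shift values
)

# Assumed reading of sigma.const: ratio of smallest to largest sigma
sigma_const <- 0.001
ratio_ok <- min(ini$sigma) / max(ini$sigma) >= sigma_const
```

A list of this form would then be supplied as mixregRM2(x, y, C = 2, ini = ini).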

Details

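The parameters are estimated by maximizing the corresponding penalized log-likelihood function using an EM algorithm. The thresholding rule used to estimate \gamma_{ij} depends on the penalty:

Soft threshold: \hat{\gamma}_{ij} = \mathrm{sgn}(\epsilon_{ij})(|\epsilon_{ij}|-\lambda_{ij}^*)_{+}, corresponding to the \ell_1 penalty.

Hard threshold: \hat{\gamma}_{ij} = \epsilon_{ij}I(|\epsilon_{ij}|>\lambda_{ij}^*), corresponding to the \ell_0 penalty.

Here, \epsilon_{ij} = (y_i-\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j)/\sigma_j, (\cdot)_{+}=\max(\cdot,0), and I(\cdot) is the indicator function. Also, \lambda_{ij}^* is taken as \lambda/p_{ij}^{(k+1)} for the soft threshold and \lambda/\sqrt{p_{ij}^{(k+1)}} for the hard threshold, where p_{ij}^{(k+1)} is the posterior probability, computed at iteration k+1, that observation i belongs to component j.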
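The two thresholding rules can be sketched in base R as follows. This assumes the standard soft and hard thresholding operators and is not taken from the package source; the function names soft_threshold and hard_threshold and the input values are hypothetical.

```r
# Soft thresholding: shrink toward zero by lam, set small values to zero
soft_threshold <- function(eps, lam) sign(eps) * pmax(abs(eps) - lam, 0)

# Hard thresholding: keep values whose magnitude exceeds lam, zero the rest
hard_threshold <- function(eps, lam) eps * (abs(eps) > lam)

# Hypothetical standardized residuals epsilon_ij
eps <- c(-3, -0.5, 0.2, 2.5)

soft_out <- soft_threshold(eps, 1)  # -2.0  0.0  0.0  1.5
hard_out <- hard_threshold(eps, 1)  # -3.0  0.0  0.0  2.5

# Observation-specific thresholds as described above, for hypothetical
# posterior probabilities p_ij and tuning parameter lambda
lambda      <- 1
p_ij        <- c(0.9, 0.1)
lambda_soft <- lambda / p_ij        # used with the soft rule
lambda_hard <- lambda / sqrt(p_ij)  # used with the hard rule
```

Both rules set small residuals exactly to zero, which produces the sparsity in the mean shifts. The soft rule also shrinks the surviving values, while the hard rule leaves them unchanged.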

Value

A list containing the following elements:

pi

C-dimensional vector of estimated mixing proportions.

beta

C by (p + 1) matrix of estimated regression coefficients.

sigma

C-dimensional vector of estimated standard deviations.

gamma

n-dimensional vector of estimated mean shift values.

posterior

n by C matrix of posterior probabilities of each observation belonging to each component.

run

total number of iterations after convergence.
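One possible way to use the returned elements is sketched below. It assumes that a nonzero entry of gamma marks the corresponding observation as a potential outlier, in line with the sparse mean-shift parameterization, and that observations are assigned to the component with the largest posterior probability. The helper flag_outliers and the example values are hypothetical stand-ins for a fitted result.

```r
# Indices of observations whose estimated mean shift exceeds tol in magnitude
flag_outliers <- function(gamma, tol = 0) which(abs(gamma) > tol)

# Hypothetical stand-ins for the gamma and posterior elements of a fit
gamma_hat <- c(0, 0, 3.2, 0, -4.1)
posterior <- rbind(c(0.9, 0.1),
                   c(0.2, 0.8),
                   c(0.7, 0.3),
                   c(0.4, 0.6),
                   c(0.1, 0.9))

outlier_idx <- flag_outliers(gamma_hat)          # 3 5
component   <- apply(posterior, 1, which.max)    # 1 2 1 2 2
```

With a fitted object, est$gamma and est$posterior would take the place of the hypothetical vectors above.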

References

Yu, C., Yao, W., and Chen, K. (2017). A new method for robust mixture regression. Canadian Journal of Statistics, 45(1), 77-94.

See Also

mixreg for initial value calculation.

Examples

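library(MixSemiRob)

# Load the tone data and extract response and predictor
data(tone)
y <- tone$tuned
x <- tone$stretchratio

# Add 10 artificial outliers at positions 151 to 160, all at (x, y) = (0, 5)
k <- 160
x[151:k] <- 0
y[151:k] <- 5

# Fit the default two-component model with hard thresholding
est_RM2 <- mixregRM2(x, y, lambda = 1)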

[Package MixSemiRob version 1.1.0 Index]