
mixregRM2 {MixSemiRob}

R Documentation

Robust Mixture Regression with Thresholding-Embedded EM Algorithm for Penalized Estimation

Description

A robust mixture regression model that simultaneously conducts outlier detection and robust parameter estimation. It uses a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization (Yu et al., 2017):

f(y_i|\boldsymbol{x}_i,\boldsymbol{\theta},\boldsymbol{\gamma}_i) = \sum_{j=1}^C\pi_j\phi(y_i;\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j+\gamma_{ij}\sigma_j,\sigma_j^2),

i=1,\ldots,n, where C is the number of components in the model, \phi(\cdot;\mu,\sigma^2) is the normal density with mean \mu and variance \sigma^2, \boldsymbol{\theta}=(\pi_1,\boldsymbol{\beta}_1,\sigma_1,\ldots,\pi_{C},\boldsymbol{\beta}_C,\sigma_C)^{\top} is the parameter vector to estimate, and \boldsymbol{\gamma}_i=(\gamma_{i1},\ldots,\gamma_{iC})^{\top} is the vector of mean-shift parameters for the ith observation.
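
The density above can be evaluated directly in base R. The following is a minimal sketch for a single observation; the function name `mean_shift_density` and its argument layout are illustrative assumptions, not part of MixSemiRob.

```r
# Sketch: density of the mean-shift mixture regression model for one
# observation. Names are hypothetical and not exported by MixSemiRob.
mean_shift_density <- function(y, x, pi, beta, sigma, gamma) {
  # x:     covariate vector including the intercept (length p + 1)
  # beta:  C by (p + 1) matrix of regression coefficients
  # pi, sigma, gamma: length-C vectors (gamma is gamma_i1, ..., gamma_iC)
  mu <- as.vector(beta %*% x) + gamma * sigma  # shifted component means
  sum(pi * dnorm(y, mean = mu, sd = sigma))
}

# Two components, one covariate, arbitrary illustrative parameter values
beta <- rbind(c(0, 1), c(0, -1))
dens_no_shift <- mean_shift_density(y = 1, x = c(1, 1), pi = c(0.5, 0.5),
                                    beta = beta, sigma = c(1, 1),
                                    gamma = c(0, 0))
# A mean shift of 2 standard deviations moves component 2 onto y = 1
dens_shift <- mean_shift_density(y = 1, x = c(1, 1), pi = c(0.5, 0.5),
                                 beta = beta, sigma = c(1, 1),
                                 gamma = c(0, 2))
```

With all entries of gamma equal to zero, the expression reduces to an ordinary normal mixture regression density, which is why a sparse gamma singles out only the outlying observations.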

Usage

mixregRM2(x, y, C = 2, ini = NULL, nstart = 20, tol = 1e-02, maxiter = 50,
          method = c("HARD", "SOFT"), sigma.const = 0.001, lambda = 0.001)

Arguments

`x`	an n by p data matrix where n is the number of observations and p is the number of explanatory variables. The intercept term will automatically be added to the data.
`y`	an n-dimensional vector of the response variable.
`C`	number of mixture components. Default is 2.
`ini`	initial values for the parameters. Default is NULL, which obtains the initial values using the `mixreg` function. It can be a list with the form of `list(pi, beta, sigma, gamma)`, where `pi` is a vector of C mixing proportions, `beta` is a C by (p + 1) matrix for regression coefficients of C components, `sigma` is a vector of C standard deviations, and `gamma` is a vector of C mean shift values.
`nstart`	number of initializations to try. Default is 20.
`tol`	stopping criterion (threshold value) for the EM algorithm. Default is 1e-02.
`maxiter`	maximum number of iterations for the EM algorithm. Default is 50.
`method`	character, determining which threshold method to use: `HARD` or `SOFT`. Default is `HARD`. See details.
`sigma.const`	constraint on the ratio of minimum and maximum values of sigma. Default is 0.001.
`lambda`	tuning parameter in the penalty term. Default is 0.001. It can be chosen based on BIC. See Yu et al. (2017) for more details.
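
As an illustration of the `ini` argument, the list below follows the `list(pi, beta, sigma, gamma)` layout described above for C = 2 components and p = 1 explanatory variable. The numeric values are arbitrary placeholders chosen for this sketch, not recommended starting values.

```r
# Sketch: building a user-supplied `ini` list with the documented layout.
# All numeric values are arbitrary placeholders.
C <- 2  # number of mixture components
p <- 1  # number of explanatory variables

ini <- list(
  pi    = rep(1 / C, C),                       # C mixing proportions
  beta  = matrix(c(0, 1,
                   2, 0), nrow = C, ncol = p + 1,
                 byrow = TRUE),                # C by (p + 1) coefficients
  sigma = rep(1, C),                           # C standard deviations
  gamma = rep(0, C)                            # C mean shift values
)

# The list would then be passed as, for example:
# est <- mixregRM2(x, y, C = 2, ini = ini)
```

The first column of `beta` is taken here to hold the intercept, on the assumption that it matches the intercept column the function adds to `x`.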

Details

The parameters are estimated by maximizing the corresponding penalized log-likelihood function using an EM algorithm. The thresholding rule used to estimate \gamma_{ij} depends on the penalty:

Soft threshold: \hat{\gamma}_{ij} = sgn(\epsilon_{ij})(|\epsilon_{ij}|-\lambda_{ij}^*)_{+}, corresponding to the l_1 penalty.
Hard threshold: \hat{\gamma}_{ij} = \epsilon_{ij}I(|\epsilon_{ij}|>\lambda_{ij}^*), corresponding to the l_0 penalty.

Here, \epsilon_{ij} = (y_i-\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j)/\sigma_j is the standardized residual, (\cdot)_{+}=\max(\cdot,0), and I(\cdot) is the indicator function. Also, \lambda_{ij}^* is taken as \lambda/p_{ij}^{(k+1)} for the soft threshold and \lambda/\sqrt{p_{ij}^{(k+1)}} for the hard threshold, where p_{ij}^{(k+1)} is the posterior probability that the ith observation belongs to the jth component at iteration k+1.
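
The two thresholding rules can be written as short vectorized functions. This is a sketch in base R under the definitions above; the names `soft_threshold` and `hard_threshold` are hypothetical helpers, not functions exported by MixSemiRob.

```r
# Sketch: soft and hard thresholding of standardized residuals.
# Helper names are illustrative and not part of MixSemiRob.
soft_threshold <- function(eps, lambda_star) {
  # Shrinks every residual toward zero by lambda_star (l_1 penalty)
  sign(eps) * pmax(abs(eps) - lambda_star, 0)
}

hard_threshold <- function(eps, lambda_star) {
  # Keeps residuals above lambda_star unchanged, zeroes the rest (l_0 penalty)
  eps * (abs(eps) > lambda_star)
}

eps <- c(-3, -0.5, 0.2, 2.5)
soft_est <- soft_threshold(eps, lambda_star = 1)  # -2.0  0.0  0.0  1.5
hard_est <- hard_threshold(eps, lambda_star = 1)  # -3.0  0.0  0.0  2.5

# Effective thresholds for a posterior probability of 0.25 and lambda = 1
lambda <- 1
p_ij <- 0.25
lambda_soft <- lambda / p_ij        # 4
lambda_hard <- lambda / sqrt(p_ij)  # 2
```

Both rules set small residuals exactly to zero, so only observations with large standardized residuals receive a nonzero mean shift. Dividing by the posterior probability raises the threshold for components an observation is unlikely to belong to.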

Value

A list containing the following elements:

`pi`	C-dimensional vector of estimated mixing proportions.
`beta`	C by (p + 1) matrix of estimated regression coefficients.
`sigma`	C-dimensional vector of estimated standard deviations.
`gamma`	n-dimensional vector of estimated mean shift values.
`posterior`	n by C matrix of posterior probabilities of each observation belonging to each component.
`run`	total number of iterations performed until convergence.
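
The returned elements can be post-processed with base R. The sketch below uses a small hand-made list `est` that only mimics the documented structure (it is not real output of `mixregRM2`), and it assumes that a nonzero entry of `gamma` marks an observation flagged as an outlier, in line with the sparse mean-shift parameterization.

```r
# Sketch: reading component labels and flagged outliers from a fitted object.
# `est` is a mock list with the documented element names, not real output.
est <- list(
  posterior = rbind(c(0.9, 0.1),
                    c(0.2, 0.8),
                    c(0.6, 0.4)),  # n by C posterior probabilities
  gamma     = c(0, 0, 3.1)         # n mean shift values
)

# Hard component assignment: the most probable component for each observation
labels <- apply(est$posterior, 1, which.max)

# Assumption: a nonzero mean shift indicates a flagged outlier
outliers <- which(est$gamma != 0)
```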

References

Yu, C., Yao, W., and Chen, K. (2017). A new method for robust mixture regression. Canadian Journal of Statistics, 45(1), 77-94.

Examples

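library(MixSemiRob)

# Load the tone perception data shipped with the package
data(tone)
y <- tone$tuned
x <- tone$stretchratio

# Set observations 151 to k to (x, y) = (0, 5) so that they act as outliers
k <- 160
x[151:k] <- 0
y[151:k] <- 5

# Fit the robust mixture regression (default C = 2, hard thresholding)
est_RM2 <- mixregRM2(x, y, lambda = 1)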

[Package MixSemiRob version 1.1.0 Index]