mixregRM2 {MixSemiRob} | R Documentation |
Robust Mixture Regression with Thresholding-Embedded EM Algorithm for Penalized Estimation
Description
A robust mixture regression model that simultaneously conducts outlier detection and robust parameter estimation. It uses a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization (Yu et al., 2017):
f(y_i|\boldsymbol{x}_i,\boldsymbol{\theta},\boldsymbol{\gamma}_i) = \sum_{j=1}^C\pi_j\phi(y_i;\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j+\gamma_{ij}\sigma_j,\sigma_j^2), \quad i=1,\cdots,n,
where C is the number of components in the model, \phi(\cdot;\mu,\sigma^2) is the normal density with mean \mu and variance \sigma^2, \boldsymbol{\theta}=(\pi_1,\boldsymbol{\beta}_1,\sigma_1,\ldots,\pi_{C},\boldsymbol{\beta}_C,\sigma_C)^{\top} is the parameter vector to estimate, and \boldsymbol{\gamma}_i=(\gamma_{i1},\ldots,\gamma_{iC})^{\top} is the vector of mean-shift parameters for the ith observation.
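To make the parameterization concrete, the following is a minimal sketch that evaluates the mean-shift mixture density for a single observation. The helper name meanshift_density, its argument names, and all numeric values are assumptions chosen for illustration; they are not part of the MixSemiRob package.

```r
# Hypothetical helper (not part of MixSemiRob): evaluates the mean-shift
# mixture density for one observation. All values below are illustrative.
meanshift_density <- function(y, x, prop, beta, sigma, gamma) {
  # x:     covariate vector without the intercept
  # prop:  length-C vector of mixing proportions
  # beta:  C by (p + 1) matrix, one row of coefficients per component
  # gamma: length-C vector of mean-shift parameters for this observation
  mu <- as.vector(beta %*% c(1, x)) + gamma * sigma  # shifted component means
  sum(prop * dnorm(y, mean = mu, sd = sigma))
}

prop  <- c(0.5, 0.5)
beta  <- rbind(c(0, 1), c(2, -1))  # rows: (intercept, slope)
sigma <- c(1, 0.5)

# With all gamma equal to zero, this is an ordinary normal mixture regression.
d0 <- meanshift_density(y = 1, x = 1, prop, beta, sigma, gamma = c(0, 0))
# A nonzero gamma moves the jth component mean by gamma_j * sigma_j.
d1 <- meanshift_density(y = 1, x = 1, prop, beta, sigma, gamma = c(3, 0))
```

In this sketch, an observation with every mean-shift parameter equal to zero is treated as a regular point, which is why sparsity in \boldsymbol{\gamma}_i is what allows outliers to be singled out.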
Usage
mixregRM2(x, y, C = 2, ini = NULL, nstart = 20, tol = 1e-02, maxiter = 50,
method = c("HARD", "SOFT"), sigma.const = 0.001, lambda = 0.001)
Arguments
x |
an n by p data matrix where n is the number of observations and p is the number of explanatory variables. The intercept term will automatically be added to the data. |
y |
an n-dimensional vector of the response variable. |
C |
number of mixture components. Default is 2. |
ini |
initial values for the parameters. Default is NULL, which obtains the initial values
using mixreg (see See Also). |
nstart |
number of initializations to try. Default is 20. |
tol |
stopping criterion (threshold value) for the EM algorithm. Default is 1e-02. |
maxiter |
maximum number of iterations for the EM algorithm. Default is 50. |
method |
character, determining which threshold method to use: "HARD" or "SOFT" (see Details). |
sigma.const |
constraint on the ratio of minimum and maximum values of sigma. Default is 0.001. |
lambda |
tuning parameter in the penalty term. Default is 0.001. It can be chosen based on BIC. See Yu et al. (2017) for more details. |
Details
The parameters are estimated by maximizing the corresponding penalized log-likelihood function using an EM algorithm.
The thresholding rule determines the estimate of \gamma_{ij} and depends on the penalty:
Soft threshold: \hat{\gamma}_{ij} = sgn(\epsilon_{ij})(|\epsilon_{ij}|-\lambda_{ij}^*)_{+}, corresponding to the l_1 penalty.
Hard threshold: \hat{\gamma}_{ij} = \epsilon_{ij}I(|\epsilon_{ij}|>\lambda_{ij}^*), corresponding to the l_0 penalty.
Here, \epsilon_{ij} = (y_i-\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j)/\sigma_j, (\cdot)_{+}=\max(\cdot,0), and I(\cdot) is the indicator function. Also, \lambda_{ij}^* is taken as \lambda/p_{ij}^{(k+1)} for the soft threshold and \lambda/\sqrt{p_{ij}^{(k+1)}} for the hard threshold, where p_{ij}^{(k+1)} is the posterior probability that the ith observation belongs to the jth component at iteration k+1 of the EM algorithm.
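The two thresholding rules can be sketched in a few lines of R. The function names soft_threshold and hard_threshold and the input values are made up for illustration, and p is assumed here to hold posterior probabilities p_{ij}; this is a sketch of the formulas above, not the package's internal code.

```r
# Soft thresholding (l_1 penalty): shrink eps toward zero by thr.
soft_threshold <- function(eps, thr) sign(eps) * pmax(abs(eps) - thr, 0)

# Hard thresholding (l_0 penalty): keep eps as is when |eps| > thr, else zero.
hard_threshold <- function(eps, thr) eps * (abs(eps) > thr)

eps    <- c(-3, -0.5, 0.2, 2.5)   # made-up standardized residuals
p      <- c(0.9, 0.9, 0.5, 0.5)   # made-up posterior probabilities (assumption)
lambda <- 1

# Observation-specific thresholds: lambda / p (soft), lambda / sqrt(p) (hard).
gamma_soft <- soft_threshold(eps, lambda / p)
gamma_hard <- hard_threshold(eps, lambda / sqrt(p))
```

Both rules set small standardized residuals to exactly zero. The soft rule additionally shrinks the surviving values, whereas the hard rule leaves them unchanged.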
Value
A list containing the following elements:
pi |
C-dimensional vector of estimated mixing proportions. |
beta |
C by (p + 1) matrix of estimated regression coefficients. |
sigma |
C-dimensional vector of estimated standard deviations. |
gamma |
n-dimensional vector of estimated mean shift values. |
posterior |
n by C matrix of posterior probabilities of each observation belonging to each component. |
run |
total number of iterations after convergence. |
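As a rough illustration of how the returned list might be used, the sketch below works on a mock object that only borrows the documented element names. The numbers are invented, and treating a nonzero mean shift as an outlier flag is an interpretation of the sparse mean-shift model rather than documented package behavior.

```r
# Mock result with documented element names; all numbers are invented.
est <- list(
  pi = c(0.6, 0.4),
  gamma = c(0, 0, 4.2, 0, -3.7),
  posterior = rbind(c(0.9, 0.1), c(0.2, 0.8), c(0.6, 0.4),
                    c(0.7, 0.3), c(0.1, 0.9))
)

# Observations with a nonzero estimated mean shift are candidate outliers.
outliers <- which(est$gamma != 0)

# Assign each observation to its most probable component.
labels <- apply(est$posterior, 1, which.max)
```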
References
Yu, C., Yao, W., and Chen, K. (2017). A new method for robust mixture regression. Canadian Journal of Statistics, 45(1), 77-94.
See Also
mixreg for initial value calculation.
Examples
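data(tone)
y = tone$tuned
x = tone$stretchratio
# Add 10 artificial outliers at (x, y) = (0, 5) in positions 151 to 160
k = 160
x[151:k] = 0
y[151:k] = 5
# Fit the robust mixture regression with tuning parameter lambda = 1
est_RM2 = mixregRM2(x, y, lambda = 1)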