semimrOne {MixSemiRob}    R Documentation

Semiparametric Mixture Regression Models with Single-index and One-step Backfitting

Description

Assume that \boldsymbol{x} = (\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n) is an n by p matrix of explanatory variables and Y = (Y_1,\cdots,Y_n) is an n-dimensional vector of responses. The conditional distribution of Y given \boldsymbol{x} can be written as:

f(y|\boldsymbol{x},\boldsymbol{\alpha},\pi,m,\sigma^2) = \sum_{j=1}^C\pi_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}) \phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})).

‘semimrOne’ is used to estimate the mixture of single-index models described above, where \phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})) denotes the normal density with mean m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}) and variance \sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x}), and \pi_j(\cdot), m_j(\cdot), \sigma_j^2(\cdot) are unknown smooth single-index functions, which makes the model capable of handling high-dimensional nonparametric problems. This function employs kernel regression and a one-step estimation procedure (Xiang and Yao, 2020).
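To make the model concrete, the following base R sketch simulates data from a two-component (C = 2) mixture of single-index models and evaluates the conditional density given above. The direction alpha and the functions pi1, m1, m2, var1, var2 are illustrative assumptions chosen for this sketch; they are not taken from the package or from Xiang and Yao (2020).

```r
# Simulate from a two-component mixture of single-index models.
# All functional forms below are illustrative assumptions, not package defaults.
set.seed(1)
n <- 200
p <- 3
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

alpha <- c(1, 1, 1) / sqrt(3)   # index direction, scaled to unit length
z <- as.vector(x %*% alpha)     # single index alpha' x for each observation

# Hypothetical component functions of the index z
pi1  <- function(z) plogis(z)            # mixing proportion of component 1
m1   <- function(z) 2 + sin(z)           # mean of component 1
m2   <- function(z) -2 + 0.5 * z         # mean of component 2
var1 <- function(z) 0.25 * exp(0.2 * z)  # variance of component 1
var2 <- function(z) rep(0.5, length(z))  # variance of component 2

# Draw the latent component label, then the response from that normal component
comp <- ifelse(runif(n) < pi1(z), 1L, 2L)
y <- ifelse(comp == 1L,
            rnorm(n, mean = m1(z), sd = sqrt(var1(z))),
            rnorm(n, mean = m2(z), sd = sqrt(var2(z))))

# Conditional density f(y | x) for C = 2, evaluated at index value z
mix_density <- function(y, z) {
  pi1(z) * dnorm(y, m1(z), sqrt(var1(z))) +
    (1 - pi1(z)) * dnorm(y, m2(z), sqrt(var2(z)))
}
```

Because the mixing proportions sum to one at every index value, mix_density(y, z) integrates to one over y for any fixed z.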

Usage

semimrOne(x, y, h, coef = NULL, ini = NULL, grid = NULL)

Arguments

x

an n by p matrix of observations where n is the number of observations and p is the number of explanatory variables.

y

a vector of response values.

h

bandwidth for the kernel regression. Default is NULL, and the bandwidth is computed in the function by cross-validation.

coef

initial value of \boldsymbol{\alpha}^{\top} in the model, which plays the role of the regression coefficients in a regression model. Default is NULL, in which case the value is computed in the function by sliced inverse regression (Li, 1991).

ini

initial values for the parameters. Default is NULL, in which case the initial values are obtained by assuming a linear mixture model. If specified, it must be a list of the form list(pi, mu, var), where pi is a vector of mixing proportions, mu is a vector of component means, and var is a vector of component variances.

grid

grid points at which nonparametric functions are estimated. Default is NULL, which uses the estimated mixing proportions, component means, and component variances as the grid points after the algorithm converges.
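The roles of the bandwidth h and of the grid points can be illustrated with a generic kernel smoother. The sketch below is a plain Nadaraya-Watson estimator with a Gaussian kernel, written in base R; it is not the package's internal code, the kernel choice is an assumption, and nw_smooth is a hypothetical helper name used only here.

```r
# Generic kernel regression at user-chosen grid points (illustration only).
# nw_smooth is a hypothetical helper, not a MixSemiRob function.
nw_smooth <- function(z, y, grid, h) {
  sapply(grid, function(u) {
    w <- dnorm((z - u) / h)   # Gaussian kernel weights (assumed kernel)
    sum(w * y) / sum(w)       # locally weighted average of y around u
  })
}

set.seed(2)
z <- runif(100, -2, 2)                  # index values
y <- sin(z) + rnorm(100, sd = 0.2)      # noisy responses
grid <- seq(-1.5, 1.5, length.out = 7)  # points at which to estimate

fit_narrow <- nw_smooth(z, y, grid, h = 0.1)  # small h: follows local structure
fit_wide   <- nw_smooth(z, y, grid, h = 2)    # large h: close to a global average
```

A small bandwidth tracks the local shape of the curve at the cost of more variance, while a large bandwidth flattens the estimate toward the overall mean.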

Value

A list containing the following elements:

pi

estimated mixing proportions.

mu

estimated component means.

var

estimated component variances.

coef

estimated regression coefficients.

References

Xiang, S. and Yao, W. (2020). Semiparametric mixtures of regressions with single-index for model based clustering. Advances in Data Analysis and Classification, 14(2), 261-292.

Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316-327.

See Also

semimrFull, sinvreg for initial value calculation of \boldsymbol{\alpha}^{\top}.

Examples

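library(MixSemiRob)

# predictors (columns 1, 2, 4) and response (column 3) of the NBA data
xx = NBA[, c(1, 2, 4)]
yy = NBA[, 3]
# scale each predictor and the response by its standard deviation
x = xx/t(matrix(rep(sqrt(diag(var(xx))), length(yy)), nrow = 3))
y = yy/sd(yy)
# initial index direction from sliced inverse regression
ini_bs = sinvreg(x, y)
ini_b = ini_bs$direction[, 1]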

# use only the first 50 observations for a quicker demonstration of the function
set.seed(123)
est_onestep = semimrOne(x[1:50, ], y[1:50], h = 0.3442, coef = ini_b)

[Package MixSemiRob version 1.1.0 Index]