semimrFull {MixSemiRob}    R Documentation

Semiparametric Mixture Regression Models with Single-index Proportion and Fully Iterative Backfitting

Description

Assume that \boldsymbol{x} = (\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n) is an n by p matrix and Y = (Y_1,\cdots,Y_n) is an n-dimensional vector of response values. The conditional distribution of Y given \boldsymbol{x} can be written as:

f(y|\boldsymbol{x},\boldsymbol{\alpha},\pi,m,\sigma^2) = \sum_{j=1}^C\pi_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}) \phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})).

‘semimrFull’ estimates the mixture of single-index models described above, where \phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})) is the normal density with mean m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}) and variance \sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x}), and \pi_j(\cdot), m_j(\cdot), \sigma_j^2(\cdot) are unknown smooth single-index functions capable of handling high-dimensional nonparametric problems. The function employs kernel regression and a fully iterative backfitting (FIB) estimation procedure (Xiang and Yao, 2020).
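To make the model concrete, the following base-R sketch simulates data from a two-component version of the mixture above. The index direction `alpha` and the particular functions used for the mixing proportion, means, and standard deviations are hypothetical choices made only for illustration; they are not taken from the package or from Xiang and Yao (2020).

```r
set.seed(1)
n <- 200
p <- 3
x <- matrix(rnorm(n * p), n, p)

# Assumed unit-norm index direction; z is the single index alpha' x
alpha <- c(1, 1, 1) / sqrt(3)
z <- as.vector(x %*% alpha)

# Hypothetical smooth functions of the index (C = 2 components)
pi1 <- plogis(z)                          # mixing proportion of component 1
m1 <- function(z) 2 + sin(z)              # mean of component 1
m2 <- function(z) -2 + 0.5 * z            # mean of component 2
s1 <- function(z) 0.3 + 0.1 * z^2         # standard deviation of component 1
s2 <- function(z) rep(0.5, length(z))     # standard deviation of component 2

# Draw the component label, then the response from that component
comp <- ifelse(runif(n) < pi1, 1L, 2L)
y <- rnorm(n,
           mean = ifelse(comp == 1L, m1(z), m2(z)),
           sd   = ifelse(comp == 1L, s1(z), s2(z)))
```

Data generated this way have the structure that ‘semimrFull’ is designed for: every component quantity depends on x only through the scalar index.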

Usage

semimrFull(x, y, h = NULL, coef = NULL, ini = NULL, grid = NULL, maxiter = 100)

Arguments

x

an n by p matrix of observations where n is the number of observations and p is the number of explanatory variables.

y

an n-dimensional vector of response values.

h

bandwidth for the kernel regression. Default is NULL, and the bandwidth is computed in the function by cross-validation.

coef

initial value of \boldsymbol{\alpha}^{\top} in the model, which plays the role of the regression coefficients in a regression model. Default is NULL, and the value is computed in the function by sliced inverse regression (Li, 1991).

ini

initial values for the parameters. Default is NULL, in which case the initial values are obtained by assuming a linear mixture model. If specified, it can be a list of the form list(pi, mu, var), where pi is a vector of mixing proportions, mu is a vector of component means, and var is a vector of component variances.

grid

grid points at which nonparametric functions are estimated. Default is NULL, which uses the estimated mixing proportions, component means, and component variances as the grid points after the algorithm converges.

maxiter

maximum number of iterations. Default is 100.
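The roles of h and grid can be illustrated with a generic kernel regression over a scalar index. The sketch below is a plain Nadaraya-Watson smoother with a Gaussian kernel; the helper name `nw_smooth` and the kernel choice are assumptions for illustration, and this is not the package's internal estimation code.

```r
# Nadaraya-Watson estimate of E[y | z = u] at each grid point u,
# using a Gaussian kernel with bandwidth h (hypothetical helper).
nw_smooth <- function(z, y, grid, h) {
  sapply(grid, function(u) {
    w <- dnorm((z - u) / h)   # kernel weights centred at u
    sum(w * y) / sum(w)       # locally weighted average of y
  })
}

set.seed(2)
z <- runif(300, -2, 2)                    # stand-in for the index alpha' x
y <- sin(z) + rnorm(300, sd = 0.2)        # noisy smooth function of the index
grid <- seq(-1.5, 1.5, length.out = 7)    # points at which to evaluate
fit <- nw_smooth(z, y, grid, h = 0.3)
```

A smaller h follows the data more closely at the cost of higher variance, while a larger h gives a smoother but more biased curve, which is why the bandwidth is chosen by cross-validation when h is left as NULL.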

Value

A list containing the following elements:

pi

matrix of estimated mixing proportions.

mu

estimated component means.

var

estimated component variances.

coef

estimated regression coefficients.

run

total number of iterations at convergence.

References

Xiang, S. and Yao, W. (2020). Semiparametric mixtures of regressions with single-index for model based clustering. Advances in Data Analysis and Classification, 14(2), 261-292.

Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316-327.

See Also

semimrOne, sinvreg for initial value calculation of \boldsymbol{\alpha}^{\top}.

Examples

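library(MixSemiRob)
# Predictors (columns 1, 2, 4) and response (column 3) from the NBA data
xx = NBA[, c(1, 2, 4)]
yy = NBA[, 3]
# Scale each predictor column and the response by its standard deviation
x = xx/t(matrix(rep(sqrt(diag(var(xx))), length(yy)), nrow = 3))
y = yy/sd(yy)
# Initial index direction from sliced inverse regression
ini_bs = sinvreg(x, y)
ini_b = ini_bs$direction[, 1]
# Fit on the first 50 observations with a fixed bandwidth
est = semimrFull(x[1:50, ], y[1:50], h = 0.3442, coef = ini_b)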
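The line in the example that divides xx by a transposed matrix of standard deviations rescales each column to unit standard deviation. As a sketch of what that step does, the code below applies the same expression to a simulated stand-in matrix (not the NBA data) and compares it with an equivalent `sweep()` call from base R.

```r
set.seed(3)
# Stand-in for NBA[, c(1, 2, 4)]: three columns on different scales
xx <- matrix(rnorm(60, sd = c(1, 5, 10)), ncol = 3, byrow = TRUE)

# Scaling as written in the example
x_doc <- xx / t(matrix(rep(sqrt(diag(var(xx))), nrow(xx)), nrow = 3))

# Equivalent: divide each column by its standard deviation
x_sweep <- sweep(xx, 2, apply(xx, 2, sd), "/")

same <- isTRUE(all.equal(x_doc, x_sweep))   # do the two versions agree?
col_sd <- apply(x_sweep, 2, sd)             # column standard deviations after scaling
```

Either form can be used before calling sinvreg and semimrFull; the `sweep()` version does not hard-code the number of columns.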

[Package MixSemiRob version 1.1.0 Index]