cluspred {ClusPred} | R Documentation |
Function used for clustering and fitting the regression model
Description
Estimates the latent group variable Z from the covariates X, and estimates the parameters of the regression of Y on (U, Z).
Usage
cluspred(
  y,
  x,
  u = NULL,
  K = 2,
  model.reg = "mean",
  tau = 0.5,
  simultaneous = TRUE,
  np = TRUE,
  nbinit = 20,
  nbCPU = 1,
  tol = 0.01,
  band = (length(y)^(-1/5)),
  seed = 134
)
Arguments
y | numeric vector of the target variable |
x | matrix used for clustering (can contain numeric variables and factors) |
u | matrix of the covariates used for regression (can contain numeric variables and factors) |
K | number of clusters |
model.reg | type of loss used for the regression ("mean", "quantile", "expectile", "logcosh", "huber"). Only "mean" and "quantile" are implemented when simultaneous = FALSE or np = FALSE |
tau | level of the loss (used by "quantile", "expectile" and "huber") |
simultaneous | boolean indicating whether the clustering and the regression are performed simultaneously (TRUE) or not (FALSE) |
np | boolean indicating whether a nonparametric model is used (TRUE) or not (FALSE) |
nbinit | number of random initializations |
nbCPU | number of CPUs (only used on Linux) |
tol | tolerance used in the stopping rule |
band | bandwidth; the default is length(y)^(-1/5) |
seed | value of the seed (used for drawing the starting points) |
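To make the roles of model.reg and tau more concrete, the following sketch writes out textbook versions of the five losses in base R. These are standard definitions evaluated on residuals, not code taken from ClusPred; the function names (loss_mean, loss_quantile, and so on) are hypothetical, the package may use a different scaling, and treating tau as the Huber threshold is an assumption.

```r
# Textbook loss functions evaluated on residuals e = y - fitted value.
# Standard definitions only; ClusPred may scale or parameterize them differently.
loss_mean      <- function(e) e^2
loss_quantile  <- function(e, tau = 0.5) e * (tau - (e < 0))        # check loss
loss_expectile <- function(e, tau = 0.5) abs(tau - (e < 0)) * e^2   # asymmetric squared loss
loss_huber     <- function(e, tau = 0.5) {
  # Assumption: tau is the threshold between the quadratic and linear parts.
  ifelse(abs(e) <= tau, 0.5 * e^2, tau * (abs(e) - 0.5 * tau))
}
loss_logcosh   <- function(e) log(cosh(e))

# Compare the losses on a few residuals.
e <- c(-2, -0.5, 0, 0.5, 2)
losses <- cbind(
  e         = e,
  mean      = loss_mean(e),
  quantile  = loss_quantile(e, tau = 0.5),
  expectile = loss_expectile(e, tau = 0.5),
  huber     = loss_huber(e, tau = 0.5),
  logcosh   = loss_logcosh(e)
)
print(round(losses, 3))
```

With tau = 0.5 the quantile loss reduces to half the absolute error (median regression) and the expectile loss to half the squared error; other values of tau penalize positive and negative residuals asymmetrically.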
Value
cluspred returns a list containing the model parameters (param), the posterior probabilities of cluster membership (tik), the partition (zhat) and the log-likelihood (loglike) or, for the nonparametric model, the smoothed log-likelihood (logSmoothlike).
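As an illustration of how a partition relates to posterior probabilities, the sketch below derives a hard assignment from a toy tik matrix by the maximum a posteriori (MAP) rule. The numbers are made up, and the assumption that zhat is obtained by this rule is not stated in this documentation.

```r
# Toy posterior probabilities for 4 observations and K = 2 clusters
# (made-up values; each row sums to 1).
tik <- matrix(c(0.90, 0.10,
                0.20, 0.80,
                0.60, 0.40,
                0.05, 0.95),
              ncol = 2, byrow = TRUE)

# MAP rule: assign each observation to its most probable cluster.
# Assumption: this mirrors how a partition such as zhat is usually derived.
zhat <- max.col(tik, ties.method = "first")
print(zhat)

# Cluster sizes implied by the hard partition.
print(table(zhat))
```

Rows of tik close to 0.5 flag observations whose cluster assignment is uncertain, which the hard partition alone does not show.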
References
Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.
Examples
require(ClusPred)
# data loading
data(simdata)
# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K = 2,
                np = FALSE, nbCPU = 1, nbinit = 10)
# coefficients of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster memberships
pred$zhat
# predicted value of the target variable
pred$yhat
# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K = 2,
                model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficients of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster memberships
pred$zhat
# predicted value of the target variable
pred$yhat