cluspred {ClusPred}    R Documentation

Function used for clustering and fitting the regression model

Description

Estimation of the group-variable Z based on covariates X and estimation of the parameters of the regression of Y on (U, Z)

Usage

cluspred(
  y,
  x,
  u = NULL,
  K = 2,
  model.reg = "mean",
  tau = 0.5,
  simultaneous = TRUE,
  np = TRUE,
  nbinit = 20,
  nbCPU = 1,
  tol = 0.01,
  band = (length(y)^(-1/5)),
  seed = 134
)

Arguments

y

numeric vector of the target variable

x

matrix of the covariates used for clustering (can contain numeric variables and factors)

u

matrix of the covariates used for regression (can contain numeric variables and factors)

K

number of clusters

model.reg

indicates the type of the loss ("mean", "quantile", "expectile", "logcosh", "huber"). Only the losses "mean" and "quantile" are implemented if simultaneous=FALSE or np=FALSE

tau

specifies the level for the loss (quantile, expectile or huber)

simultaneous

boolean indicating whether the clustering and the regression are performed simultaneously (TRUE) or not (FALSE)

np

boolean indicating whether a nonparametric model is used (TRUE) or not (FALSE)

nbinit

number of random initializations

nbCPU

number of CPUs (only used on Linux)

tol

tolerance used in the stopping rule

band

bandwidth (default: length(y)^(-1/5))

seed

value of the seed (used for drawing the starting points)
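
To make the roles of model.reg and tau concrete, the sketch below writes out the textbook quantile (check) loss and expectile loss in base R and minimises each over a constant fit. These are the standard definitions; whether cluspred uses exactly these forms internally is an assumption, and the helper names check_loss, expectile_loss and fit_const are hypothetical.

```r
# Check (pinball) loss for quantile regression at level tau (textbook form)
check_loss <- function(r, tau) r * (tau - (r < 0))

# Asymmetric squared loss for expectile regression at level tau (textbook form)
expectile_loss <- function(r, tau) abs(tau - (r < 0)) * r^2

set.seed(1)
y <- rnorm(201)

# Minimise the empirical loss over a constant fit m (hypothetical helper)
fit_const <- function(loss, tau) {
  optimize(function(m) sum(loss(y - m, tau)), interval = range(y))$minimum
}

q50 <- fit_const(check_loss, 0.5)      # approximately median(y)
e50 <- fit_const(expectile_loss, 0.5)  # approximately mean(y)
```

With tau = 0.5 the quantile loss targets the median and the expectile loss targets the mean; other values of tau shift the target towards the lower or upper part of the distribution.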

Value

cluspred returns a list containing the model parameters (param), the posterior probabilities of cluster membership (tik), the partition (zhat), and the log-likelihood (loglike) or, in the nonparametric case, the smoothed log-likelihood (logSmoothlike).
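
The relationship between tik and zhat can be illustrated on a toy matrix. The sketch assumes, without confirmation from the package source, that the partition is the maximum a posteriori assignment, i.e. each observation goes to the cluster with the largest posterior probability; the values and the name zhat_map are made up for illustration.

```r
# Toy posterior probabilities: 3 observations (rows), K = 2 clusters (columns)
tik <- matrix(c(0.9, 0.1,
                0.2, 0.8,
                0.6, 0.4), ncol = 2, byrow = TRUE)

# Each row is a probability distribution over clusters
row_totals <- rowSums(tik)

# Maximum a posteriori partition: index of the largest probability per row
zhat_map <- apply(tik, 1, which.max)
```

On real output, comparing such an assignment with res$zhat is a quick way to check whether this assumption holds for a given fit.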

References

Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.

Examples

require(ClusPred)
# data loading
data(simdata)

# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
 np=FALSE, nbCPU = 1, nbinit = 10)
# coefficients of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster memberships
pred$zhat
# predicted value of the target variable
pred$yhat


# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficients of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster memberships
pred$zhat
# predicted value of the target variable
pred$yhat




[Package ClusPred version 1.1.0 Index]