| cwm {flexCWM} | R Documentation |
Fit for the CWM
Description
Maximum likelihood fitting of the cluster-weighted model by the EM algorithm.
Usage
cwm(formulaY = NULL, familyY = gaussian, data, Xnorm = NULL, Xbin = NULL,
Xpois = NULL, Xmult = NULL, modelXnorm = NULL, Xbtrials = NULL, k = 1:3,
initialization = c("random.soft", "random.hard", "kmeans", "mclust", "manual"),
start.z = NULL, seed = NULL, maxR = 1, iter.max = 1000, threshold = 1.0e-04,
eps = 1e-100, parallel = FALSE, pwarning = FALSE)
Arguments
formulaY |
an optional object of class " |
familyY |
a description of the error distribution and link function to be used for the conditional distribution of
Default value is |
data |
an optional |
Xnorm, Xbin, Xpois, Xmult |
optional matrices containing the concomitant variables assumed to have normal, binomial, Poisson and multinomial marginal distributions, respectively. |
modelXnorm |
an optional vector of character strings indicating the parsimonious models to be fitted for variables in |
Xbtrials |
an optional vector containing the number of trials for each column in |
k |
an optional vector containing the numbers of mixture components to be tried. Default value is |
initialization |
an optional character string. It sets the initialization strategy for the EM-algorithm. It can be:
Default value is |
start.z |
matrix of soft or hard classification: it is used only if |
seed |
an optional scalar. It sets the seed for the random number generator, when random initializations are used; if |
maxR |
number of initializations to be tried. Default value is 1. |
iter.max |
an optional scalar. It sets the maximum number of iterations in the EM-algorithm. Default value is 1000. |
threshold |
an optional scalar. It sets the threshold for the Aitken acceleration procedure. Default value is 1.0e-04. |
eps |
an optional scalar. It sets the smallest value for eigenvalues of covariance matrices for |
parallel |
When |
pwarning |
When |
Details
When familyY = binomial, the response variable must be a matrix with two columns, where the first column is the number of "successes" and the second column is the number of "failures".
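The two-column layout described above can be sketched with base R only. The data below are simulated and purely hypothetical (the names x, trials, successes and failures are not part of the package). The same successes/failures layout is accepted by stats::glm, which gives a quick way to check that the response is built correctly before it is passed to cwm.

```r
# Hypothetical grouped binomial data: 50 observations, 10 trials each
set.seed(1)
x <- seq(-2, 3, length.out = 50)
trials <- rep(10, length(x))
successes <- rbinom(length(x), size = trials, prob = plogis(0.5 * x))
failures <- trials - successes

# Two-column response: successes in the first column, failures in the second
Y <- cbind(successes, failures)

# stats::glm() uses the same response layout for family = binomial
fit_glm <- glm(Y ~ x, family = binomial)
coef(fit_glm)

# With flexCWM installed, an analogous call might look like this (not run;
# the choice of k is an assumption made for illustration):
# library(flexCWM)
# fit_cwm <- cwm(formulaY = Y ~ x, familyY = binomial, k = 1:2)
```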
When several models have been estimated, the summary and print methods report the best model according to the information criterion given in criterion. The best model is chosen among the estimated models whose number of components is among those in k, whose error distribution is among those in familyY, and whose parsimonious model is among those in modelXnorm.
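As a minimal sketch of this selection step, the code below fits several candidate models to the students data used in the Examples and asks summary for the best one. It assumes that flexCWM is installed, that "BIC" is an accepted value for criterion, and that "VVV" is a supported name in modelXnorm; the guard simply skips the code when the package is not available.

```r
if (requireNamespace("flexCWM", quietly = TRUE)) {
  library(flexCWM)
  data("students", package = "flexCWM")

  # Candidates: k in 1:2 crossed with two covariance structures for Xnorm
  fit <- cwm(formulaY = WEIGHT ~ HEIGHT + HEIGHT.F, familyY = gaussian,
             data = students,
             Xnorm = with(students, cbind(HEIGHT, HEIGHT.F)),
             k = 1:2, modelXnorm = c("EEE", "VVV"),
             initialization = "kmeans")

  # summary() reports the best candidate under the chosen criterion
  # ("BIC" is assumed here to be a valid value)
  summary(fit, criterion = "BIC")
}
```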
Value
This function returns an object of class cwm, which is a list of values related to the selected model. It contains:
call |
an object of class |
formulaY |
an object of class |
familyY |
the distribution used for the conditional distribution of |
data |
a |
concomitant |
a list containing |
Xbtrials |
number of trials used for |
models |
a list; each element is related to one of the models fitted. Each element is a list and contains: |
  posterior |
  posterior probabilities |
  iter |
  number of iterations performed in the EM algorithm |
  k |
  number of (fitted) mixture components |
  size |
  estimated size of the groups |
  cluster |
  classification vector |
  loglik |
  final log-likelihood value |
  df |
  overall number of estimated parameters |
  prior |
  weights for the mixture components |
  IC |
  list containing values of the information criteria |
  converged |
  logical; TRUE if the EM algorithm converged |
  GLModels |
  a list; each element is related to a mixture component and contains: |
    model |
    a "glm" class object |
    sigma |
    estimated local scale parameters of the conditional distribution of Y, when familyY is gaussian or student.t |
    t_df |
    estimated degrees of freedom of the t distribution, when familyY is student.t |
    nuY |
    estimated shape parameter, when familyY is Gamma. The gamma distribution is parameterized according to McCullagh & Nelder (1989, p. 30) |
  concomitant |
  a list with the estimated parameters of the concomitant variables for each mixture component: |
    normal.d, multinomial.d, poisson.d, binomial.d |
    marginal distribution of concomitant variables |
    normal.mu |
    mixture component means for Xnorm |
    normal.Sigma |
    mixture component covariance matrices for Xnorm |
    normal.model |
    models fitted for Xnorm |
    multinomial.probs |
    multinomial distribution probabilities for Xmult |
    poisson.lambda |
    lambda parameters for Xpois |
    binomial.p |
    binomial probabilities for Xbin |
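The structure of the returned list can be explored directly. The sketch below assumes that flexCWM is installed and reuses the students data from the Examples. The component names (models, k, loglik, df, converged, cluster, prior) are the ones listed in this section, and the object name m is only illustrative.

```r
if (requireNamespace("flexCWM", quietly = TRUE)) {
  library(flexCWM)
  data("students", package = "flexCWM")

  # A single candidate model, so fit$models has one element
  fit <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F, data = students,
             Xnorm = with(students, cbind(HEIGHT, HEIGHT.F)),
             k = 2, initialization = "kmeans", modelXnorm = "EEE")

  m <- fit$models[[1]]   # the element describing the fitted model
  m$k                    # number of mixture components
  m$loglik               # final log-likelihood value
  m$df                   # overall number of estimated parameters
  m$converged            # TRUE if the EM algorithm converged
  table(m$cluster)       # group sizes from the classification vector
  m$prior                # weights for the mixture components
}
```

Extractor functions such as getParXnorm, used in the Examples, are usually more convenient than indexing the list by hand.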
Author(s)
Mazza A., Punzo A., Ingrassia S.
References
Mazza, A., Ingrassia, S., and Punzo, A. (2018). flexCWM: A Flexible Framework for Cluster-Weighted Models. Journal of Statistical Software, 86(2), 1-30.
Ingrassia, S., Minotti, S. C., and Vittadini, G. (2012). Local Statistical Modeling via the Cluster-Weighted Approach with Elliptical Distributions. Journal of Classification, 29(3), 363-401.
Ingrassia, S., Minotti, S. C., and Punzo, A. (2014). Model-based clustering via linear cluster-weighted models. Computational Statistics and Data Analysis, 71, 159-182.
Ingrassia, S., Punzo, A., and Vittadini, G. (2015). The Generalized Linear Mixed Cluster-Weighted Model. Journal of Classification, 32(forthcoming).
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, 2nd edition.
Punzo, A. (2014). Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model. Statistical Modelling, 14(3), 257-291.
See Also
Examples
## an example with artificial data
data("ExCWM")
attach(ExCWM)
str(ExCWM)
# mixtures of binomial distributions
resXbin <- cwm(Xbin = Xbin, k = 1:2, initialization = "kmeans")
getParXbin(resXbin)
# Mixtures of Poisson distributions
resXpois <- cwm(Xpois = Xpois, k = 1:2, initialization = "kmeans")
getParXpois(resXpois)
# parsimonious mixtures of multivariate normal distributions
resXnorm <- cwm(Xnorm = cbind(Xnorm1,Xnorm2), k = 1:2, initialization = "kmeans")
getParXnorm(resXnorm)
## an example with real data
data("students")
attach(students)
str(students)
# CWM
fit2 <- cwm(WEIGHT ~ HEIGHT + HEIGHT.F , Xnorm = cbind(HEIGHT, HEIGHT.F),
k = 2, initialization = "kmeans", modelXnorm = "EEE")
summary(fit2, concomitant = TRUE)
plot(fit2)