| predict.MoEClust {MoEClust} | R Documentation | 
Predictions for MoEClust models
Description
Predicts both cluster membership probabilities and fitted response values from a MoEClust model, using either covariates and response data or covariates only. In both cases, the predicted MAP classification, mixing proportions, and component means are also reported, along with the predictions of the expert network corresponding to the most probable component.
Usage
## S3 method for class 'MoEClust'
predict(object,
        newdata,
        resid = FALSE,
        discard.noise = FALSE,
        MAPresids = FALSE,
        use.y = TRUE,
        ...)
## S3 method for class 'MoEClust'
fitted(object,
       ...)
## S3 method for class 'MoEClust'
residuals(object,
          newdata,
          ...)
Arguments
| object | An object of class "MoEClust". | 
| newdata | A list with two named components, each of which must be a  
 If supplied as a list with elements  Alternatively, a single  When  | 
| resid | A logical indicating whether to return the residuals also. Defaults to FALSE. | 
| discard.noise | A logical governing how predictions of the responses are made for models with a noise component (otherwise this argument is irrelevant). By default ( | 
| MAPresids | A logical indicating whether residuals are computed against y (FALSE, the default) or MAPy (TRUE). See Value. | 
| use.y | A logical indicating whether the response variables (if any are supplied either via  | 
| ... | Catches unused arguments (and allows the  | 
Details
Predictions can also be made for models with a noise component, in which case z will include the probability of belonging to "Cluster0" and classification will include labels with the value 0 for observations classified as noise (if any). The argument discard.noise governs how the responses are predicted in the presence of a noise component (see noise_vol for more details).
Note that the argument discard.noise is invoked for any models with a noise component, while the similar MoE_control argument noise.args$discard.noise is only invoked for models with both a noise component and expert network covariates.
Please be aware that a model considered optimal from a clustering point of view may not necessarily be optimal from a prediction point of view. In particular, full MoE models with covariates in both networks (for which both the cluster membership probabilities and component means are observation-specific) are recommended for out-of-sample prediction when only new covariates are observed (see new.x and new.y above, as well as use.y).
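The recommendation above can be sketched as follows. This is a minimal illustration only, assuming the MoEClust package is installed and reusing the ais data and the model specification from the Examples section; the object names hold, train, full, and pred are hypothetical, and the chosen model is not claimed to be optimal for these data.

```r
library(MoEClust)
data(ais)

# Hold back a few rows so that prediction is genuinely out-of-sample
set.seed(1)
hold  <- sample(seq_len(nrow(ais)), 5)
train <- ais[-hold, ]

# Full MoE model: covariates enter both the gating and expert networks,
# so mixing proportions and component means are observation-specific
full  <- MoE_clust(train[, 3:7], G = 2, gating = ~ BMI, expert = ~ sex,
                   modelNames = "EVE", network.data = train)

# Only the covariates are supplied for the held-back rows
pred  <- predict(full, newdata = ais[hold, c("BMI", "sex")])

pred$classification   # predicted cluster labels
pred$y                # aggregated fitted responses
```

Because no response columns are supplied, the predictions here rest entirely on the gating and expert networks.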
Value
A list with the following named components, regardless of whether newdata$new.x and newdata$new.y were used, or newdata$new.x only.
| y | Aggregated fitted values of the response variables. | 
| z | A matrix whose  | 
| classification | The vector of predicted cluster labels for the newdata. | 
| pro | The predicted mixing proportions for the newdata. | 
| mean | The predicted component means for the newdata. | 
| MAPy | Fitted values of the single expert network to which each observation is most probably assigned. Not returned for models with equal mixing proportions when only newdata$new.x is available. | 
When residuals is called, only the residuals (governed by MAPresids) are returned; when predict is called with resid=TRUE, the list above will also contain the element resids, containing the residuals.
The returned values of pro and mean are always the same, regardless of whether newdata$new.x and newdata$new.y were used, or newdata$new.x only.
Finally, fitted is simply a wrapper to predict.MoEClust(object)$y without any newdata, and with the resid and MAPresids arguments also ignored.
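The relationships described in this section can be checked directly. The sketch below assumes the MoEClust package is installed and reuses the first model from the Examples section; the names p_xy and p_x are hypothetical. The comparisons are expected to hold according to the statements above, but no output is shown here.

```r
library(MoEClust)
data(ais)

res  <- MoE_clust(ais[, 3:7], G = 2, gating = ~ BMI, expert = ~ sex,
                  modelNames = "EVE", network.data = ais)

# Predict the same rows with and without the response variables
p_xy <- predict(res, newdata = ais[1:10, ], resid = TRUE)
p_x  <- predict(res, newdata = ais[1:10, ], use.y = FALSE)

# resid=TRUE adds the element 'resids' to the returned list
names(p_xy)

# pro and mean should not depend on whether the responses were used
all.equal(p_xy$pro,  p_x$pro)
all.equal(p_xy$mean, p_x$mean)

# fitted() should match predict()$y when no newdata is given
all.equal(fitted(res), predict(res)$y)
```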
Note
Note that a dedicated predict function is also provided for objects of class "MoE_gating" (typically object$gating, where object is of class "MoEClust"). This function is effectively a shortcut to predict(object, ...)$pro, which (unlike the predict method for multinom on which it is based) accounts for the various ways of treating gating covariates and noise components, although its type argument defaults to "probs" rather than "class". Notably, its keep.noise argument behaves differently from the discard.noise argument here; here, the noise component is only discarded in the computation of the predicted responses. See predict.MoE_gating for further details.
Similarly, a dedicated predict function is also provided for objects of class "MoE_expert" (typically object$expert, where object is of class "MoEClust"). This function is effectively a wrapper to predict(object, ...)$mean, although it returns a list (by default) rather than a 3-dimensional array and always preserves the dimensions of newdata, even for models without expert network covariates. See predict.MoE_expert for further details.
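As a rough illustration of the two dedicated methods, the sketch below calls predict on the gating and expert sub-objects of a fitted model. It assumes the MoEClust package is installed and that both methods can be called without newdata, in which case the data used to fit the model are assumed to be used; see predict.MoE_gating and predict.MoE_expert for the authoritative signatures. The names gate, exper, and whole are hypothetical.

```r
library(MoEClust)
data(ais)

res   <- MoE_clust(ais[, 3:7], G = 2, gating = ~ BMI, expert = ~ sex,
                   modelNames = "EVE", network.data = ais)

# Gating network only: predicted mixing proportions
gate  <- predict(res$gating)

# Expert network only: component means, returned as a list by default
exper <- predict(res$expert)

# Full prediction for comparison: $pro and $mean hold the analogous quantities
whole <- predict(res)

str(gate)
class(exper)
dim(whole$mean)   # 3-dimensional array
```

Calling the sub-object methods is convenient when only one network is of interest, whereas predict on the full model returns everything at once.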
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.
See Also
MoE_clust, MoE_control, noise_vol, predict.MoE_gating, predict.MoE_expert
Examples
data(ais)
# Fit a MoEClust model and predict the same data
res     <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                     modelNames="EVE", network.data=ais)
pred1   <- predict(res)
# Get only the fitted responses
fits    <- fitted(res)
all.equal(pred1$y, fits) #TRUE
# Remove some rows of the data for prediction purposes
ind     <- sample(seq_len(nrow(ais)), 5)
dat     <- ais[-ind,]
# Fit another MoEClust model to the retained data
res2    <- MoE_clust(dat[,3:7], G=3, gating= ~ BMI + sex,
                     modelNames="EEE", network.data=dat)
# Predict held back data using the covariates & response variables
(pred2  <- predict(res2, newdata=ais[ind,]))
# pred2 <- predict(res2, newdata=list(new.y=ais[ind,3:7],
#                                     new.x=ais[ind,c("BMI", "sex")]))
# Get the residuals
residuals(res2, newdata=ais[ind,])
# Predict held back data using only the covariates
(pred3  <- predict(res2, newdata=ais[ind,], use.y=FALSE))
# pred3 <- predict(res2, newdata=list(new.x=ais[ind,c("BMI", "sex")]))
# pred3 <- predict(res2, newdata=ais[ind,c("BMI", "sex")])