pattnpml.fit {prefmod} | R Documentation |
NPML estimation for paired comparison models
Description
Fits a mixture model to overdispersed paired comparison data using non-parametric maximum likelihood (Aitkin, 1996a).
Usage
pattnpml.fit(formula, random = ~1, k = 1, design,
tol = 0.5, startp = NULL, EMmaxit = 500, EMdev.change = 0.001,
seed = NULL, pr.it = FALSE)
Arguments
formula |
A formula defining the response (the count of the number of cases of each pattern) and the fixed effects (e.g. |
random |
A formula defining the random model. If there are three objects labelled o1, o2, o3, set |
k |
The number of mass points (latent classes). Up to 21 mass points are supported. |
design |
The design data frame for paired comparison data as generated using |
tol |
The |
startp |
Optional numerical vector of length |
EMmaxit |
The maximum number of EM iterations. |
EMdev.change |
Stops EM algorithm when deviance change falls below this value. |
seed |
Seed for random weights. If |
pr.it |
A dot is printed at each iteration cycle of the EM algorithm if set to |
Details
The function pattnpml.fit
is a wrapper function for alldistPC
which in turn is a modified version of the function alldist
from the npmlreg package.
The non-parametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalised linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximised using a standard EM algorithm.
This function extends the NPML approach to allow fitting of overdispersed paired comparison models. It assumes that overdispersion arises because of dependence in the patterns. Fitting a non-parametric random effects term is equivalent to specifying distinct latent classes of response patterns.
The number of components k
of the finite mixture has to be specified beforehand.
The EM algorithm used by the function takes the Gauss-Hermite masses and mass points as starting points.
The position of the starting points can be concentrated or extended by setting tol
smaller or larger, respectively; the initial mass point probabilities of the starting points can also be specified through startp
.
Fitting models for overdispersion can be achieved by specifying the paired comparison items as additive terms in the random part of the model formula. A separate estimate for each item and for each mass point is produced.
Fitting subject covariate models with the same effect for each mass point component is achieved by specifying as part of the formula
a) a subject factor giving a different estimate for each covariate combination b) an interaction of the chosen subject covariates with the objects.
For models with subject factor covariates only, the first term is simply the interaction of all of the factor covariates.
Fitting subject covariate models with a different effect for each mass point component (sometimes called random coefficient models, see Aitkin, Francis, Hinde and Darnell, 2009, pp. 497) is possible by specifying an interaction of the subject covariates with the items in the random
term, and also in the formula
part.
Thus the setting random = ~x:(o1+o2+o3
gives a model with a set of random slopes (one set for each mass point) and a set of random intercepts, one set for each mass point.
The AIC
and BIC
functions from the stats-package can be used.
Value
The function produces an object of class pattNPML
.
The object contains the following 29 components:
coefficients |
a named vector of coefficients (including the mass points). In case of Gaussian quadrature, the coefficient given at |
residuals |
the difference between the true response and the empirical Bayes predictions. |
fitted.values |
the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses. |
family |
the ‘family’ object used. |
linear.predictors |
the extended linear predictors |
disparity |
the disparity ( |
deviance |
the deviance of the fitted mixture regression model. |
null.deviance |
The deviance for the null model (just containing an intercept), comparable with ‘deviance.’ |
df.residual |
the residual degrees of freedom of the fitted model (including the random part). |
df.null |
the residual degrees of freedom for the null model. |
y |
the (extended) response vector. |
call |
the matched call. |
formula |
the formula supplied. |
random |
the random term of the model formula. |
data |
the data argument. |
model |
the (extended) design matrix. |
weights |
the case weights initially supplied. |
offset |
the offset initially supplied. |
mass.points |
the fitted mass points. |
masses |
the mass point probabilities corresponding to the patterns. |
sdev |
a list of the two elements |
shape |
a list of the two elements |
rsdev |
estimated random effect standard deviation. |
post.prob |
a matrix of posteriori probabilities. |
post.int |
a vector of ‘posteriori intercepts’ (as in Sofroniou et al. (2006)). |
ebp |
the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions. |
EMiter |
gives the number of iterations of the EM algorithm. |
EMconverged |
logical value indicating if the EM algorithm converged. |
lastglm |
the fitted |
Misc |
contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories. |
For further details see the help file for function alldist
in package npmlreg.
Note
The mass point probabilities given in the output are the proportion of patterns estimated to contribute to each mass point. To estimate the proportion of cases contributing to each mass point the posterior probabilities need to be averaged over patterns with observed counts as weights (see example below).
Author(s)
Originally translated from the GLIM 4 functions alldist
and allvc
(Aitkin & Francis, 1995) to R by Ross Darnell (2002).
Modified, extended, and prepared for publication by Jochen Einbeck and John Hinde (2006).
Adapted for paired comparison modelling by Reinhold Hatzinger and Brian Francis (2009).
References
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6(3), 251–262. doi:10.1007/BF00140869
Aitkin, M., Francis, B., Hinde, J., & Darnell, R. (2009). Statistical Modelling in R. Oxford: Oxford University Press.
Einbeck, J., & Hinde, J. (2006). A Note on NPML Estimation for Exponential Family Regression Models with Unspecified Dispersion Parameter. Austrian Journal of Statistics, 35(2&3), 233–243.
Sofroniou, N., Einbeck, J., & Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.
See Also
Examples
# two latent classes for paired comparison data
dfr <- patt.design(dat4, 4)
modPC <- pattnpml.fit(y ~ 1, random = ~o1 + o2 + o3, k = 2, design = dfr)
modPC
# estimated proportion of cases in each mixture component
apply(modPC$post.prob, 2, function(x){ sum(x * dfr$y / sum(dfr$y)) })
## Not run:
# fitting a model for two latent classes and fixed categorical subject
# covariates to the Eurobarometer 55.2 data (see help("euro55.2.des"))
# on rankings of sources of information on scientific developments
model2cl <- pattnpml.fit(
y ~ SEX:AGE4 + (SEX + AGE4):(TV + RAD + NEWSP + SCIMAG + WWW + EDINST) - 1,
random = ~ TV + RAD + NEWSP + SCIMAG + WWW + EDINST,
k = 2, design = euro55.2.des, pr.it = TRUE)
summary(model2cl)
BIC(model2cl)
## End(Not run)