mix {WeMix} | R Documentation |
Survey Weighted Mixed-Effects Models
Description
Implements a survey weighted mixed-effects model using the provided formula.
Usage
mix(
formula,
data,
weights,
cWeights = FALSE,
center_group = NULL,
center_grand = NULL,
max_iteration = 10,
nQuad = 13L,
run = TRUE,
verbose = FALSE,
acc0 = 120,
keepAdapting = FALSE,
start = NULL,
fast = FALSE,
family = NULL
)
Arguments
formula |
a formula object in the style of |
data |
a data frame containing the raw data for the model. |
weights |
a character vector of names of weight variables found in the data frame starts with units (level 1) and increasing (larger groups). |
cWeights |
logical, set to |
center_group |
a list where the name of each element is the name of the aggregation level, and the element is a formula of
variable names to be group mean centered; for example to group mean center gender and age within the group student:
|
center_grand |
a formula of variable names to be grand mean centered, for example to center the variable education by overall mean of
education: |
max_iteration |
a optional integer, for non-linear models fit by adaptive quadrature which limits number of iterations allowed before quitting. Defaults to 10. This is used because if the likelihood surface is flat, models may run for a very long time without converging. |
nQuad |
an optional integer number of quadrature points to evaluate models solved by adaptive quadrature. Only non-linear models are evaluated with adaptive quadrature. See notes for additional guidelines. |
run |
logical; |
verbose |
logical, default |
acc0 |
deprecated; ignored. |
keepAdapting |
logical, set to |
start |
optional numeric vector representing the point at which the model should start optimization; takes the shape of c(coef, vars) from results (see help). |
fast |
logical; deprecated |
family |
the family; optionally used to specify generalized linear mixed models. Currently only |
Details
Linear models are solved using a modification of the analytic solution developed by Bates and Pinheiro (1998). Non-linear models are solved using adaptive quadrature following the methods in STATA's GLAMMM (Rabe-Hesketh & Skrondal, 2006) and Pineiro and Chao (2006). The posterior modes used in adaptive quadrature are determined following the method in lme4pureR (Walker & Bates, 2015). For additional details, see the vignettes Weighted Mixed Models: Adaptive Quadrature and Weighted Mixed Models: Analytical Solution which provide extensive examples as well as a description of the mathematical basis of the estimation procedure and comparisons to model specifications in other common software. Notes:
Standard errors of random effect variances are robust; see vignette for details.
To see the function that is maximized in the estimation of this model, see the section on "Model Fitting" in the Introduction to Mixed Effect Models With WeMix vignette.
When all weights above the individual level are 1, this is similar to a
lmer
and you should uselme4
because it is much faster.If starting coefficients are not provided they are estimated using
lme4
.For non-linear models, when the variance of a random effect is very low (<.1), WeMix doesn't estimate it, because very low variances create problems with numerical evaluation. In these cases, consider estimating without that random effect.
The model is estimated by maximum likelihood estimation.
Non-linear models may have up to 3 nested levels.
To choose the number of quadrature points for non-linear model evaluation, a balance is needed between accuracy and speed; estimation time increases quadratically with the number of points chosen. In addition, an odd number of points is traditionally used. We recommend starting at 13 and increasing or decreasing as needed.
Value
object of class WeMixResults
.
This is a list with elements:
lnlf |
function, the likelihood function. |
lnl |
numeric, the log-likelihood of the model. |
coef |
numeric vector, the estimated coefficients of the model. |
ranefs |
the group-level random effects. |
SE |
the cluste robust (CR-0) standard errors of the fixed effects. |
vars |
numeric vector, the random effect variances. |
theta |
the theta vector. |
call |
the original call used. |
levels |
integer, the number of levels in the model. |
ICC |
numeric, the intraclass correlation coefficient. |
CMODE |
the conditional mean of the random effects. |
invHessian |
inverse of the second derivative of the likelihood function. |
ICC |
the interclass correlation. |
is_adaptive |
logical, indicates if adaptive quadrature was used for estimation. |
sigma |
the sigma value. |
ngroups |
the number of observations in each group. |
varDF |
the variance data frame in the format of the variance data frame returned by lme4. |
varVC |
the variance-covariance matrix of the random effects. |
cov_mat |
the variance-covariance matrix of the fixed effects. |
var_theta |
the variance covariance matrix of the theta terms. |
wgtStats |
statistics regarding weights, by level. |
ranefMat |
list of matrixes; each list element is a matrix of random effects by level with IDs in the rows and random effects in the columns. |
Author(s)
Paul Bailey, Blue Webb, Claire Kelley, and Trang Nguyen
Examples
## Not run:
library(lme4)
data(sleepstudy)
ss1 <- sleepstudy
# Create weights
ss1$W1 <- ifelse(ss1$Subject %in% c(308, 309, 310), 2, 1)
ss1$W2 <- 1
# Run random-intercept 2-level model
two_level <- mix(Reaction ~ Days + (1|Subject), data=ss1, weights=c("W1", "W2"))
#Run random-intercept 2-level model with group-mean centering
grp_centered <- mix(Reaction ~ Days + (1|Subject), data=ss1,
weights = c("W1", "W2"),
center_group = list("Subject" = ~Days))
#Run three level model with random slope and intercept.
#add group variables for 3 level model
ss1$Group <- 3
ss1$Group <- ifelse(as.numeric(ss1$Subject) %% 10 < 7, 2, ss1$Group)
ss1$Group <- ifelse(as.numeric(ss1$Subject) %% 10 < 4, 1, ss1$Group)
# level-3 weights
ss1$W3 <- ifelse(ss1$Group == 2, 2, 1)
three_level <- mix(Reaction ~ Days + (1|Subject) + (1+Days|Group), data=ss1,
weights=c("W1", "W2", "W3"))
# Conditional Weights
# use vignette example
library(EdSurvey)
#read in data
downloadPISA("~/", year=2012)
cntl <- readPISA("~/PISA/2012", countries="USA")
data <- getData(cntl,c("schoolid","pv1math","st29q03","sc14q02","st04q01",
"escs","w_fschwt","w_fstuwt"),
omittedLevels=FALSE, addAttributes=FALSE)
# Remove NA and omitted Levels
om <- c("Invalid", "N/A", "Missing", "Miss", NA, "(Missing)")
for (i in 1:ncol(data)) {
data <- data[!data[,i] %in% om,]
}
#relevel factors for model
data$st29q03 <- relevel(data$st29q03, ref="Strongly agree")
data$sc14q02 <- relevel(data$sc14q02, ref="Not at all")
# run with unconditional weights
m1u <- mix(pv1math ~ st29q03 + sc14q02 +st04q01+escs+ (1|schoolid), data=data,
weights=c("w_fstuwt", "w_fschwt"))
summary(m1u)
# conditional weights
data$pwt2 <- data$w_fschwt
data$pwt1 <- data$w_fstuwt / data$w_fschwt
# run with conditional weights
m1c <- mix(pv1math ~ st29q03 + sc14q02 +st04q01+escs+ (1|schoolid), data=data,
weights=c("pwt1", "pwt2"), cWeights=TRUE)
summary(m1c)
# the results are, up to rounding, the same in m1u and m1c, only the calls are different
## End(Not run)