build.start.fit {rpql}    R Documentation
Constructs a start fit for use in the rpql function
Description
Takes a GLMM fitted using the lme4 package, i.e., using either the lmer or glmer functions, and constructs a list containing starting values for use in the start argument of the main fitting function rpql. It also constructs adaptive lasso weights, which can subsequently be used in the pen.weights argument of rpql if the adaptive lasso penalty is used for variable selection.
Usage
build.start.fit(lme4.fit, id = NULL, gamma = 0, cov.groups = NULL)
Arguments
lme4.fit
An object of class "lmerMod" or "glmerMod", obtained when fitting a (G)LMM using the lmer or glmer functions in the lme4 package.
id
An optional list with each element being a vector of IDs that reference the model matrix in the corresponding element of the list Z of random effects model matrices, as supplied to rpql.
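gamma
A vector of power parameters, \gamma, used to construct the adaptive lasso weights; see Details. With the default gamma = 0, every weight equals |x|^0 = 1.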
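cov.groups
A vector specifying whether fixed effect coefficients (including the intercept) should be regarded, and therefore penalized, in groups. For example, if one or more of the fixed effect covariates are factors, then the coefficients of the dummy variables created for each factor should be penalized together as a group; see Details.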
Details
This function is mainly used when: 1) you want to produce good starting values for the main fitting function rpql, and so you fit a saturated (full) GLMM using lme4 and use the estimates from there as starting values, and/or 2) you want to obtain adaptive lasso weights of the form weight_k = |\tilde{parameter}_k|^{-\gamma}, where \gamma > 0 is the power parameter and \tilde{parameter}_k is the parameter estimate from the saturated model fit. For regularized PQL specifically, this function constructs adaptive lasso weights from the lme4 fit as follows. Let w^F and w^R denote the fixed and random effect adaptive weights, respectively. Then

w^F_k = |\tilde{\beta}_k|^{-\gamma_1},
w^R_l = |\tilde{\Sigma}_{ll}|^{-\gamma_2},

where \tilde{\beta}_k is the estimated coefficient for the k^{th} fixed effect, \tilde{\Sigma}_{ll} is the l^{th} diagonal element of the estimated random effects covariance matrix, and \gamma = (\gamma_1, \gamma_2) is a vector of two power parameters; see Zou (2006) for the adaptive lasso, and Hui et al. (2016) for regularized PQL selection in GLMMs using adaptive lasso type penalties.
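The weight formulas above can be sketched in a few lines of base R. This is not the package's implementation: the helper adaptive_weights(), its arguments and the returned list names fixed and random are hypothetical, and the estimates are made-up values standing in for those taken from a saturated lme4 fit. Small estimates receive large weights and are therefore penalized more heavily; with gamma = 0 every weight equals 1 and the ordinary (unweighted) lasso is recovered.

```r
# Hypothetical helper (not part of rpql): adaptive lasso weights from
# saturated-fit estimates. A length-one gamma is recycled for both parts.
adaptive_weights <- function(beta_tilde, Sigma_tilde, gamma = c(0, 0)) {
  if (length(gamma) == 1) gamma <- rep(gamma, 2)
  # Fixed effects: w^F_k = |beta_k|^(-gamma_1)
  w_fixed <- abs(beta_tilde)^(-gamma[1])
  # Random effects: w^R_l = |Sigma_ll|^(-gamma_2), diagonal elements only
  w_random <- abs(diag(Sigma_tilde))^(-gamma[2])
  list(fixed = w_fixed, random = w_random)
}

# Made-up estimates standing in for a saturated fit
beta_tilde <- c(-0.1, 1.2, -0.9, 0.02)
Sigma_tilde <- matrix(c(9, 4.8, 4.8, 4), nrow = 2, ncol = 2)

w <- adaptive_weights(beta_tilde, Sigma_tilde, gamma = c(2, 2))
print(w)
```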
If cov.groups is supplied, some of the fixed effect coefficients are treated and penalized collectively as a group. The most common cases for this are factor (categorical) variables with more than two levels, and polynomial terms that should be dealt with together. For instance, suppose the model matrix consists of six columns, where the first three columns correspond to separate covariates (including the intercept) and the last three columns all correspond to dummy variables created for a factor variable with four levels, e.g. soil moisture with levels dry, moderately moist, very moist and wet. The coefficients from the last three columns should then be penalized together, and so we can set cov.groups = c(1,2,3,4,4,4).
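Rather than typing cov.groups by hand, one convenient option in base R is the "assign" attribute that model.matrix() attaches to its result, which maps each column to the model term it came from (0 for the intercept). The sketch below rebuilds the soil moisture example; the data are simulated and the variable names are hypothetical. Because a multi-column term such as a factor or poly(x, 2) has a single term index, all of its columns end up in the same group.

```r
# Simulated data mirroring the soil moisture example: intercept, two numeric
# covariates and a four-level factor (three treatment-contrast dummies).
set.seed(1)
moisture_levels <- c("dry", "moderately moist", "very moist", "wet")
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  moisture = factor(rep(moisture_levels, 5), levels = moisture_levels)
)
Xmm <- model.matrix(~ x1 + x2 + moisture, data = df)

# "assign" gives the term index of each column (0 = intercept); adding 1
# yields one group label per term, so the three dummies share a label.
cov_groups <- attr(Xmm, "assign") + 1
cov_groups
# [1] 1 2 3 4 4 4
```

Applied to the model matrix built in the Examples (intercept, eight continuous covariates and the four-level soil factor), the same expression gives c(1:9, 10, 10, 10), matching the cov.groups used there.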
In this case, the adaptive lasso weights for the grouped coefficients are constructed differently. Following on from the example above, the fixed effect weight for soil moisture is defined as

w^F = \|\tilde{\beta}\|^{-\gamma_1},

where \| \cdot \| denotes the L2-norm and \tilde{\beta} is the vector of fixed effect coefficients belonging to the group (three in this case). When entered into the rpql function, an adaptive group lasso (Wang and Leng, 2008) is applied to this set of coefficients, such that they are all encouraged to be shrunk to zero at the same time.
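The grouped weight can be sketched in the same way. The helper group_weights() is hypothetical, and it assumes each coefficient in a group is simply given its group's weight; how rpql stores grouped weights internally may differ. For a group with a single member the L2-norm reduces to |\tilde{\beta}_k|, so ungrouped coefficients get the usual adaptive lasso weight.

```r
# Hypothetical helper (not part of rpql): one weight per group,
# w^F = ||beta_group||_2^(-gamma1), repeated for each member coefficient.
group_weights <- function(beta_tilde, cov_groups, gamma1) {
  stopifnot(length(beta_tilde) == length(cov_groups))
  # L2 norm of the estimated coefficients within each group
  norms <- tapply(beta_tilde, cov_groups, function(b) sqrt(sum(b^2)))
  w_group <- norms^(-gamma1)
  # Map each coefficient back to its group's weight
  as.numeric(w_group[as.character(cov_groups)])
}

# Made-up estimates; the last three belong to the soil moisture group
beta_g <- c(-0.1, 1, -1, 0.3, 0.4, 0)
w_g <- group_weights(beta_g, cov_groups = c(1, 2, 3, 4, 4, 4), gamma1 = 1)
w_g
```

Note that a group is only given an infinite weight if all of its estimates are exactly zero, whereas a single zero estimate inside a group (the last coefficient above) does not blow up the weight.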
After construction, the adaptive lasso weights can be manually altered before being passed to rpql, e.g., if certain fixed and/or random effects should not be penalized.
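As a minimal sketch of such a manual edit, assume pen.weights is a list with a fixed effect vector named fixed and a list of random effect vectors named random (this layout is an assumption; inspect str(fit_sat$pen.weights) for the actual one). Because the adaptive lasso penalty has the form \lambda \sum_k w_k |\beta_k|, setting a weight to zero removes that coefficient from the penalty. The weights below are made-up values, and adl_penalty() is a hypothetical helper.

```r
# Made-up weights list mimicking the assumed layout of fit_sat$pen.weights
pen_weights <- list(
  fixed = c(100, 0.69, 1.23, 2500),
  random = list(cluster = c(0.0123, 0.0625))
)

# Leave the intercept (first fixed effect) and the first random effect unpenalized
pen_weights$fixed[1] <- 0
pen_weights$random$cluster[1] <- 0

# Hypothetical helper: fixed effect part of an adaptive lasso penalty,
# lambda * sum_k w_k * |beta_k|
adl_penalty <- function(lambda, beta, w) lambda * sum(w * abs(beta))

beta <- c(-0.1, 1.2, -0.9, 0.02)
p1 <- adl_penalty(0.01, beta, pen_weights$fixed)
# Changing the now-unpenalized intercept leaves the penalty unchanged
p2 <- adl_penalty(0.01, replace(beta, 1, 5), pen_weights$fixed)
```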
Value
A list containing the following elements:
fixef
Fixed effect coefficient estimates from lme4.fit.
ranef
A list of random effect predicted coefficients from lme4.fit.
ran.cov
A list of random effects covariance matrices from lme4.fit.
cov.groups
The argument cov.groups.
pen.weights
A list of adaptive lasso weights constructed from lme4.fit.
Warnings
In order to construct sensible starting values and weights, this function should really only be used when lme4.fit is a fit of the saturated GLMM, i.e., with all fixed and random effects included.
Author(s)
Francis K.C. Hui <francis.hui@gmail.com>, with contributions from Samuel Mueller <samuel.mueller@sydney.edu.au> and A.H. Welsh <Alan.Welsh@anu.edu.au>
Maintainer: Francis Hui <fhui28@gmail.com>
References
Hui, F.K.C., Mueller, S., and Welsh, A.H. (2016). Joint Selection in Mixed Models using Regularized PQL. Journal of the American Statistical Association: accepted for publication.
Wang, H., and Leng, C. (2008). A note on adaptive group lasso. Computational Statistics & Data Analysis, 52, 5277-5286.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.
See Also
rpql for fitting and performing model selection in GLMMs using regularized PQL, which may use the values obtained from build.start.fit for starting values and adaptive lasso weights.
Examples
##-------------------------
## Example 1: Bernoulli GLMM with grouped covariates.
## Independent cluster model with 50 clusters and equal cluster sizes of 10
## Nine covariates where the last covariate (soil type) is a factor with four levels
##-------------------------
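library(rpql) ## provides gendat.glmm(), build.start.fit() and rpql()
n <- 50; p <- 8; m <- 10 ## clusters, continuous covariates, cluster size
set.seed(123)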
X <- data.frame(matrix(rnorm(n*m*p),n*m,p), soil=sample(1:4,size=m*n,replace=TRUE))
X$soil <- factor(X$soil)
X <- model.matrix(~ ., data = X)
colnames(X) <- paste("X",1:ncol(X),sep="")
Z <- X[,1:5] ## Random effects model matrix taken as first five columns
true_betas <- c(-0.1,1,-1,1,-1,1,-1,0,0,0,0,0) ## X8, X9 and soil dummies X10-X12 are truly zero
true_D <- matrix(0,ncol(Z),ncol(Z))
true_D[1:3,1:3] <- matrix(c(9,4.8,0.6,4.8,4,1,0.6,1,1),
3,3,byrow=TRUE) ## 3 important random effects
simy <- gendat.glmm(id = list(cluster = rep(1:n,each=m)), X = X, beta = true_betas,
Z = list(cluster = Z), D = list(cluster = true_D), family = binomial())
## Not run:
library(lme4)
dat <- data.frame(y = simy$y, simy$X, simy$Z$cluster, simy$id)
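## Saturated GLMM: all fixed effects and all five random effects
fit_satlme4 <- glmer(y ~ X - 1 + (Z - 1 | cluster), data = dat,
family = "binomial")
## Starting values and adaptive lasso weights; soil dummies (columns 10-12) form one group
fit_sat <- build.start.fit(fit_satlme4, id = simy$id, gamma = 2,
cov.groups = c(1:9,10,10,10))
## Adaptive lasso fit using the weights and starting values from the saturated fit
new.fit <- rpql(y = simy$y, X = simy$X, Z = simy$Z, id = simy$id, lambda = 0.01,
pen.type = "adl", pen.weights = fit_sat$pen.weights,
cov.groups = fit_sat$cov.groups, start = fit_sat, family = binomial())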
## End(Not run)