R: Constructs a start fit for use in the 'rpql' function

build.start.fit {rpql}

R Documentation

Constructs a start fit for use in the `rpql` function

Description

Takes a GLMM fitted using the lme4 package, i.e., using either the lmer or glmer functions, and constructs a list containing starting values for use in the start argument of the main fitting function rpql. It also constructs adaptive lasso weights, which can subsequently be used in the pen.weights argument of the rpql function if the adaptive lasso penalty is used for variable selection.

Usage

build.start.fit(lme4.fit, id = NULL, gamma = 0, cov.groups = NULL)

Arguments

`lme4.fit`	An object of class "lmerMod" or "glmerMod", obtained when fitting a (G)LMM using the `lmer` or `glmer` functions in the `lme4` package.
`id`	An optional list with each element being a vector of IDs that reference the model matrix in the corresponding element in the list `Z`. Each vector of IDs must be integers (but not factors). Note that this argument is optional, as it is only used for non-compulsory formatting purposes in the function.
`gamma`	A vector of power parameters, `\gamma`, for use in constructing adaptive lasso weights. Can be a vector of one or two elements. If two elements, then the first and second elements are the power parameter for the fixed and random effect weights respectively. If one element, the same power parameter is used for both fixed and random effect weights. Defaults to 0, in which case the weights are all equal to 1 i.e., it reverts to the unweighted lasso penalty.
`cov.groups`	A vector specifying if fixed effect coefficients (including the intercept) should be regarded and therefore penalized in groups. For example, if one or more of the fixed effect covariates are factors, then `lme4` will automatically create dummy variables in the model matrix and estimate coefficients for each level, using one level as the reference. `cov.groups` is then used to identify all the coefficients that correspond to that factor, such that all of these coefficients are penalized collectively as a group. Defaults to NULL, in which case it is assumed all coefficients should be treated independently. See the Details and Examples sections for more information.

Details

This function is mainly used when: 1) you want to produce good starting values for the main fitting function rpql, and so you fit a saturated (full) GLMM using lme4 and use the estimates from there as starting values, and/or 2) you want to obtain adaptive lasso weights of the form weight_k = |\tilde{parameter}_k|^{-\gamma}, where \gamma > 0 is the power parameter and \tilde{parameter}_k is the parameter estimate from the saturated model fit. For regularized PQL specifically, this function will construct adaptive lasso weights from the lme4 fit as follows: Let w^F and w^R denote fixed and random effect adaptive weights respectively. Then we have,

w^F_k = |\tilde{\beta}_k|^{-\gamma_1}

w^R_l = |\tilde{\Sigma}_{ll}|^{-\gamma_2},

where \tilde{\beta}_k is the estimated coefficient for the k^{th} fixed effect, \tilde{\Sigma}_{ll} is the l^{th} diagonal element from the estimated random effects covariance matrix, and \gamma = (\gamma_1, \gamma_2) is a vector of two power parameters; see Zou (2006) for the adaptive lasso, and Hui et al. (2016) for regularized PQL selection in GLMMs using adaptive lasso type penalties.
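The two weight formulas above can be sketched in base R as follows. This is only an illustration of the arithmetic, not the package's internal code: the estimates are made-up values standing in for a saturated lme4 fit, and the helper name `adl_weights` is hypothetical.

```r
## Hypothetical estimates standing in for a saturated lme4 fit (assumed values)
beta_tilde  <- c(-0.2, 1.1, -0.9, 0.05)   # fixed effect estimates
Sigma_tilde <- diag(c(4, 1, 0.01))        # estimated random effects covariance matrix

## Illustrative helper; it is not part of the rpql package
adl_weights <- function(beta, Sigma, gamma = 0) {
  gamma <- rep(gamma, length.out = 2)     # a single value is reused for both weight types
  list(fixed  = abs(beta)^(-gamma[1]),          # w^F_k = |beta_k|^(-gamma_1)
       random = abs(diag(Sigma))^(-gamma[2]))   # w^R_l = |Sigma_ll|^(-gamma_2)
}

w  <- adl_weights(beta_tilde, Sigma_tilde, gamma = c(2, 1))
w0 <- adl_weights(beta_tilde, Sigma_tilde, gamma = 0)   # every weight equals 1
```

Small estimates receive large weights and are therefore penalized more heavily, while `gamma = 0` recovers the unweighted lasso penalty.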

If cov.groups is supplied, this means that some of the fixed effects coefficients should be treated and penalized collectively as a group. The most common cases where this is used are when you have factor or categorical variables with more than two levels, or when you have polynomial terms that should be dealt with together. For instance, suppose you have a model matrix consisting of six columns, where the first three columns correspond to separate covariates (including the intercept) and the last three columns all correspond to dummy variables created for a factor variable with four levels, e.g., soil moisture with levels dry, moderately moist, very moist, wet. The coefficients from the last three columns should then be penalized together, and so we can set cov.groups = c(1,2,3,4,4,4).
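For the six-column example, one possible way to derive the grouping vector rather than typing it by hand is to read the "assign" attribute that base R's `model.matrix` attaches to its result. The data frame and variable names below are hypothetical, and this is a sketch of one convenient approach, not something the package requires.

```r
## Hypothetical data: two numeric covariates plus a four-level factor
soil_levels <- c("dry", "moderately moist", "very moist", "wet")
dat <- data.frame(
  x1   = seq(-1, 1, length.out = 20),
  x2   = rep(c(0.5, -0.5), times = 10),
  soil = factor(rep(soil_levels, length.out = 20), levels = soil_levels)
)

mm <- model.matrix(~ x1 + x2 + soil, data = dat)   # six columns, intercept first

## "assign" gives the term index of each column (0 for the intercept),
## so adding 1 yields one group label per column
cov_groups <- attr(mm, "assign") + 1
cov_groups
```

All three soil dummy columns share the label 4, matching cov.groups = c(1,2,3,4,4,4).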

In doing so, the adaptive lasso weights for the grouped coefficients are then constructed differently. Following on from the example above, we have the fixed effect weight for soil moisture defined as

w^F = \|\tilde{\beta}\|^{-\gamma_1},

where \| \cdot \| corresponds to the L2-norm and \tilde{\beta} are the fixed effect coefficients belonging to the group (three in this case). When entered into the rpql function, an adaptive group lasso (Wang and Leng, 2008) is applied to this set of coefficients, such that they are all encouraged to be shrunk to zero at the same time.
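The grouped weight can be illustrated with a few lines of base R. The coefficient values are made up, and returning one weight per group is an assumption of this sketch; how the package stores grouped weights internally is not described here.

```r
## Hypothetical fixed effect estimates for the six-column example
beta_tilde <- c(0.3, 1.2, -0.8, 0.5, -0.1, 0.4)
cov_groups <- c(1, 2, 3, 4, 4, 4)
gamma1     <- 2

## L2-norm of the coefficients within each group
group_norm <- tapply(beta_tilde, cov_groups, function(b) sqrt(sum(b^2)))

## One weight per group: w^F = ||beta||^(-gamma_1)
w_group <- as.numeric(group_norm)^(-gamma1)
w_group
```

For a group containing a single coefficient the L2-norm is just the absolute value, so the grouped formula reduces to the ungrouped weight.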

Of course, once constructed, the adaptive lasso weights can be manually altered before being passed to the main rpql function, e.g., if one wants certain fixed and/or random effects to not be penalized.
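A minimal sketch of such a manual change is given below. The list is a hypothetical stand-in for the fixed effect part of the weights, and it assumes that a weight of zero leaves the corresponding coefficient unpenalized; check the rpql help page before relying on this.

```r
## Hypothetical stand-in for the fixed effect weights returned by build.start.fit
pen_weights <- list(fixed = c(25, 0.83, 1.23, 400))

## Assumption: a weight of zero removes the penalty on that coefficient
pen_weights$fixed[1] <- 0   # e.g., keep the intercept out of the penalty
pen_weights$fixed
```

The same kind of assignment could be applied to the random effect weights.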

Value

A list containing the following elements

`fixef`	Fixed effect coefficient estimates from `lme4.fit`.
`ranef`	A list of random effect predicted coefficients from `lme4.fit`.
`ran.cov`	A list of random effects covariance matrices from `lme4.fit`.
`cov.groups`	The argument `cov.groups`. Defaults to `NULL`.
`pen.weights`	A list of adaptive lasso weights constructed from `lme4.fit`. Contains elements `pen.weights$fixed` and `pen.weights$random`, which are the weights for the fixed and random effects respectively. See Details above for their construction.

Warnings

In order to construct sensible starting values and weights, this function should really only be used when lme4.fit is a fit of the saturated GLMM, i.e. all fixed and random effects included.

Author(s)

Francis K.C. Hui <francis.hui@gmail.com>, with contributions from Samuel Mueller <samuel.mueller@sydney.edu.au> and A.H. Welsh <Alan.Welsh@anu.edu.au>

Maintainer: Francis Hui <fhui28@gmail.com>

References

Hui, F.K.C., Mueller, S., and Welsh, A.H. (2016). Joint Selection in Mixed Models using Regularized PQL. Journal of the American Statistical Association: accepted for publication.
Wang, H., and Leng, C. (2008). A note on adaptive group lasso. Computational Statistics & Data Analysis, 52, 5277-5286.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418-1429.

Examples


##-------------------------
## Example 1: Bernoulli GLMM with grouped covariates. 
## Independent cluster model with 50 clusters and equal cluster sizes of 10
## Nine covariates where the last covariate (soil type) is a factor with four levels
##-------------------------
n <- 50; p <- 8; m <- 10
set.seed(123)
X <- data.frame(matrix(rnorm(n*m*p),n*m,p), soil=sample(1:4,size=m*n,replace=TRUE))
X$soil <- factor(X$soil)
X <- model.matrix(~ ., data = X)
colnames(X) <- paste("X",1:ncol(X),sep="")

Z <- X[,1:5] ## Random effects model matrix taken as first five columns
true_betas <- c(-0.1,1,-1,1,-1,1,-1,0,0,0,0,0) 
true_D <- matrix(0,ncol(Z),ncol(Z))
true_D[1:3,1:3] <- matrix(c(9,4.8,0.6,4.8,4,1,0.6,1,1),
	3,3,byrow=TRUE) ## 3 important random effects 

simy <- gendat.glmm(id = list(cluster = rep(1:n,each=m)), X = X, beta = true_betas, 
	Z = list(cluster = Z), D = list(cluster = true_D), family = binomial())

  
## Not run: 
library(lme4)
dat <- data.frame(y = simy$y, simy$X, simy$Z$cluster, simy$id)
fit_satlme4 <- glmer(y ~ X - 1 + (Z - 1 | cluster), data = dat, 
	family = "binomial")
fit_sat <- build.start.fit(fit_satlme4, id = simy$id, gamma = 2, 
	cov.groups = c(1:9,10,10,10)) 

new.fit <- rpql(y = simy$y, X = simy$X, Z = simy$Z, id = simy$id, lambda = 0.01, 
	pen.type = "adl", pen.weights = fit_sat$pen.weights,
	cov.groups = fit_sat$cov.groups, start = fit_sat, family = binomial())  
	
## End(Not run)

[Package rpql version 0.8.1 Index]
