pickle {aster}    R Documentation
Penalized Quasi-Likelihood for Aster Models
Description
Evaluates an approximation to minus the log likelihood for an aster model with random effects. Uses Laplace approximation to integrate out the random effects analytically. The “quasi” in the title is a misnomer in the context of aster models but the acronym PQL for this procedure is well-established in the generalized linear mixed models literature.
Usage
pickle(sigma, parm, fixed, random, obj, y, origin, cache, ...)
makezwz(sigma, parm, fixed, random, obj, y, origin)
pickle1(sigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle2(alphasigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle3(alphaceesigma, fixed, random, obj, y, origin, zwz, deriv = 0)
Arguments

sigma: vector of square roots of variance components, one component for each group of random effects. Negative values are allowed; the vector of variance components is sigma^2.

parm: starting value for inner optimization. Ignored if the environment cache contains a previously cached value, in which case that value is used.

alphasigma: the concatenation of the vector of fixed effects and the vector of square roots of variance components.

alphaceesigma: the concatenation of the vector of fixed effects, the vector of rescaled random effects, and the vector of square roots of variance components.

fixed: the model matrix for fixed effects. The number of rows is the length of the response vector.

random: the model matrix or matrices for random effects. The number of rows is the length of the response vector; each group of random effects has its own model matrix.

obj: aster model object, the result of a call to aster.

y: response vector. May be omitted, in which case the response stored in obj is used.

origin: origin of aster model. May be omitted, in which case the default origin (see aster) is used.

cache: If not missing, an environment in which to cache the value of parm found by the inner optimization, so that it can serve as the starting value in subsequent calls.

zwz: A possible value of Z^T \widehat{W} Z, where \widehat{W} is as described in the Details section below; typically produced by makezwz.

deriv: Number of derivatives wanted. For pickle1 and pickle2, must be 0 or 1; for pickle3, must be 0, 1, or 2.

...: additional arguments passed to the inner optimization.
Details
Define

p(\alpha, c, \sigma) = m(a + M \alpha + Z A c) + c^T c / 2 + \log \det[A Z^T \widehat{W} Z A + I] / 2

where m is minus the log likelihood function of a saturated aster model, a is a known vector (the offset vector in the terminology of glm but the origin in the terminology of aster), M is a known matrix, the model matrix for fixed effects (the argument fixed of these functions), Z is a known matrix, the model matrix for random effects (either the argument random of these functions if it is a matrix, or Reduce(cbind, random) if random is a list of matrices), A is a diagonal matrix whose diagonal is the vector rep(sigma, times = nrand), where nrand is sapply(random, ncol) when random is a list of matrices and ncol(random) when random is a matrix, \widehat{W} is any symmetric positive semidefinite matrix (more on this below), and I is the identity matrix. Note that A is a function of \sigma although the notation does not explicitly indicate this.
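As a toy sketch (base R only; the matrices here are invented for illustration and nothing below uses the aster package), A and the log determinant term of p can be formed like this:

```r
# Toy illustration of the Details formula: build the diagonal matrix A
# from sigma and a list of random-effects model matrices, then evaluate
# the log det term of p(alpha, c, sigma) with What = I for simplicity.
set.seed(42)
modmat.blk <- matrix(rnorm(5 * 2), nrow = 5)  # 2 random effects in group 1
modmat.pop <- matrix(rnorm(5 * 3), nrow = 5)  # 3 random effects in group 2
random <- list(modmat.blk, modmat.pop)
sigma <- c(0.5, 2)                    # one component per group

nrand <- sapply(random, ncol)         # c(2, 3)
Z <- Reduce(cbind, random)            # model matrix for all random effects
A <- diag(rep(sigma, times = nrand))  # diagonal, a function of sigma

What <- diag(nrow(Z))                 # stand-in for the W hat matrix
logdet.term <- determinant(A %*% crossprod(Z, What %*% Z) %*% A +
    diag(ncol(Z)), logarithm = TRUE)$modulus / 2
```

Since A Z^T \widehat{W} Z A is positive semidefinite, the matrix inside the determinant has eigenvalues at least 1, so the log determinant term is nonnegative.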
Let c^* denote the minimizer of p(\alpha, c, \sigma) considered as a function of c for fixed \alpha and \sigma, and let \tilde{\alpha} and \tilde{c} denote the (joint) minimizers of p(\alpha, c, \sigma) considered as a function of \alpha and c for fixed \sigma. Note that c^* is a function of \alpha and \sigma, and that \tilde{\alpha} and \tilde{c} are functions of \sigma (only), although the notation does not explicitly indicate this. Now define

q(\alpha, \sigma) = p(\alpha, c^*, \sigma)

and

r(\sigma) = p(\tilde{\alpha}, \tilde{c}, \sigma)
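The profiling that takes p to q (minimizing over c for fixed \alpha and \sigma) can be illustrated with a toy quadratic objective, where the inner minimizer has a closed form that the numerical profile can be checked against. This sketch uses only base R; p.toy merely stands in for p.

```r
# Toy profiling example: q(alpha) = min over cee of p.toy(alpha, cee).
# For this quadratic, d/dcee = 3 cee - 4 alpha, so the inner minimizer
# is cee = 4 alpha / 3, which lets us verify the numerical profile.
p.toy <- function(alpha, cee) (alpha - 1)^2 + (cee - 2 * alpha)^2 + cee^2 / 2

q.toy <- function(alpha) {
  inner <- optimize(function(cee) p.toy(alpha, cee), interval = c(-10, 10))
  inner$objective
}
q.toy(0.5)  # approximately p.toy(0.5, 4 * 0.5 / 3)
```

In pickle2 the same inner minimization over the random effects is done numerically, which is why a starting value (parm) and possibly a cache are supplied.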
Then pickle1 evaluates r(\sigma), pickle2 evaluates q(\alpha, \sigma), and pickle3 evaluates p(\alpha, c, \sigma), where Z^T \widehat{W} Z in the formulas above is specified by the argument zwz of these functions. All of these functions supply derivative (gradient) vectors if deriv = 1 is specified, and pickle3 supplies the second derivative (Hessian) matrix if deriv = 2 is specified.
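A gradient supplied with deriv = 1 can be sanity-checked against central finite differences. The checker below is a generic base-R sketch (not part of the aster package); in practice one would compare, say, pickle3(..., deriv = 1)$gradient against fd.grad applied to the deriv = 0 value, but here a toy function stands in.

```r
# Generic central-difference gradient check (base R, toy function).
fd.grad <- function(f, x, eps = 1e-6) {
  sapply(seq_along(x), function(i) {
    h <- replace(numeric(length(x)), i, eps)  # eps in coordinate i only
    (f(x + h) - f(x - h)) / (2 * eps)
  })
}
f.toy <- function(x) sum(x^2) / 2   # gradient of this function is x itself
fd.grad(f.toy, c(1, -3, 2))         # approximately c(1, -3, 2)
```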
Let W denote the second derivative function of m, that is, W(\varphi) is the second derivative matrix of the function m evaluated at the point \varphi. The idea is that \widehat{W} should be approximately the value of W(a + M \hat{\alpha} + Z \widehat{A} \hat{c}), where \hat{\alpha}, \hat{c}, and \hat{\sigma} are the (joint) minimizers of p(\alpha, c, \sigma) and \widehat{A} = A(\hat{\sigma}). In aid of this, the function makezwz evaluates Z^T W(a + M \alpha + Z A c) Z for any \alpha, c, and \sigma.
pickle evaluates the function

s(\sigma) = m(a + M \tilde{\alpha} + Z A \tilde{c}) + \tilde{c}^T \tilde{c} / 2 + \log \det[A Z^T W(a + M \tilde{\alpha} + Z A \tilde{c}) Z A + I] / 2

No derivatives can be computed, because no derivatives of the function W are computed for aster models.
The general idea is that one uses pickle with a no-derivative optimizer, such as the "Nelder-Mead" method of the optim function, to get a crude estimate of \hat{\sigma}. Then one uses trust with objective function penmlogl to estimate the corresponding \hat{\alpha} and \hat{c} (example below). Then one uses makezwz to produce the corresponding zwz (example below). These estimates can be improved using trust with objective function pickle3 using this zwz (example below), and this step may be iterated until convergence. Finally, optim is used with objective function pickle2 to estimate the Hessian matrix of q(\alpha, \sigma), which is approximate observed information because q(\alpha, \sigma) is approximate minus log likelihood.
Value

For pickle, a scalar, the PQL approximation of minus the log likelihood.

For pickle1 and pickle2, a list having components value and gradient (the latter present only when deriv = 1).

For pickle3, a list having components value, gradient (present only when deriv >= 1), and hessian (present only when deriv = 2).
Note

Not intended for use by naive users. Use reaster, which calls these functions.
Examples
data(radish)
library(trust)
pred <- c(0,1,2)
fam <- c(1,3,2)
### need object of type aster to supply to penmlogl and pickle
aout <- aster(resp ~ varb + fit : (Site * Region + Block + Pop),
pred, fam, varb, id, root, data = radish)
### model matrices for fixed and random effects
modmat.fix <- model.matrix(resp ~ varb + fit : (Site * Region),
data = radish)
modmat.blk <- model.matrix(resp ~ 0 + fit:Block, data = radish)
modmat.pop <- model.matrix(resp ~ 0 + fit:Pop, data = radish)
rownames(modmat.fix) <- NULL
rownames(modmat.blk) <- NULL
rownames(modmat.pop) <- NULL
idrop <- match(aout$dropped, colnames(modmat.fix))
idrop <- idrop[! is.na(idrop)]
modmat.fix <- modmat.fix[ , - idrop]
nfix <- ncol(modmat.fix)
nblk <- ncol(modmat.blk)
npop <- ncol(modmat.pop)
### try penmlogl
sigma.start <- c(1, 1)
alpha.start <- aout$coefficients[match(colnames(modmat.fix),
names(aout$coefficients))]
parm.start <- c(alpha.start, rep(0, nblk + npop))
tout <- trust(objfun = penmlogl, parm.start, rinit = 1, rmax = 10,
sigma = sigma.start, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout)
tout$converged
### crude estimate of variance components
eff.blk <- tout$argument[seq(nfix + 1, nfix + nblk)]
eff.pop <- tout$argument[seq(nfix + nblk + 1, nfix + nblk + npop)]
sigma.crude <- sqrt(c(var(eff.blk), var(eff.pop)))
### try optim and pickle
cache <- new.env(parent = emptyenv())
oout <- optim(sigma.crude, pickle, parm = tout$argument,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, cache = cache)
oout$convergence == 0
### estimated variance components
oout$par^2
### get estimates of fixed and random effects
tout <- trust(objfun = penmlogl, tout$argument, rinit = 1, rmax = 10,
sigma = oout$par, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, fterm = 0)
tout$converged
sigma.better <- oout$par
alpha.better <- tout$argument[1:nfix]
c.better <- tout$argument[- (1:nfix)]
zwz.better <- makezwz(sigma.better, parm = c(alpha.better, c.better),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)
### get better estimates
objfun <- function(alphaceesigma, zwz)
pickle3(alphaceesigma, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, zwz = zwz, deriv = 2)
tout <- trust(objfun, c(alpha.better, c.better, sigma.better),
rinit = 1, rmax = 10, zwz = zwz.better)
tout$converged
alpha.mle <- tout$argument[1:nfix]
c.mle <- tout$argument[nfix + 1:(nblk + npop)]
sigma.mle <- tout$argument[nfix + nblk + npop + 1:2]
zwz.mle <- makezwz(sigma.mle, parm = c(alpha.mle, c.mle),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)
### estimated variance components
sigma.mle^2
### preceding step can be iterated "until convergence"
### get (approximate) Fisher information
objfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle)$value
gradfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle, deriv = 1)$gradient
oout <- optim(c(alpha.mle, sigma.mle), objfun, gradfun, method = "BFGS",
hessian = TRUE)
oout$convergence == 0
fish <- oout$hessian
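Since fish approximates observed Fisher information, its inverse gives approximate variances of the estimates; a minimal follow-up sketch, using a toy positive definite matrix in place of the fish computed above (which requires the aster package to reproduce):

```r
# Approximate standard errors from inverse Fisher information.
# fish.toy is an invented stand-in for the fish computed in the example.
fish.toy <- matrix(c(4, 1,
                     1, 2), nrow = 2, byrow = TRUE)
se <- sqrt(diag(solve(fish.toy)))  # sqrt of diagonal of the inverse
```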