pickle {aster} R Documentation

## Penalized Quasi-Likelihood for Aster Models

### Description

Evaluates an approximation to minus the log likelihood for an aster model with random effects. Uses Laplace approximation to integrate out the random effects analytically. The “quasi” in the title is a misnomer in the context of aster models but the acronym PQL for this procedure is well-established in the generalized linear mixed models literature.

### Usage

```pickle(sigma, parm, fixed, random, obj, y, origin, cache, ...)
makezwz(sigma, parm, fixed, random, obj, y, origin)
pickle1(sigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle2(alphasigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle3(alphaceesigma, fixed, random, obj, y, origin, zwz, deriv = 0)
```

### Arguments

 `sigma` vector of square roots of variance components, one component for each group of random effects. Negative values are allowed; the vector of variance components is `sigma^2`. `parm` starting value for inner optimization. Ignored if `cache\$parm` exists, in which case the latter is used. For `pickle` and `pickle1`, length is number of effects (fixed and random). For `pickle2`, length is number of random effects. For all, random effects are rescaled, divided by the corresponding component of `sigma` if that is nonzero and equal to zero otherwise. `alphasigma` the concatenation of the vector of fixed effects and the vector of square roots of variance components. `alphaceesigma` the concatenation of the vector of fixed effects, the vector of rescaled random effects, and the vector of square roots of variance components. `fixed` the model matrix for fixed effects. The number of rows is `nrow(obj\$data)`. The number of columns is the number of fixed effects. `random` the model matrix or matrices for random effects. The number of rows is `nrow(obj\$data)`. The number of columns is the number of random effects in a group. Either a matrix or a list each element of which is a matrix. `obj` aster model object, the result of a call to `aster`. `y` response vector. May be omitted, in which case `obj\$x` is used. If supplied, must be a matrix of the same dimensions as `obj\$x`. `origin` origin of aster model. May be omitted, in which case default origin (see `aster`) is used. If supplied, must be a matrix of the same dimensions `obj\$x`. `cache` If not missing, an environment in which to cache the value of `parm` found during previous evaluations. If supplied `parm` is taken from `cache`. `zwz` A possible value of t(Z) W Z, where Z is the model matrix for all random effects and W is the variance matrix of the response. `deriv` Number of derivatives wanted. For `pickle1` or `pickle2`, either zero or one. For `pickle3`, zero, one or two. `...` additional arguments passed to `trust`, which is used to maximize the penalized log likelihood.

### Details

Define

p(alpha, c, sigma) = m(a + M alpha + Z A c) + t(c) c / 2 + log det[A t(Z) What Z A + I] / 2

where m is minus the log likelihood function of a saturated aster model, a is a known vector (the offset vector in the terminology of `glm` but the origin in the terminology of `aster`), M is a known matrix, the model matrix for fixed effects (the argument `fixed` of these functions), Z is a known matrix, the model matrix for random effects (either the argument `random` of these functions if it is a matrix or `Reduce(cbind, random)` if `random` is a list of matrices), A is a diagonal matrix whose diagonal is the vector `rep(sigma, times = nrand)` where `nrand` is `sapply(random, ncol)` when `random` is a list of matrices and `ncol(random)` when `random` is a matrix, What is any symmetric positive semidefinite matrix (more on this below), and I is the identity matrix. Note that A is a function of sigma although the notation does not explicitly indicate this.

Let cstar denote the minimizer of p(α, c, σ) considered as a function of c for fixed alpha and sigma, and let alphatwiddle and ctwiddle denote the (joint) minimizers of p(α, c, σ) considered as a function of alpha and c for fixed sigma. Note that cstar is a function of alpha and sigma although the notation does not explicitly indicate this. Note that alphatwiddle and ctwiddle are functions of sigma (only) although the notation does not explicitly indicate this. Now define

q(alpha, sigma) = p(alpha, cstar, sigma)

and

r(sigma) = p(alphatwiddle, cstar, sigma)

Then `pickle1` evaluates r(sigma), `pickle2` evaluates q(alpha, sigma), and `pickle3` evaluates p(alpha, c, sigma), where t(Z) What Z in the formulas above is specified by the argument `zwz` of these functions. All of these functions supply derivative (gradient) vectors if `deriv = 1` is specified, and `pickle3` supplies the second derivative (Hessian) matrix if `deriv = 2` is specified.

Let W denote the second derivative function of m, that is, W(phi) is the second derivative matrix of the function m evaluated at the point phi. The idea is that What should be approximately the value of W(a + M alphahat + Z Ahat chat), where alphahat, chat, and sigmahat are the (joint) minimizers of p(alpha, c, sigma) and Ahat = A(sigmahat). In aid of this, the function `makezwz` evaluates t(Z) W(a + M alpha + Z A c) Z for any alpha, c, and sigma.

`pickle` evaluates the function

s(sigma) = m(a + M alphatwiddle + Z A ctwiddle) + t(ctwiddle) ctwiddle / 2 + log det[A t(Z) W(a + M alphatwiddle + Z A ctwiddle) Z A + I]

no derivatives can be computed because no derivatives of the function W are computed for aster models.

The general idea is the one uses `pickle` with a no-derivative optimizer, such as the `"Nelder-Mead"` method of the `optim` function to get a crude estimate of sigmahat. Then one uses `trust` with objective function `penmlogl` to estimate the corresponding alphahat and chat (example below). Then one use `makezwz` to produce the corresponding `zwz` (example below). These estimates can be improved using `trust` with objective function `pickle3` using this `zwz` (example below), and this step may be iterated until convergence. Finally, `optim` is used with objective function `pickle2` to estimate the Hessian matrix of q(alpha, sigma), which is approximate observed information because q(alpha, sigma) is approximate minus log likelihood.

### Value

For `pickle`, a scalar, minus the (PQL approximation of) the log likelihood. For `pickle1` and `pickle2`, a list having components `value` and `gradient` (present only when `deriv = 1`). For `pickle3`, a list having components `value`, `gradient` (present only when `deriv >= 1`), and `hessian` (present only when `deriv = 2`).

### Note

Not intended for use by naive users. Use `reaster`, which calls them.

### Examples

```data(radish)
library(trust)

pred <- c(0,1,2)
fam <- c(1,3,2)

### need object of type aster to supply to penmlogl and pickle

aout <- aster(resp ~ varb + fit : (Site * Region + Block + Pop),
pred, fam, varb, id, root, data = radish)

### model matrices for fixed and random effects

modmat.fix <- model.matrix(resp ~ varb + fit : (Site * Region),
modmat.blk <- model.matrix(resp ~ 0 + fit:Block, data = radish)
modmat.pop <- model.matrix(resp ~ 0 + fit:Pop, data = radish)

rownames(modmat.fix) <- NULL
rownames(modmat.blk) <- NULL
rownames(modmat.pop) <- NULL

idrop <- match(aout\$dropped, colnames(modmat.fix))
idrop <- idrop[! is.na(idrop)]
modmat.fix <- modmat.fix[ , - idrop]

nfix <- ncol(modmat.fix)
nblk <- ncol(modmat.blk)
npop <- ncol(modmat.pop)

### try penmlogl

sigma.start <- c(1, 1)

alpha.start <- aout\$coefficients[match(colnames(modmat.fix),
names(aout\$coefficients))]
parm.start <- c(alpha.start, rep(0, nblk + npop))

tout <- trust(objfun = penmlogl, parm.start, rinit = 1, rmax = 10,
sigma = sigma.start, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout)
tout\$converged

### crude estimate of variance components

eff.blk <- tout\$argument[seq(nfix + 1, nfix + nblk)]
eff.pop <- tout\$argument[seq(nfix + nblk + 1, nfix + nblk + npop)]
sigma.crude <- sqrt(c(var(eff.blk), var(eff.pop)))

### try optim and pickle

cache <- new.env(parent = emptyenv())
oout <- optim(sigma.crude, pickle, parm = tout\$argument,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, cache = cache)
oout\$convergence == 0
### estimated variance components
oout\$par^2

### get estimates of fixed and random effects

tout <- trust(objfun = penmlogl, tout\$argument, rinit = 1, rmax = 10,
sigma = oout\$par, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, fterm = 0)
tout\$converged

sigma.better <- oout\$par
alpha.better <- tout\$argument[1:nfix]
c.better <- tout\$argument[- (1:nfix)]
zwz.better <- makezwz(sigma.better, parm = c(alpha.better, c.better),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)

### get better estimates

objfun <- function(alphaceesigma, zwz)
pickle3(alphaceesigma, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, zwz = zwz, deriv = 2)
tout <- trust(objfun, c(alpha.better, c.better, sigma.better),
rinit = 1, rmax = 10, zwz = zwz.better)
tout\$converged
alpha.mle <- tout\$argument[1:nfix]
c.mle <- tout\$argument[nfix + 1:(nblk + npop)]
sigma.mle <- tout\$argument[nfix + nblk + npop + 1:2]
zwz.mle <- makezwz(sigma.mle, parm = c(alpha.mle, c.mle),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)
### estimated variance components
sigma.mle^2

### preceding step can be iterated "until convergence"

### get (approximate) Fisher information

objfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle)\$value
gradfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle, deriv = 1)\$gradient
oout <- optim(c(alpha.mle, sigma.mle), objfun, gradfun, method = "BFGS",
hessian = TRUE)
oout\$convergence == 0
fish <- oout\$hessian
```

[Package aster version 1.1-2 Index]