pickle {aster}    R Documentation
Penalized Quasi-Likelihood for Aster Models
Description
Evaluates an approximation to minus the log likelihood for an aster model with random effects. Uses Laplace approximation to integrate out the random effects analytically. The “quasi” in the title is a misnomer in the context of aster models but the acronym PQL for this procedure is well-established in the generalized linear mixed models literature.
Usage
pickle(sigma, parm, fixed, random, obj, y, origin, cache, ...)
makezwz(sigma, parm, fixed, random, obj, y, origin)
pickle1(sigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle2(alphasigma, parm, fixed, random, obj, y, origin, cache, zwz,
deriv = 0, ...)
pickle3(alphaceesigma, fixed, random, obj, y, origin, zwz, deriv = 0)
Arguments

sigma: vector of square roots of variance components, one component for each group of random effects. Negative values are allowed; the vector of variance components is sigma^2.

parm: starting value for inner optimization. Ignored if the environment cache contains a previously cached value, in which case that value is used.

alphasigma: the concatenation of the vector of fixed effects and the vector of square roots of variance components.

alphaceesigma: the concatenation of the vector of fixed effects, the vector of rescaled random effects, and the vector of square roots of variance components.

fixed: the model matrix for fixed effects. The number of rows is the length of the response vector.

random: the model matrix or matrices for random effects. The number of rows is the length of the response vector; each group of random effects has its own model matrix.

obj: aster model object, the result of a call to aster.

y: response vector. May be omitted, in which case the response stored in obj is used.

origin: origin of aster model. May be omitted, in which case the default origin (see aster) is used.

cache: If not missing, an environment in which to cache the value of parm found by the inner optimization, so that it can serve as the starting value in subsequent calls.

zwz: A possible value of Z^T \widehat{W} Z, where \widehat{W} is as described in the Details section below; typically produced by makezwz.

deriv: Number of derivatives wanted. For pickle1 and pickle2, must be 0 or 1; for pickle3, must be 0, 1, or 2.

...: additional arguments passed to the inner optimization.
Details
Define

p(\alpha, c, \sigma) = m(a + M \alpha + Z A c) + c^T c / 2 + \log \det[A Z^T \widehat{W} Z A + I] / 2

where m is minus the log likelihood function of a saturated aster model, a is a known vector (the offset vector in the terminology of glm but the origin in the terminology of aster), M is a known matrix, the model matrix for fixed effects (the argument fixed of these functions), Z is a known matrix, the model matrix for random effects (either the argument random of these functions if it is a matrix, or Reduce(cbind, random) if random is a list of matrices), A is a diagonal matrix whose diagonal is the vector rep(sigma, times = nrand), where nrand is sapply(random, ncol) when random is a list of matrices and ncol(random) when random is a matrix, \widehat{W} is any symmetric positive semidefinite matrix (more on this below), and I is the identity matrix. Note that A is a function of \sigma although the notation does not explicitly indicate this.
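As a toy sketch (base R only; the matrices here are invented for illustration and nothing below uses the aster package), A and the log determinant term of p can be formed like this:

```r
# Toy illustration of the Details formula: build the diagonal matrix A
# from sigma and a list of random-effects model matrices, then evaluate
# the log det term of p(alpha, c, sigma) with What = I for simplicity.
set.seed(42)
modmat.blk <- matrix(rnorm(5 * 2), nrow = 5)  # 2 random effects in group 1
modmat.pop <- matrix(rnorm(5 * 3), nrow = 5)  # 3 random effects in group 2
random <- list(modmat.blk, modmat.pop)
sigma <- c(0.5, 2)                    # one component per group

nrand <- sapply(random, ncol)         # c(2, 3)
Z <- Reduce(cbind, random)            # model matrix for all random effects
A <- diag(rep(sigma, times = nrand))  # diagonal, a function of sigma

What <- diag(nrow(Z))                 # stand-in for the W hat matrix
logdet.term <- determinant(A %*% crossprod(Z, What %*% Z) %*% A +
    diag(ncol(Z)), logarithm = TRUE)$modulus / 2
```

Since A Z^T \widehat{W} Z A is positive semidefinite, the matrix inside the determinant has eigenvalues at least 1, so the log determinant term is nonnegative.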
Let c^* denote the minimizer of p(\alpha, c, \sigma) considered as a function of c for fixed \alpha and \sigma, and let \tilde{\alpha} and \tilde{c} denote the (joint) minimizers of p(\alpha, c, \sigma) considered as a function of \alpha and c for fixed \sigma. Note that c^* is a function of \alpha and \sigma, and that \tilde{\alpha} and \tilde{c} are functions of \sigma (only), although the notation does not explicitly indicate this. Now define

q(\alpha, \sigma) = p(\alpha, c^*, \sigma)

and

r(\sigma) = p(\tilde{\alpha}, \tilde{c}, \sigma)
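The profiling that takes p to q (minimizing over c for fixed \alpha and \sigma) can be illustrated with a toy quadratic objective, where the inner minimizer has a closed form that the numerical profile can be checked against. This sketch uses only base R; p.toy merely stands in for p.

```r
# Toy profiling example: q(alpha) = min over cee of p.toy(alpha, cee).
# For this quadratic, d/dcee = 3 cee - 4 alpha, so the inner minimizer
# is cee = 4 alpha / 3, which lets us verify the numerical profile.
p.toy <- function(alpha, cee) (alpha - 1)^2 + (cee - 2 * alpha)^2 + cee^2 / 2

q.toy <- function(alpha) {
  inner <- optimize(function(cee) p.toy(alpha, cee), interval = c(-10, 10))
  inner$objective
}
q.toy(0.5)  # approximately p.toy(0.5, 4 * 0.5 / 3)
```

In pickle2 the same inner minimization over the random effects is done numerically, which is why a starting value (parm) and possibly a cache are supplied.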
Then pickle1 evaluates r(\sigma), pickle2 evaluates q(\alpha, \sigma), and pickle3 evaluates p(\alpha, c, \sigma), where Z^T \widehat{W} Z in the formulas above is specified by the argument zwz of these functions. All of these functions supply derivative (gradient) vectors if deriv = 1 is specified, and pickle3 supplies the second derivative (Hessian) matrix if deriv = 2 is specified.
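A gradient supplied with deriv = 1 can be sanity-checked against central finite differences. The checker below is a generic base-R sketch (not part of the aster package); in practice one would compare, say, pickle3(..., deriv = 1)$gradient against fd.grad applied to the deriv = 0 value, but here a toy function stands in.

```r
# Generic central-difference gradient check (base R, toy function).
fd.grad <- function(f, x, eps = 1e-6) {
  sapply(seq_along(x), function(i) {
    h <- replace(numeric(length(x)), i, eps)  # eps in coordinate i only
    (f(x + h) - f(x - h)) / (2 * eps)
  })
}
f.toy <- function(x) sum(x^2) / 2   # gradient of this function is x itself
fd.grad(f.toy, c(1, -3, 2))         # approximately c(1, -3, 2)
```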
Let W denote the second derivative function of m, that is, W(\varphi) is the second derivative matrix of the function m evaluated at the point \varphi. The idea is that \widehat{W} should be approximately the value of W(a + M \hat{\alpha} + Z \widehat{A} \hat{c}), where \hat{\alpha}, \hat{c}, and \hat{\sigma} are the (joint) minimizers of p(\alpha, c, \sigma) and \widehat{A} = A(\hat{\sigma}). In aid of this, the function makezwz evaluates Z^T W(a + M \alpha + Z A c) Z for any \alpha, c, and \sigma.
pickle evaluates the function

s(\sigma) = m(a + M \tilde{\alpha} + Z A \tilde{c}) + \tilde{c}^T \tilde{c} / 2 + \log \det[A Z^T W(a + M \tilde{\alpha} + Z A \tilde{c}) Z A + I] / 2

No derivatives can be computed, because no derivatives of the function W are computed for aster models.
The general idea is that one uses pickle with a no-derivative optimizer, such as the "Nelder-Mead" method of the optim function, to get a crude estimate of \hat{\sigma}. Then one uses trust with objective function penmlogl to estimate the corresponding \hat{\alpha} and \hat{c} (example below). Then one uses makezwz to produce the corresponding zwz (example below). These estimates can be improved using trust with objective function pickle3 using this zwz (example below), and this step may be iterated until convergence. Finally, optim is used with objective function pickle2 to estimate the Hessian matrix of q(\alpha, \sigma), which is approximate observed information because q(\alpha, \sigma) is approximate minus log likelihood.
Value

For pickle, a scalar, the PQL approximation of minus the log likelihood.

For pickle1 and pickle2, a list having components value and gradient (the latter present only when deriv = 1).

For pickle3, a list having components value, gradient (present only when deriv >= 1), and hessian (present only when deriv = 2).
Note

Not intended for use by naive users. Use reaster, which calls these functions.
Examples
data(radish)
library(trust)
pred <- c(0,1,2)
fam <- c(1,3,2)
### need object of type aster to supply to penmlogl and pickle
aout <- aster(resp ~ varb + fit : (Site * Region + Block + Pop),
pred, fam, varb, id, root, data = radish)
### model matrices for fixed and random effects
modmat.fix <- model.matrix(resp ~ varb + fit : (Site * Region),
data = radish)
modmat.blk <- model.matrix(resp ~ 0 + fit:Block, data = radish)
modmat.pop <- model.matrix(resp ~ 0 + fit:Pop, data = radish)
rownames(modmat.fix) <- NULL
rownames(modmat.blk) <- NULL
rownames(modmat.pop) <- NULL
idrop <- match(aout$dropped, colnames(modmat.fix))
idrop <- idrop[! is.na(idrop)]
modmat.fix <- modmat.fix[ , - idrop]
nfix <- ncol(modmat.fix)
nblk <- ncol(modmat.blk)
npop <- ncol(modmat.pop)
### try penmlogl
sigma.start <- c(1, 1)
alpha.start <- aout$coefficients[match(colnames(modmat.fix),
names(aout$coefficients))]
parm.start <- c(alpha.start, rep(0, nblk + npop))
tout <- trust(objfun = penmlogl, parm.start, rinit = 1, rmax = 10,
sigma = sigma.start, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout)
tout$converged
### crude estimate of variance components
eff.blk <- tout$argument[seq(nfix + 1, nfix + nblk)]
eff.pop <- tout$argument[seq(nfix + nblk + 1, nfix + nblk + npop)]
sigma.crude <- sqrt(c(var(eff.blk), var(eff.pop)))
### try optim and pickle
cache <- new.env(parent = emptyenv())
oout <- optim(sigma.crude, pickle, parm = tout$argument,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, cache = cache)
oout$convergence == 0
### estimated variance components
oout$par^2
### get estimates of fixed and random effects
tout <- trust(objfun = penmlogl, tout$argument, rinit = 1, rmax = 10,
sigma = oout$par, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, fterm = 0)
tout$converged
sigma.better <- oout$par
alpha.better <- tout$argument[1:nfix]
c.better <- tout$argument[- (1:nfix)]
zwz.better <- makezwz(sigma.better, parm = c(alpha.better, c.better),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)
### get better estimates
objfun <- function(alphaceesigma, zwz)
pickle3(alphaceesigma, fixed = modmat.fix,
random = list(modmat.blk, modmat.pop), obj = aout, zwz = zwz, deriv = 2)
tout <- trust(objfun, c(alpha.better, c.better, sigma.better),
rinit = 1, rmax = 10, zwz = zwz.better)
tout$converged
alpha.mle <- tout$argument[1:nfix]
c.mle <- tout$argument[nfix + 1:(nblk + npop)]
sigma.mle <- tout$argument[nfix + nblk + npop + 1:2]
zwz.mle <- makezwz(sigma.mle, parm = c(alpha.mle, c.mle),
fixed = modmat.fix, random = list(modmat.blk, modmat.pop), obj = aout)
### estimated variance components
sigma.mle^2
### preceding step can be iterated "until convergence"
### get (approximate) Fisher information
objfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle)$value
gradfun <- function(alphasigma) pickle2(alphasigma, parm = c.mle,
fixed = modmat.fix, random = list(modmat.blk, modmat.pop),
obj = aout, zwz = zwz.mle, deriv = 1)$gradient
oout <- optim(c(alpha.mle, sigma.mle), objfun, gradfun, method = "BFGS",
hessian = TRUE)
oout$convergence == 0
fish <- oout$hessian
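Since fish approximates observed Fisher information, its inverse gives approximate variances of the estimates; a minimal follow-up sketch, using a toy positive definite matrix in place of the fish computed above (which requires the aster package to reproduce):

```r
# Approximate standard errors from inverse Fisher information.
# fish.toy is an invented stand-in for the fish computed in the example.
fish.toy <- matrix(c(4, 1,
                     1, 2), nrow = 2, byrow = TRUE)
se <- sqrt(diag(solve(fish.toy)))  # sqrt of diagonal of the inverse
```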