R: Aster Models

aster {aster}

R Documentation

Aster Models

Description

Fits Aster Models.

Usage

aster(x, ...)

## Default S3 method:
aster(x, root, pred, fam, modmat, parm,
    type = c("unconditional", "conditional"), famlist = fam.default(),
    origin, origin.type = c("model.type", "unconditional", "conditional"),
    method = c("trust", "nlm", "CG", "L-BFGS-B"), fscale, maxiter = 1000,
    nowarn = TRUE, newton = TRUE, optout = FALSE, coef.names, ...)

## S3 method for class 'formula'
aster(formula, pred, fam, varvar, idvar, root,
    data, parm, type = c("unconditional", "conditional"),
    famlist = fam.default(),
    origin, origin.type = c("model.type", "unconditional", "conditional"),
    method = c("trust", "nlm", "CG", "L-BFGS-B"), fscale, maxiter = 1000,
    nowarn = TRUE, newton = TRUE, optout = FALSE, ...)

Arguments

`x`	an `nind` by `nnode` matrix, the data for an aster model. The rows are independent and identically modeled random vectors. See details below for further requirements. `aster.formula` constructs such an `x` from the response in its formula. Hence data for `aster.formula` must be a vector of length `nind * nnode`.
`root`	an object of the same shape as `x`, the root data. For `aster.default` an `nind` by `nnode` matrix, For `aster.formula` an `nind * nnode` vector.
`pred`	an integer vector of length `nnode` determining the dependence graph of the aster model. `pred[j]` is the index of the predecessor of the node with index `j` unless the predecessor is a root node, in which case `pred[j] == 0`. See details below for further requirements.
`fam`	an integer vector of length `nnode` determining the exponential family structure of the aster model. Each element is an index into the vector of family specifications given by the argument `famlist`.
`modmat`	an `nind` by `nnode` by `ncoef` three-dimensional array, the model matrix. `aster.formula` constructs such a `modmat` from its formula, the data frame `data`, and the variables in the environment of the formula.
`parm`	usually missing. Otherwise a vector of length `ncoef` giving a starting point for the optimization.
`type`	type of model. The value of this argument can be abbreviated.
`famlist`	a list of family specifications (see `families`).
`origin`	Distinguished point in parameter space. May be missing, in which case an unspecified default is provided. See details below for further explanation. This is what `lm`, `glm` and other functions that do regression call “offset” but we don't change our name for reasons of backward compatibility.
`origin.type`	Parameter space in which specified distinguished point is located. If `"conditional"` then argument `"origin"` is a conditional canonical parameter value. If `"unconditional"` then argument `"origin"` is an unconditional canonical parameter value. If `"model.type"` then the type is taken from argument `"type"`. The value of this argument can be abbreviated.
`method`	optimization method. If `"trust"` then the `trust` function is used. If `"nlm"` then the `nlm` function is used. Otherwise the `optim` function is used with the specified `method` supplied to it. The value of this argument can be abbreviated.
`fscale`	an estimate of the size of the log likelihood at the maximum. Defaults to `nind`.
`maxiter`	maximum number of iterations. Defaults to '1000'.
`nowarn`	if `TRUE` (the default), suppress warnings from the optimization routine.
`newton`	if `TRUE` (the default), do one Newton iteration on the result produced by the optimization routine, except when `method == "trust"` when no such Newton iteration is done, regardless of the value of `newton`, because `trust` always terminates with a Newton iteration when it converges.
`optout`	if `TRUE`, save the entire result of the optimization routine (`trust`, `nlm`, or `optim`, as the case may be).
`coef.names`	names of the regression coefficients. If missing, `dimnames(modmat)[[3]]` is used. In `aster.formula` these are produced automatically by the R formula machinery.
`...`	other arguments passed to the optimization method.
`formula`	a symbolic description of the model to be fit. See `lm`, `glm`, and `formula` for discussions of the R formula mini-language.
`varvar`	a variable of the same length as the response in the formula that is a factor whose levels are character strings treated as variable names. The number of variable names is `nnode`. Must be of the form `rep(vars, each = nind)` where `vars` is a vector of variable names. Usually found in the data frame `data` when this is produced by the `reshape` function.
`idvar`	a variable of the same length as the response in the formula that indexes individuals. The number of individuals is `nind`. Must be of the form `rep(inds, times = nnode)` where `inds` is a vector of labels for individuals. Usually found in the data frame `data` when this is produced by the `reshape` function.
`data`	an optional data frame containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `aster` is called. Usually produced by the `reshape` function.

Details

The vector pred must satisfy all(pred < seq(along = pred)), that is, each predecessor must precede in the order given in pred. The vector pred defines a function p.

The joint distribution of the data matrix x is a product of conditionals

\prod_{i \in I} \prod_{j \in J} \Pr \{ X_{i j} | X_{i p(j)} \}

When p(j) = 0, the notation X_{i p(j)} means root[i, j]. Other elements of the matrix root are not used.

The conditional distribution \Pr \{ X_{i j} | X_{i p(j)} \} is the X_{i p(j)}-fold convolution of the j-th family in the vector fam, a one-parameter exponential family (i.e., the sum of X_{i p(j)} i.i.d. terms having this one-parameter exponential family distribution).

For type == "conditional" the canonical parameter vector \theta_{i j} is modeled in GLM fashion as \theta = a + M \beta where M is the model matrix modmat and a is the distinguished point origin. Since the “vector” \theta is actually a matrix, the “matrix” M must correspondingly be a three-dimensional array. So \theta = a + M \beta written out in full is

\theta_{i j} = a_{i j} + \sum_{k \in K} m_{i j k} \beta_k

This specifies the log likelihood.

For type == "unconditional" the canonical parameter vector for an unconditional model is modeled in GLM fashion as \varphi = a + M \beta (where the notation is as above). The unconditional canonical parameters are then specified in terms of the conditional ones by

\varphi_{i j} = \theta_{i j} - \sum_{k \in S(j)} \psi_k(\theta_{i k})

where S(j) denotes the set of successors of j, the k such that p(k) = j, and \psi_k is the cumulant function for the k-th exponential family. This rather crazy looking formulation is an invertible change of parameter and makes \varphi the canonical parameter and x the canonical statistic of a full flat unconditional exponential family. Again, this specifies the log likelihood.

In versions of aster prior to version 0.6 there was no a in the model specification, which is the same as specifying a = 0 in the current specification. If a is in the column space of the model matrix, that is, if there exists an \alpha such that a = M \alpha, then there is no difference in the model specified with a and the one with a = 0. The maximum likelihood regression coefficients \hat{\beta} will be different, but the maximum likelihood estimates of all other parameters (conditional and unconditional, canonical and mean value) will be the same. This is the usual case and explains why “linear” models (with a = 0) as opposed to “affine” models (with general a) are popular. In the unusual case where a is not in the column space of the design matrix, then affine models are a generalization of linear models: the two are not equivalent, their maximum likelihood estimates are not the same in any parameterization.

In order to use the R model formula mini-language we must flatten the dimensionality, making the model matrix modmat two-dimensional (a true matrix). This must be done as if by matrix(modmat, ncol = ncoef), which imposes the requirements on varvar and idvar given in the arguments section: they must look like row(x) and col(x) modulo relabeling. Then x and root become one-dimensional, done as if by as.numeric(x) and as.numeric(root).

The standard way to do this in R is to use the reshape function on a data frame in which the columns of the x matrix are variables in the data frame. reshape automatically puts things in the right order and creates varvar and idvar. This is shown in the examples section below and in the package vignette titled "Aster Package Tutorial".

Value

aster returns an object of class inheriting from "aster". aster.formula, returns an object of class "aster" and subclass "aster.formula".

The function summary (i.e., summary.aster) can be used to obtain or print a summary of the results, the function anova (i.e., anova.aster) to produce an analysis of deviance table, and the function predict (i.e., predict.aster) to produce predicted values and standard errors.

An object of class "aster" is a list containing at least the following components:

`coefficients`	a named vector of coefficients.
`rank`	the numeric rank of the fitted generalized linear model part of the aster model (i.e., the rank of `modmat`).
`deviance`	up to a constant, minus twice the maximized log-likelihood. (Note the minus. This is somewhat counterintuitive, but cannot be changed for reasons of backward compatibility.)
`iter`	the number of iterations used by the optimization method.
`converged`	logical. Was the optimization algorithm judged to have converged?
`code`	integer. The convergence code returned by the optimization method.
`gradient`	The gradient vector of minus the log likelihood at the fitted `coefficients` vector.
`hessian`	The Hessian matrix of minus the log likelihood (i.e., the observed Fisher information) at the fitted `coefficients` vector. This is also the expected Fisher information when `type == "unconditional"`.
`fisher`	Expected Fisher information at the fitted `coefficients` vector.
`optout`	The object returned by the optimization routine (`trust`, `nlm`, or `optim`). Only returned when the argument `optout` is `TRUE`.

Calls to aster.formula return a list also containing:

`call`	the matched call.
`formula`	the formula supplied.
`terms`	the `terms` object used.
`data`	the `data argument`.

NA Values

It was almost always wrong for aster model data to have NA values. Although theoretically possible for the R formula mini-language to do the right thing for an aster model with NA values in the data, usually it does some wrong thing. Thus, since version 0.8-20, this function and the reaster function give errors when used with data having NA values. Users must remove all NA values (or replace them with what they should be, perhaps zero values) “by hand”.

Directions of Recession

Even if the result of this function has component converges equal to TRUE, the result will be nonsense if there are one or more directions of recession. These are not detected by this function, but rather by the summary function applied to the result of this function, for which see summary.aster.

References

Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster Models for Life History Analysis. Biometrika, 94, 415–426. doi:10.1093/biomet/asm030.

Shaw, R. G., Geyer, C. J., Wagenius, S., Hangelbroek, H. H., and Etterson, J. R. (2008) Unifying Life History Analysis for Inference of Fitness and population growth. American Naturalist, 172, E35–E47. doi:10.1086/588063.

Examples

### see package vignette for explanation ###
library(aster)
data(echinacea)
vars <- c("ld02", "ld03", "ld04", "fl02", "fl03", "fl04",
    "hdct02", "hdct03", "hdct04")
redata <- reshape(echinacea, varying = list(vars), direction = "long",
    timevar = "varb", times = as.factor(vars), v.names = "resp")
redata <- data.frame(redata, root = 1)
pred <- c(0, 1, 2, 1, 2, 3, 4, 5, 6)
fam <- c(1, 1, 1, 1, 1, 1, 3, 3, 3)
hdct <- grepl("hdct", as.character(redata$varb))
redata <- data.frame(redata, hdct = as.integer(hdct))
level <- gsub("[0-9]", "", as.character(redata$varb))
redata <- data.frame(redata, level = as.factor(level))
aout <- aster(resp ~ varb + level : (nsloc + ewloc) + hdct : pop,
    pred, fam, varb, id, root, data = redata)
summary(aout, show.graph = TRUE)

[Package aster version 1.1-3 Index]