R: Predict Method for Aster Model Fits

predict.aster {aster}

R Documentation

Predict Method for Aster Model Fits

Description

Obtains predictions (parameter estimates) and optionally estimates standard errors of those predictions (parameter estimates) from a fitted Aster model object.

Usage

## S3 method for class 'aster'
predict(object, x, root, modmat, amat,
    parm.type = c("mean.value", "canonical"),
    model.type = c("unconditional", "conditional"),
    is.always.parameter = FALSE,
    se.fit = FALSE, info = c("expected", "observed"),
    info.tol = sqrt(.Machine$double.eps), newcoef = NULL,
    gradient = se.fit, ...)

## S3 method for class 'aster.formula'
predict(object, newdata, varvar, idvar, root, amat,
    parm.type = c("mean.value", "canonical"),
    model.type = c("unconditional", "conditional"),
    is.always.parameter = FALSE,
    se.fit = FALSE, info = c("expected", "observed"),
    info.tol = sqrt(.Machine$double.eps), newcoef = NULL,
    gradient = se.fit, ...)

Arguments

`object`	a fitted object of class inheriting from `"aster"` or `"aster.formula"`.
`modmat`	a model matrix to use instead of `object$modmat`. Must have the same structure (three-dimensional array, first index runs over individuals, second over nodes of the graphical model, third over covariates). Must have the same second and third dimensions as `object$modmat`. The second and third components of `dimnames(modmat)` and `dimnames(object$modmat)` must also be the same. May be missing, in which case `object$modmat` is used. `predict.aster.formula` constructs such a `modmat` from `object$formula`, the data frame `newdata`, and variables in the environment of the formula. When `newdata` is missing, `object$modmat` is used.
`x`	response. Ignored and may be missing unless `parm.type = "mean.value"` and `model.type = "conditional"`. Even then may be missing when `modmat` is missing, in which case `object$x` is used. A matrix whose first and second dimensions and the corresponding dimnames agrees with those of `modmat` and `object$modmat`. `predict.aster.formula` constructs such an `x` from the response variable name in `object$formula`, the data frame `newdata`, and the variables in the environment of the formula. When `newdata` is missing, `object$x` is used.
`root`	root data. Ignored and may be missing unless `parm.type == "mean.value"`. Even then may be missing when `modmat` is missing, in which case `object$root` is used. A matrix of the same form as `x`. `predict.aster.formula` looks up the variable supplied as the argument `root` in the data frame `newdata` or in the variables in the environment of the formula and makes it a matrix of the same form as `x`. When `newdata` is missing, `object$root` is used.
`amat`	if `zeta` is the requested prediction (mean value or canonical, unconditional or conditional, depending on `parm.type` and `model.type`), then we predict the linear function `t(amat) %*% zeta`. May be missing, in which case the identity linear function is used. For `predict.aster`, a three-dimensional array with `dim(amat)[1:2] == dim(modmat)[1:2]`. For `predict.aster.formula`, a three-dimensional array of the same dimensions as required for `predict.aster` (even though `modmat` is not provided). First dimension is number of individuals in `newdata`, if provided, otherwise number of individuals in `object$data`. Second dimension is number of variables (`length(object$pred)`).
`parm.type`	the type of parameter to predict. The default is mean value parameters (the opposite of the default for `predict.glm`), the expected value of a linear function of the response under the MLE probability model (also called the MLE of the mean value parameter). The expectation is unconditional or conditional depending on `parm.type`. The alternative `"canonical"` is the value of a linear function of the MLE of canonical parameters under the MLE probability model. The canonical parameter is unconditional or conditional depending on `parm.type`. The value of this argument can be abbreviated.
`model.type`	the type of model in which to predict. The default is `"unconditional"` in which case the parameters (either mean value or canonical, depending on the value of `parm.type`) are those of an unconditional model. The alternative is `"conditional"` in which case the parameters are those of a conditional model. The value of this argument can be abbreviated.
`is.always.parameter`	logical, default `FALSE`. Only affects the result when `parm.type = "mean.value"` and `model.type = "conditional"`. `TRUE` means the conditional mean value parameter is produced. `FALSE` means the conditional mean values themselves are produced (which depend on data so are not parameters). See Conditional Mean Values Section below for further explanation.
`se.fit`	logical switch indicating if standard errors are required.
`info`	the type of Fisher information to use to compute standard errors.
`info.tol`	tolerance for eigenvalues of Fisher information. If `eval` is the vector of eigenvalues of the information matrix, then `eval < cond.tol * max(eval)` are considered zero. Hence the corresponding eigenvectors are directions of constancy or recession of the log likelihood.
`newdata`	optionally, a data frame in which to look for variables with which to predict. If omitted, see `modmat` above. See also details section below.
`varvar`	a variable of length `nrow(newdata)`, typically a variable in `newdata` that is a factor whose levels are character strings treated as variable names. The number of variable names is `nnode`. Must be of the form `rep(vars, each = nind)` where `vars` is a vector of variable names. Not used if `newdata` is missing.
`idvar`	a variable of length `nrow(newdata)`, typically a variable in `newdata` that indexes individuals. The number of individuals is `nind`. Must be of the form `rep(inds, times = nnode)` where `inds` is a vector of labels for individuals. Not used if `newdata` is missing.
`newcoef`	if not `NULL`, a variable of length `object$coefficients` and used in its place when one wants predictions at other than the fitted coefficient values.
`gradient`	if `TRUE` return the gradient (Jacobian of the transformation) matrix. This matrix has number of rows equal to the length of the fitted values and number of columns equal to the number of regression coefficients. It is the derivative matrix (matrix of partial derivatives) of the mapping from regression coefficients to whatever the predicted values are, which depends on what the arguments `newdata`, `amat`, `parm.type`, and `model.type` are.
`...`	further arguments passed to or from other methods.

Details

Note that model.type need have nothing to do with the type of the fitted aster model, which is object$type.

Whether the fitted model is conditional or unconditional, one typically wants unconditional mean value parameters, because conditional mean value parameters for hypothetical individuals depend on the hypothetical data x, which usually makes no scientific sense.

If one asks for conditional mean value parameters, one gets them only if is.always.parameter = TRUE is specified. Otherwise, conditional expectations that are not parameters (because they depend on data) are produced. See Conditional Mean Values Section for more about this.

Similarly, if object$type == "conditional", then the conditional canonical parameters are a linear function of the regression coefficients \theta = M \beta, where M is the model matrix, but one can predict either \theta or the unconditional canonical parameters \varphi, as selected by model.type. Similarly, if object$type == "unconditional", so \varphi = M \beta, one can predict either \theta or \varphi as selected by model.type.

The specification of the prediction model is confusing because there are so many possibilities. First the “usual” case. The fit was done using a formula, found in object$formula. A data frame newdata that has the same variables as object$data, the data frame used in the fit, but may have different rows (representing hypothetical individuals) is supplied. But newdata must specify all nodes of the graphical model for each (hypothetical, new) individual, just like object$data did for real observed individuals. Hence newdata is typically constructed using reshape. See also the details section of aster.

In this “usual” case we need varvar and idvar to tell us what rows of newdata correspond to which individuals and nodes (the same role they played in the original fit by aster). If we are predicting canonical parameters, then we do not need root or x. If we are predicting unconditional mean value parameters, then we also need root but not x. If we are predicting conditional mean value parameters, then we also need both root and x. In the “usual” case, these are found in newdata and not supplied as arguments to predict. Moreover, x is not named "x" but is the response in out$formula.

The next case, predict(object) with no other arguments, is often used with linear models (predict.lm), but we expect will be little used for aster models. As for linear models, this “predicts” the observed data. In this case modmat, x, and root are found in object and nothing is supplied as an argument to predict.aster, except perhaps amat if one wants a function of predictions for the observed data.

The final case, also perhaps little used, is a fail-safe mode for problems in which the R formula language just cannot be bludgeoned into doing what you want. This is the same reason aster.default exists. Then a model matrix can be constructed “by hand”, and the function predict.aster is used instead of predict.aster.formula.

Note that it is possible to use a “constructed by hand” model matrix even if object was produced by aster.formula. Simply explicitly call predict.aster rather than predict to override the R method dispatch (which would call predict.aster.formula in this case).

Value

If se.fit = FALSE and gradient = FALSE, a vector of predictions. If se.fit = TRUE, a list with components

`fit`	Predictions
`se.fit`	Estimated standard errors
`gradient`	The gradient of the transformation from regression coefficients to predictions

If gradient = TRUE, a list with components

`fit`	Predictions
`gradient`	The gradient of the transformation from regression coefficients to predictions

Conditional Mean Values

Both the original aster paper (Geyer, et al., 2007) and this package are weird about conditional mean values. Equation (10) of that paper defines (using different notation from what we use here)

\xi_j = E(x_j | x_{p(j)})

where x_j are components of the response vector and p(j) denotes denotes the predecessor of node j. That paper explicitly says that this is is not a parameter because it depends on the data. In fact

E(x_j | x_{p(j)}) = x_{p(j)} E(x_j | x_{p(j)} = 1)

(this is equation (3) of that paper in different notation). Thus it is weird to use a Greek letter to denote this.

There should be a conditional mean value parameter, and Geyer (2010, equation (11b)) defines it as

\xi_j = E(y_j | y_{p(j)} = 1)

(This equation only makes sense when the conditioning event x_{p(j)} = 1 is possible, which it is not for k-truncated arrows for k > 0. Then a more complicated definition must be used. By definition x_j is the sum of x_{p(j)} independent and identically distributed random variables, and \xi_j is always the mean of one of those random variables.) This gives us the important relationship between conditional and unconditional mean value parameters

\mu_j = \xi_j \mu_{p(j)}

which holds for all successor nodes j. All later writings of this author use this definition of \xi as does the R package aster2 (Geyer, 2017). This is one of six important parameterizations of an unconditional aster model (Geyer, 2010, Sections 2.7 and 2.8). The R package aster2 uses all of them.

This function (as of version 1.0 of this package) has an argument is.always.parameter to switch between these two definitions in case parm.type = "mean.value" and model.type = "conditional" are specified. Then is.always.parameter = TRUE specifies that the latter definition of \xi is produced (which is a parameter, with all other options for parm.type and model.type). The option is.always.parameter = FALSE specifies that the former definition of \xi is produced (which is a conditional expectation but not a parameter) and is what this function produced in versions of this package before 1.0.

References

Geyer, C. J. (2010) A Philosophical Look at Aster Models. Technical Report No. 676. School of Statistics, University of Minnesota. http://purl.umn.edu/57163.

Geyer, C.~J. (2017). R package aster2 (Aster Models), version 0.3. https://cran.r-project.org/package=aster2.

Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster models for life history analysis. Biometrika, 94, 415–426. doi:10.1093/biomet/asm030.

Examples

### see package vignette for explanation ###
library(aster)
data(echinacea)
vars <- c("ld02", "ld03", "ld04", "fl02", "fl03", "fl04",
    "hdct02", "hdct03", "hdct04")
redata <- reshape(echinacea, varying = list(vars), direction = "long",
    timevar = "varb", times = as.factor(vars), v.names = "resp")
redata <- data.frame(redata, root = 1)
pred <- c(0, 1, 2, 1, 2, 3, 4, 5, 6)
fam <- c(1, 1, 1, 1, 1, 1, 3, 3, 3)
hdct <- grepl("hdct", as.character(redata$varb))
redata <- data.frame(redata, hdct = as.integer(hdct))
level <- gsub("[0-9]", "", as.character(redata$varb))
redata <- data.frame(redata, level = as.factor(level))
aout <- aster(resp ~ varb + level : (nsloc + ewloc) + hdct : pop,
    pred, fam, varb, id, root, data = redata)
newdata <- data.frame(pop = levels(echinacea$pop))
for (v in vars)
    newdata[[v]] <- 1
newdata$root <- 1
newdata$ewloc <- 0
newdata$nsloc <- 0
renewdata <- reshape(newdata, varying = list(vars),
     direction = "long", timevar = "varb", times = as.factor(vars),
     v.names = "resp")
hdct <- grepl("hdct", as.character(renewdata$varb))
renewdata <- data.frame(renewdata, hdct = as.integer(hdct))
level <- gsub("[0-9]", "", as.character(renewdata$varb))
renewdata <- data.frame(renewdata, level = as.factor(level))
nind <- nrow(newdata)
nnode <- length(vars)
amat <- array(0, c(nind, nnode, nind))
for (i in 1:nind)
    amat[i , grep("hdct", vars), i] <- 1
foo <- predict(aout, varvar = varb, idvar = id, root = root,
    newdata = renewdata, se.fit = TRUE, amat = amat)
bar <- cbind(foo$fit, foo$se.fit)
dimnames(bar) <- list(as.character(newdata$pop), c("Estimate", "Std. Error"))
print(bar)

[Package aster version 1.1-3 Index]