predict.aster {aster} | R Documentation |
Predict Method for Aster Model Fits
Description
Obtains predictions (parameter estimates) and optionally estimates standard errors of those predictions (parameter estimates) from a fitted Aster model object.
Usage
## S3 method for class 'aster'
predict(object, x, root, modmat, amat,
parm.type = c("mean.value", "canonical"),
model.type = c("unconditional", "conditional"),
is.always.parameter = FALSE,
se.fit = FALSE, info = c("expected", "observed"),
info.tol = sqrt(.Machine$double.eps), newcoef = NULL,
gradient = se.fit, ...)
## S3 method for class 'aster.formula'
predict(object, newdata, varvar, idvar, root, amat,
parm.type = c("mean.value", "canonical"),
model.type = c("unconditional", "conditional"),
is.always.parameter = FALSE,
se.fit = FALSE, info = c("expected", "observed"),
info.tol = sqrt(.Machine$double.eps), newcoef = NULL,
gradient = se.fit, ...)
Arguments
object |
a fitted object of class inheriting from |
modmat |
a model matrix to use instead of May be missing, in which case
|
x |
response. Ignored and may be missing unless
|
root |
root data. Ignored and may be missing unless
|
amat |
if For For |
parm.type |
the type of parameter to predict. The default is
mean value parameters (the opposite of the default
for The alternative The value of this argument can be abbreviated. |
model.type |
the type of model in which to predict. The default is
The value of this argument can be abbreviated. |
is.always.parameter |
logical, default |
se.fit |
logical switch indicating if standard errors are required. |
info |
the type of Fisher information to use to compute standard errors. |
info.tol |
tolerance for eigenvalues of Fisher information.
If |
newdata |
optionally, a data frame in which to look for variables with
which to predict. If omitted, see |
varvar |
a variable of length |
idvar |
a variable of length |
newcoef |
if not |
gradient |
if |
... |
further arguments passed to or from other methods. |
Details
Note that model.type
need have nothing to do with the type
of the fitted aster model, which is object$type
.
Whether the
fitted model is conditional or unconditional, one typically wants
unconditional mean value parameters, because conditional mean
value parameters for hypothetical individuals depend on the hypothetical
data x
, which usually makes no scientific sense.
If one asks for conditional mean value parameters, one gets
them only if is.always.parameter = TRUE
is specified.
Otherwise, conditional expectations that are not parameters (because
they depend on data) are produced.
See Conditional Mean Values Section for more about this.
Similarly, if object$type == "conditional"
, then the conditional
canonical parameters are a linear function of the regression coefficients
\theta = M \beta
, where M
is the model matrix,
but one can predict either \theta
or the unconditional
canonical parameters \varphi
,
as selected by model.type
.
Similarly, if object$type == "unconditional"
,
so \varphi = M \beta
, one can predict either
\theta
or \varphi
as selected by model.type
.
The specification of the prediction model is confusing because there
are so many possibilities. First the “usual” case.
The fit was done using a formula, found in object$formula
.
A data frame newdata
that has the same variables as object$data
,
the data frame used in the fit, but may have different rows (representing
hypothetical individuals) is supplied.
But newdata
must specify all nodes
of the graphical model for each (hypothetical, new) individual,
just like object$data
did for real observed individuals.
Hence newdata
is typically constructed using reshape
.
See also the details section of aster
.
In this “usual” case we need varvar
and idvar
to
tell us what rows of newdata
correspond to which individuals and
nodes (the same role they played in the original fit by aster
).
If we are predicting canonical parameters, then we do not need root
or
x
.
If we are predicting unconditional mean value parameters, then
we also need root
but not x
.
If we are predicting conditional mean value parameters, then
we also need both root
and x
.
In the “usual” case, these are found in newdata
and
not supplied as arguments to predict
. Moreover, x
is not named "x"
but is the response in out$formula
.
The next case, predict(object)
with no other arguments,
is often used with linear models (predict.lm
),
but we expect will be little used for aster models. As for linear
models, this “predicts” the observed data. In this case
modmat
, x
, and root
are found in object
and nothing is supplied as an argument to predict.aster
, except
perhaps amat
if one wants a function of predictions for the observed
data.
The final case, also perhaps little used, is a fail-safe mode for problems
in which the R formula language just cannot be bludgeoned into doing what
you want. This is the same reason aster.default
exists.
Then a model matrix can be constructed “by hand”, and the function
predict.aster
is used instead of predict.aster.formula
.
Note that it is possible to use a “constructed by hand”
model matrix even if object
was produced by aster.formula
.
Simply explicitly call predict.aster
rather than predict
to override the R method dispatch (which would
call predict.aster.formula
in this case).
Value
If se.fit = FALSE
and gradient = FALSE
, a vector of predictions.
If se.fit = TRUE
, a list with components
fit |
Predictions |
se.fit |
Estimated standard errors |
gradient |
The gradient of the transformation from regression coefficients to predictions |
If gradient = TRUE
, a list with components
fit |
Predictions |
gradient |
The gradient of the transformation from regression coefficients to predictions |
Conditional Mean Values
Both the original aster paper (Geyer, et al., 2007) and this package are weird about conditional mean values. Equation (10) of that paper defines (using different notation from what we use here)
\xi_j = E(x_j | x_{p(j)})
where x_j
are components of the response vector and p(j)
denotes denotes the predecessor of node j
. That paper explicitly says
that this is is not a parameter because it depends on the data. In fact
E(x_j | x_{p(j)}) = x_{p(j)} E(x_j | x_{p(j)} = 1)
(this is equation (3) of that paper in different notation). Thus it is weird to use a Greek letter to denote this.
There should be a conditional mean value parameter, and Geyer (2010, equation (11b)) defines it as
\xi_j = E(y_j | y_{p(j)} = 1)
(This equation only makes sense when the conditioning event
x_{p(j)} = 1
is possible, which it is not for
k
-truncated arrows for k > 0
. Then a more complicated definition
must be used. By definition x_j
is the sum
of x_{p(j)}
independent and identically distributed random
variables, and \xi_j
is always the mean of one of those
random variables.)
This gives us the important relationship between conditional and unconditional
mean value parameters
\mu_j = \xi_j \mu_{p(j)}
which holds for all successor nodes j
.
All later writings of this author use this definition of \xi
as does the R package aster2
(Geyer, 2017).
This is one of six important parameterizations of an unconditional aster model
(Geyer, 2010, Sections 2.7 and 2.8). The R package aster2
uses all
of them.
This function (as of version 1.0 of this package) has an argument
is.always.parameter
to switch between these two definitions
in case parm.type = "mean.value"
and model.type = "conditional"
are specified. Then is.always.parameter = TRUE
specifies that the latter
definition of \xi
is produced (which is a parameter, with all other
options for parm.type
and model.type
). The option
is.always.parameter = FALSE
specifies that the former
definition of \xi
is produced (which is a conditional expectation
but not a parameter) and is what this function produced in versions of this
package before 1.0.
References
Geyer, C. J. (2010) A Philosophical Look at Aster Models. Technical Report No. 676. School of Statistics, University of Minnesota. http://purl.umn.edu/57163.
Geyer, C.~J. (2017).
R package aster2
(Aster Models), version 0.3.
https://cran.r-project.org/package=aster2.
Geyer, C. J., Wagenius, S., and Shaw, R. G. (2007) Aster models for life history analysis. Biometrika, 94, 415–426. doi:10.1093/biomet/asm030.
Examples
### see package vignette for explanation ###
library(aster)
data(echinacea)
vars <- c("ld02", "ld03", "ld04", "fl02", "fl03", "fl04",
"hdct02", "hdct03", "hdct04")
redata <- reshape(echinacea, varying = list(vars), direction = "long",
timevar = "varb", times = as.factor(vars), v.names = "resp")
redata <- data.frame(redata, root = 1)
pred <- c(0, 1, 2, 1, 2, 3, 4, 5, 6)
fam <- c(1, 1, 1, 1, 1, 1, 3, 3, 3)
hdct <- grepl("hdct", as.character(redata$varb))
redata <- data.frame(redata, hdct = as.integer(hdct))
level <- gsub("[0-9]", "", as.character(redata$varb))
redata <- data.frame(redata, level = as.factor(level))
aout <- aster(resp ~ varb + level : (nsloc + ewloc) + hdct : pop,
pred, fam, varb, id, root, data = redata)
newdata <- data.frame(pop = levels(echinacea$pop))
for (v in vars)
newdata[[v]] <- 1
newdata$root <- 1
newdata$ewloc <- 0
newdata$nsloc <- 0
renewdata <- reshape(newdata, varying = list(vars),
direction = "long", timevar = "varb", times = as.factor(vars),
v.names = "resp")
hdct <- grepl("hdct", as.character(renewdata$varb))
renewdata <- data.frame(renewdata, hdct = as.integer(hdct))
level <- gsub("[0-9]", "", as.character(renewdata$varb))
renewdata <- data.frame(renewdata, level = as.factor(level))
nind <- nrow(newdata)
nnode <- length(vars)
amat <- array(0, c(nind, nnode, nind))
for (i in 1:nind)
amat[i , grep("hdct", vars), i] <- 1
foo <- predict(aout, varvar = varb, idvar = id, root = root,
newdata = renewdata, se.fit = TRUE, amat = amat)
bar <- cbind(foo$fit, foo$se.fit)
dimnames(bar) <- list(as.character(newdata$pop), c("Estimate", "Std. Error"))
print(bar)