predict.emfrail {frailtyEM} | R Documentation |
Predicted hazard and survival curves from an emfrail object
Description
Predicted hazard and survival curves from an emfrail object
Usage
## S3 method for class 'emfrail'
predict(object, newdata = NULL, lp = NULL,
strata = NULL, quantity = c("cumhaz", "survival"),
type = c("conditional", "marginal"), conf_int = NULL,
individual = FALSE, conf_level = 0.95, ...)
Arguments
object |
An emfrail fit object |
newdata |
A data frame with the same variable names as those that appear in the emfrail formula, used to calculate the linear predictor (optional). |
lp |
A vector of linear predictor values at which to calculate the curves. Default is 0 (baseline). |
strata |
The name of the strata (if applicable) for which the prediction should be made. |
quantity |
Can be "cumhaz" and/or "survival". The quantity to be calculated for the values of lp. |
type |
Can be "conditional" and/or "marginal". The type of the quantity to be calculated. |
conf_int |
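Can be "regular" and/or "adjusted". The type of confidence interval to be calculated. With NULL, no confidence intervals are calculated. |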
individual |
Logical. Are the observations in newdata from the same individual? See Details. |
conf_level |
The confidence level of the intervals. By default, 95% confidence intervals are calculated. |
... |
Ignored |
Details
The function calculates predicted cumulative hazard and survival curves for given covariate
or linear predictor values; for the former, newdata must be specified and for the latter
lp must be specified. Each row of newdata or element of lp is considered to be
a different subject, and the desired predictions are produced for each of them separately.
In newdata, two columns may be specified with the names tstart and tstop.
In this case, each subject is assumed to be at risk only during the times specified by these two values.
If the two are not specified, the predicted curves are produced for a subject that is at risk for the
whole follow-up time.
The behaviour is slightly different if individual == TRUE. In this case, all the rows of
newdata are assumed to come from the same individual, and tstart and tstop must
be specified and must not overlap. This may be used for describing subjects that
are not at risk during certain periods or subjects with time-dependent covariate values.
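The individual == TRUE case can be pictured with a small base R sketch. This is only an illustration of the idea, not the package's code: the event times, the baseline hazard increments, the rows data frame and the (tstart, tstop] interval convention are all assumptions made for the example.

```r
# Hypothetical baseline hazard increments at the event times (made-up numbers)
times <- c(10, 20, 35, 50, 60)
haz0  <- c(0.02, 0.03, 0.01, 0.04, 0.05)

# Two rows describing ONE individual: not at risk between 30 and 40,
# and with a different linear predictor after time 40
rows <- data.frame(tstart = c(0, 40), tstop = c(30, Inf), lp = c(0, 0.5))

# The at-risk intervals must not overlap
stopifnot(all(head(rows$tstop, -1) <= tail(rows$tstart, -1)))

# For each event time, find the row whose (tstart, tstop] contains it
# (NA when the individual is not at risk at that time)
idx <- vapply(times, function(t) {
  hit <- which(rows$tstart < t & t <= rows$tstop)
  if (length(hit) == 0) NA_integer_ else hit[1]
}, integer(1))

# Contribution: exp(lp) * baseline increment while at risk, 0 otherwise
contrib <- ifelse(is.na(idx), 0, exp(rows$lp[idx]) * haz0)
cumhaz  <- cumsum(contrib)
```

In this sketch the event time 35 falls in the gap between the two rows, so it adds nothing to the cumulative hazard.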
The two quantities that can be returned are named cumhaz and survival. If we denote each quantity by q,
then the columns with the conditional estimates are named q and the columns with the marginal estimates
are named q_m. The confidence interval columns take the name of the quantity (conditional or marginal)
followed by _l or _r for the lower and upper bound. The bounds calculated with the adjusted standard errors
have the name of the regular bounds followed by _a. For example, the adjusted lower bound for the marginal
survival is in the column named survival_m_l_a.
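The naming scheme can be spelled out with a small helper. The function col_name below is hypothetical and not part of frailtyEM; it only assembles names according to the rules described above.

```r
# Hypothetical helper (not part of frailtyEM): assemble a column name
# from the quantity, the type, the bound and the kind of standard error
col_name <- function(quantity, marginal = FALSE, bound = NULL, adjusted = FALSE) {
  name <- quantity
  if (marginal) name <- paste0(name, "_m")          # marginal estimates: q_m
  if (!is.null(bound)) {
    # _l for the lower bound, _r for the upper bound
    name <- paste0(name, if (bound == "lower") "_l" else "_r")
    if (adjusted) name <- paste0(name, "_a")        # adjusted standard errors
  }
  name
}

col_name("survival", marginal = TRUE, bound = "lower", adjusted = TRUE)
col_name("cumhaz", bound = "upper")
```

The first call gives the name used in the example above, survival_m_l_a; the second gives cumhaz_r, the regular upper bound of the conditional cumulative hazard.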
The emfrail function only gives the Breslow estimates of the baseline hazard \lambda_0(t) at the
event time points, conditional on the frailty. Let \lambda(t) be the hazard for a linear predictor of interest.
The estimated conditional cumulative hazard is then \Lambda(t) = \sum_{s = 0}^t \lambda(s).
The variance of \Lambda(t) can be calculated from the (possibly adjusted) variance-covariance matrix.
The conditional survival is obtained by the usual expression S(t) = \exp(-\Lambda(t)). The marginal survival
is given by
\bar S(t) = E \left[\exp(-Z \Lambda(t)) \right] = \mathcal{L}(\Lambda(t)),
i.e. the Laplace transform of the distribution of the frailty Z, evaluated at \Lambda(t).
The marginal cumulative hazard is obtained as
\bar \Lambda(t) = - \log \bar S(t).
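As a numerical illustration of these formulas, the following base R sketch assumes a gamma frailty with mean 1 and variance theta, whose Laplace transform is (1 + theta c)^(-1/theta). The hazard values, theta and the helper laplace_gamma are made up for the example; emfrail supports other frailty distributions, for which the transform differs.

```r
# Hypothetical hazard increments lambda(s) at the event times, for one
# value of the linear predictor (made-up numbers)
times  <- c(5, 12, 20, 33)
lambda <- c(0.05, 0.10, 0.08, 0.20)

# Conditional cumulative hazard and conditional survival
Lambda <- cumsum(lambda)
S_cond <- exp(-Lambda)

# Laplace transform of a gamma frailty with mean 1 and variance theta
# (an assumption for this sketch)
theta <- 0.5
laplace_gamma <- function(c, theta) (1 + theta * c)^(-1 / theta)

# Marginal survival and marginal cumulative hazard
S_marg      <- laplace_gamma(Lambda, theta)
Lambda_marg <- -log(S_marg)

data.frame(time = times, cumhaz = Lambda, survival = S_cond,
           cumhaz_m = Lambda_marg, survival_m = S_marg)
```

With a mean-one frailty, the marginal survival is never below the conditional survival at the same \Lambda(t), which follows from Jensen's inequality.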
The only standard errors that are available from emfrail are those for \lambda_0(t). From these,
standard errors of \log \Lambda(t) may be calculated. Symmetric confidence intervals are built on this
scale and then transformed to the desired scale.
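The construction of the intervals can be sketched in base R as follows. The cumulative hazard values and their standard errors are made-up numbers, and the step from se(\Lambda) to se(\log \Lambda) uses a first-order delta method approximation; this shows the general idea and is not the package's internal code.

```r
Lambda     <- c(0.05, 0.15, 0.23, 0.43)  # hypothetical cumulative hazard
se_Lambda  <- c(0.02, 0.05, 0.07, 0.12)  # hypothetical standard errors
conf_level <- 0.95

# Delta method: se(log Lambda) is approximately se(Lambda) / Lambda
se_log <- se_Lambda / Lambda
z      <- qnorm(1 - (1 - conf_level) / 2)

# Symmetric interval on the log scale, transformed back by exponentiating
cumhaz_l <- exp(log(Lambda) - z * se_log)
cumhaz_r <- exp(log(Lambda) + z * se_log)

# Survival decreases in Lambda, so the bounds swap roles
survival_l <- exp(-cumhaz_r)
survival_r <- exp(-cumhaz_l)
```

Working on the log scale keeps the lower bound of the cumulative hazard positive, so the survival bounds stay between 0 and 1.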
Value
The return value is a single data frame (if lp has length 1, newdata has 1 row or individual == TRUE)
or a list of data frames corresponding to each value of lp or each row of newdata otherwise.
The names of the columns in the returned data frames are as follows: time represents the unique event time points
from the data set, lp is the value of the linear predictor (as specified in the input or as calculated from the rows of newdata).
By default, for each lp a data frame will contain the following columns: cumhaz, survival,
cumhaz_m, survival_m for the cumulative hazard and survival, conditional and marginal, with corresponding confidence
bands. The naming of the columns is explained in more detail in the Details section.
Note
The linear predictor is taken as fixed, so the variability in the estimation of the regression coefficient is not taken into account.
Left truncation is not supported at the moment. The reason is that, if individual == TRUE and tstart and tstop are
specified, the marginal estimates are calculated by integrating over the distribution of the frailty, and not
over the distribution of the frailty given the truncation.
For performance reasons, consider running with conf_int = NULL; the reason is that the deltamethod function that is used
to calculate the confidence intervals easily becomes slow when there is a large number of time points
for the cumulative hazard.
See Also
plot.emfrail, autoplot.emfrail
Examples
kidney$sex <- ifelse(kidney$sex == 1, "male", "female")
m1 <- emfrail(formula = Surv(time, status) ~ sex + age + cluster(id),
              data = kidney)
# get all the possible predictions for the value 0 of the linear predictor
predict(m1, lp = 0)
# get the cumulative hazards for two different values of the linear predictor
predict(m1, lp = c(0, 1), quantity = "cumhaz", conf_int = NULL)
# get the cumulative hazards for a female and for a male, both aged 30
newdata1 <- data.frame(sex = c("female", "male"),
                       age = c(30, 30))
predict(m1, newdata = newdata1, quantity = "cumhaz", conf_int = NULL)
# get the cumulative hazards for an individual that changes
# sex from female to male at time 40.
newdata2 <- data.frame(sex = c("female", "male"),
                       age = c(30, 30),
                       tstart = c(0, 40),
                       tstop = c(40, Inf))
predict(m1, newdata = newdata2,
        individual = TRUE,
        quantity = "cumhaz", conf_int = NULL)