emfrail {frailtyEM} | R Documentation |
Fitting semi-parametric shared frailty models with the EM algorithm
Description
Fitting semi-parametric shared frailty models with the EM algorithm
Usage
emfrail(formula, data, distribution = emfrail_dist(),
control = emfrail_control(), model = FALSE, model.matrix = FALSE,
...)
Arguments
formula |
A formula that contains on the left hand side an object of the type |
data |
A |
distribution |
An object as created by |
control |
An object as created by |
model |
Logical. Should the model frame be returned? |
model.matrix |
Logical. Should the model matrix be returned? |
... |
Other arguments, currently used to warn about deprecated argument names |
Details
The emfrail
function fits shared frailty models for processes which have intensity
\lambda(t) = z \lambda_0(t) \exp(\beta' \mathbf{x})
with a non-parametric (Breslow) baseline intensity \lambda_0(t)
. The outcome
(left hand side of the formula
) must be a Surv
object.
If the object is Surv(tstop, status)
then the usual failure time data is represented.
Gap-times between recurrent events are represented in the same way.
If the left hand side of the formula is created as Surv(tstart, tstop, status)
, this may represent a number of things:
(a) recurrent events episodes in calendar time where a recurrent event episode starts at tstart
and ends at tstop
(b) failure time data with time-dependent covariates where tstop
is the time of a change in covariates or censoring
(status = 0
) or an event time (status = 1
) or (c) clustered failure time with left truncation, where
tstart
is the individual's left truncation time. Unlike regular Cox models, a major distinction is that in case (c) the
distribution of the frailty must be considered conditional on survival up to the left truncation time.
The +cluster()
statement specified the column that determines the grouping (the observations that share the same frailty).
The +strata()
statement specifies a column that determines different strata, for which different baseline hazards are calculated.
The +terminal
specifies a column that contains an indicator for dependent censoring, and then performs a score test
The distribution
argument must be generated by a call to emfrail_dist
. This determines the
frailty distribution, which may be one of gamma, positive stable or PVF (power-variance-function), and the starting
value for the maximum likelihood estimation. The PVF family
also includes a tuning parameter that differentiates between inverse Gaussian and compound Poisson distributions.
Note that, with univariate data (at most one event per individual, no clusters), only distributions with finite expectation
are identifiable. This means that the positive stable distribution should have a maximum likelihood on the edge of the parameter
space (theta = +\inf
, corresponding to a Cox model for independent observations).
The control
argument must be generated by a call to emfrail_control
. Several parameters
may be adjusted that control the precision of the convergenge criteria or supress the calculation of different
quantities.
Value
An object of class emfrail
that contains the following fields:
coefficients |
A named vector of the estimated regression coefficients |
hazard |
The breslow estimate of the baseline hazard at each event time point, in chronological order |
var |
The variance-covariance matrix corresponding to the coefficients and hazard, assuming |
var_adj |
The variance-covariance matrx corresponding to the coefficients and hazard, adjusted for the estimation of theta |
logtheta |
The logarithm of the point estimate of |
var_logtheta |
The variance of the estimated logarithm of |
ci_logtheta |
The likelihood-based 95% confidence interval for the logarithm of |
frail |
The posterior (empirical Bayes) estimates of the frailty for each cluster |
residuals |
A list with two elements, cluster which is a vector that the sum of the cumulative hazards from each cluster for a frailty value of 1, and individual, which is a vector that contains the cumulative hazard corresponding to each row of the data, multiplied by the corresponding frailty estimate |
tev |
The time points of the events in the data set, this is the same length as hazard |
nevents_id |
The number of events for each cluster |
loglik |
A vector of length two with the log-likelihood of the starting Cox model and the maximized log-likelihood |
ca_test |
The results of the Commenges-Andersen test for heterogeneity |
cens_test |
The results of the test for dependence between a recurrent event and a terminal event,
if the |
zph |
The result of |
formula , distribution , control |
The original arguments |
nobs , fitted |
Number of observations and fitted values (i.e. |
mf |
The |
mm |
The |
Note
Several options in the control
arguemnt shorten the running time for emfrail
significantly.
These are disabling the adjustemnt of the standard errors (se_adj = FALSE
), disabling the likelihood-based confidence intervals (lik_ci = FALSE
) or
disabling the score test for heterogeneity (ca_test = FALSE
).
The algorithm is detailed in the package vignette. For the gamma frailty,
the results should be identical with those from coxph
with ties = "breslow"
.
Author(s)
Theodor Balan hello@tbalan.com
References
Balan TA, Putter H (2019) "frailtyEM: An R Package for Estimating Semiparametric Shared Frailty Models", Journal of Statistical Software 90(7) 1-29. doi:10.18637/jss.v090.i07
See Also
plot.emfrail
and autoplot.emfrail
for plot functions directly available, emfrail_pll
for calculating \widehat{L}(\theta)
at specific values of \theta
,
summary.emfrail
for transforming the emfrail
object into a more human-readable format and for
visualizing the frailty (empirical Bayes) estimates,
predict.emfrail
for calculating and visalizing conditional and marginal survival and cumulative
hazard curves. residuals.emfrail
for extracting martingale residuals and logLik.emfrail
for extracting
the log-likelihood of the fitted model.
Examples
m_gamma <- emfrail(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats)
# Inverse Gaussian distribution
m_ig <- emfrail(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats,
distribution = emfrail_dist(dist = "pvf"))
# for the PVF distribution with m = 0.75
m_pvf <- emfrail(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats,
distribution = emfrail_dist(dist = "pvf", pvfm = 0.75))
# for the positive stable distribution
m_ps <- emfrail(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats,
distribution = emfrail_dist(dist = "stable"))
## Not run:
# Compare marginal log-likelihoods
models <- list(m_gamma, m_ig, m_pvf, m_ps)
models
logliks <- lapply(models, logLik)
names(logliks) <- lapply(models,
function(x) with(x$distribution,
ifelse(dist == "pvf",
paste(dist, "/", pvfm),
dist))
)
logliks
## End(Not run)
# Stratified analysis
## Not run:
m_strat <- emfrail(formula = Surv(time, status) ~ rx + strata(sex) + cluster(litter),
data = rats)
## End(Not run)
# Test for conditional proportional hazards (log-frailty as offset)
## Not run:
m_gamma <- emfrail(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats, control = emfrail_control(zph = TRUE))
par(mfrow = c(1,2))
plot(m_gamma$zph)
## End(Not run)
# Draw the profile log-likelihood
## Not run:
fr_var <- seq(from = 0.01, to = 1.4, length.out = 20)
# For gamma the variance is 1/theta (see parametrizations)
pll_gamma <- emfrail_pll(formula = Surv(time, status) ~ rx + sex + cluster(litter),
data = rats,
values = 1/fr_var )
plot(fr_var, pll_gamma,
type = "l",
xlab = "Frailty variance",
ylab = "Profile log-likelihood")
# Recurrent events
mod_rec <- emfrail(Surv(start, stop, status) ~ treatment + cluster(id), bladder1)
# The warnings appear from the Surv object, they also appear in coxph.
plot(mod_rec, type = "hist")
## End(Not run)
# Left truncation
## Not run:
# We simulate some data with truncation times
set.seed(2018)
nclus <- 300
nind <- 5
x <- sample(c(0,1), nind * nclus, TRUE)
u <- rep(rgamma(nclus,1,1), each = 3)
stime <- rexp(nind * nclus, rate = u * exp(0.5 * x))
status <- ifelse(stime > 5, 0, 1)
stime[status == 0] <- 5
# truncate uniform between 0 and 2
ltime <- runif(nind * nclus, min = 0, max = 2)
d <- data.frame(id = rep(1:nclus, each = nind),
x = x,
stime = stime,
u = u,
ltime = ltime,
status = status)
d_left <- d[d$stime > d$ltime,]
mod <- emfrail(Surv(stime, status)~ x + cluster(id), d)
# This model ignores the left truncation, 0.378 frailty variance:
mod_1 <- emfrail(Surv(stime, status)~ x + cluster(id), d_left)
# This model takes left truncation into account,
# but it considers the distribution of the frailty unconditional on the truncation
mod_2 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left)
# This is identical with:
mod_cox <- coxph(Surv(ltime, stime, status)~ x + frailty(id), data = d_left)
# The correct thing is to consider the distribution of the frailty given the truncation
mod_3 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left,
distribution = emfrail_dist(left_truncation = TRUE))
summary(mod_1)
summary(mod_2)
summary(mod_3)
## End(Not run)