gsm {rstpm2} | R Documentation |
Parametric and penalised generalised survival models
Description
This implements the generalised survival model g(S(t|x)) = eta, where g is a link function, S is survival, t is time, x are covariates and eta is a linear predictor. The linear predictor can include either parametric or penalised smoothers for the time effects, for time:covariate interactions and for covariate effects. The main model assumption is that the time effects in the linear predictor are smooth. This extends the class of flexible parametric survival models developed by Royston and colleagues. The model has been extended to include relative survival (excess hazards), Gamma frailties and normal random effects.
Usage
gsm(formula, data, smooth.formula = NULL, smooth.args = NULL,
df = 3, cure = FALSE,
tvc = NULL, tvc.formula = NULL,
control = list(), init = NULL,
weights = NULL, robust = FALSE, baseoff = FALSE,
timeVar = "", time0Var = "", use.gr = NULL,
optimiser=NULL, log.time.transform=TRUE,
reltol=NULL, trace = NULL,
link.type=c("PH","PO","probit","AH","AO"), theta.AO=0,
contrasts = NULL, subset = NULL,
robust_initial=NULL,
coxph.strata = NULL, coxph.formula = NULL,
logH.formula = NULL, logH.args = NULL,
bhazard = NULL, bhazinit=NULL, copula=FALSE,
frailty = !is.null(cluster) & !robust & !copula,
cluster = NULL, logtheta=NULL,
nodes=NULL, RandDist=c("Gamma","LogN"), recurrent = FALSE,
adaptive = NULL, maxkappa = NULL,
sp=NULL, criterion=NULL, penalty=NULL,
smoother.parameters=NULL, Z=~1, outer_optim=NULL,
alpha=1, sp.init=1,
penalised=FALSE,
...)
stpm2(formula, data, weights=NULL, subset=NULL, coxph.strata=NULL, ...)
pstpm2(formula, data, weights=NULL, subset=NULL, coxph.strata=NULL, ...)
Arguments
formula |
a formula object, with the response on the left of a |
data |
a data.frame in which to interpret the variables named in
the |
smooth.formula |
either a parametric formula or a penalised |
df |
an integer that describes the degrees of freedom for the |
smooth.args |
a list describing the arguments for the |
tvc |
a list with the names of the time-varying coefficients. For a parametric
model, this uses natural splines (e.g. |
tvc.formula |
separate formula for the time-varying effects. This is combined with
|
baseoff |
Boolean used to determine whether fully define the model
using |
logH.args |
as per |
logH.formula |
as per |
cure |
logical for whether to estimate a cure model (parametric model only). |
control |
list of arguments passed to |
init |
|
coxph.strata |
variable in the |
weights |
an optional vector of 'prior weights' to be used in the
fitting process. Should be |
robust |
Boolean used to determine whether to use a robust variance estimator. |
bhazard |
variable for the baseline hazard for relative survival |
bhazinit |
scalar used to adjust the background cumulative hazards for calculating
initial values. Default=0.1. Deprecated argument: use of the
|
copula |
logical to indicate whether to use a copula model (experimental) |
timeVar |
variable defining the time variable. By default, this is determined from the survival object, however this may be ambiguous if two variables define the time |
sp |
fix the value of the smoothing parameters. |
use.gr |
in R, a Boolean to determine whether to use the gradient
in the optimisation. Default=TRUE, Deprecated argument: use of the
|
criterion |
in Rcpp, determine whether to use "GCV" or "BIC" for for the smoothing parameter selection. |
penalty |
use either the "logH" penalty, which is the default
penalty from mgcv, or the "h" hazard penalty. Default="logH". Deprecated argument: use of the
|
smoother.parameters |
for the hazard penalty, a list with components which are lists with components var, transform and inverse. |
alpha |
an ad hoc tuning parameter for the smoothing parameter. |
sp.init |
initial values for the smoothing parameters. |
trace |
integer for trace reporting; 0 represents no additional
reporting. Default=0. Deprecated argument: use of the
|
contrasts |
an optional list. See the |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
coxph.formula |
additional formula used to improve the fitting of initial values [optional and rarely used]. |
time0Var |
string variable to determine the entry variable; useful for when more than one data variable is used in the entry time. |
link.type |
type of link function. For "PH" (generalised proportional hazards), g(S)=log(-log(S)); for "PO" (generalised proportional odds), g(S)=-logit(S); for "probit" (generalised probit), g(S)=-probit(S); for "AH" (generalised additive hazards), g(S)=-log(S); for "AO" (generalised Aranda-Ordaz), g(S)=log((S^(-theta.AO)-1)/theta.AO). |
theta.AO |
theta parameter for the Aranda-Ordaz link type. |
optimiser |
select which optimiser is used. Default="BFGS". Deprecated argument: use of the
|
log.time.transform |
should a log-transformation be used for calculating the derivative of the design matrix with respect to time? (default=TRUE) |
recurrent |
logical for whether clustered, left truncated data are recurrent or for first event (where the latter requires an adjustment for the frailties or random effects) |
frailty |
logical for whether to fit a shared frailty model |
cluster |
variable that determines the cluster for the frailty. This can be a vector, a string for the column, or a name. This can also be specified using a special. |
logtheta |
initial value for log-theta used in the gamma shared frailty
model (defaults to value from a |
nodes |
number of integration points for Gaussian
quadrature. Default=9. Deprecated argument: use of the
|
RandDist |
type of distribution for the random effect or frailty |
adaptive |
logical for whether to use adaptive or non-adaptive
quadrature, Default=TRUE. Deprecated argument: use of the
|
maxkappa |
double float value for the maximum value of the weight
used in the constraint. Default=1000. Deprecated argument: use of the
|
Z |
formula for the design matrix for the random effects |
reltol |
list with components for search and final relative
tolerances. Default=list(search=1e-10, final=1e-10, outer=1e-5). Deprecated argument: use of the
|
outer_optim |
Integer to indicate the algorithm for outer optimisation. If outer_optim=1 (default), then use Neldear-Mead, otherwise use Nlm. |
robust_initial |
logical for whether to use Nelder-Mead
to find initial values (max 50 iterations). This is useful for
ill-posed initial values. Default= FALSE. Deprecated argument: use of the
|
penalised |
logical to show whether to use penalised models with
|
... |
additional arguments to be passed to the |
Details
The implementation extends the mle2
object from the
bbmle
package.
The default smoothers for time on the linear predictor scale are
nsxs(log(time),df=3)
for the parametric model and
s(log(time))
for the penalised model.
A frequently asked question is: why does rstpm2 give different spline
estimates to flexsurv and Stata's stpm2? The short answer is that
rstpm2 uses a different natural spline basis compared with flexsurv
and Stata's stpm2 and slightly different
knot placement than Stata's stpm2. If the knot
placement is the same, then the predictions and other coefficients are
expected to be very similar. As a longer answer, the default smoother
in rstpm2 is to use an extension of the splines::ns
function
(rstpm2::nsx
), which uses a QR projection of B-splines for
natural splines. In contrast, flexsurv and Stata's stpm2 use truncated
power splines for the natural spline basis (also termed 'restricted
cubic splines'). The B-splines are known to
have good numerical properties, while Stata's stpm2 implementation
defaults to using matrix orthogonalisation to account for any
numerical instability in the truncated power basis. Furthermore,
rstpm2 allows for any smooth parametric function to be used as a
smoother in stpm2
/gsm
, which is an extension over
flexsurv and Stata's stpm2. Finally, it may be difficult to get rstpm2 and
Stata's stpm2 to return the same estimates: although
nsx
includes an argument stata.stpm2.compatible = FALSE
(change to TRUE
for
compatibility), the design matrix for rstpm2 is based on individuals
with events, while Stata's stpm2 determines the spline knots from the
individuals with events and the design matrix is otherwise based on all individuals.
Value
Either a stpm2-class
or pstpm2-class
object.
Author(s)
Mark Clements, Xing-Rong Liu, Benjamin Christoffersen.
Examples
## Not run:
data(brcancer)
summary(fit <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=3))
## some predictions
head(predict(fit,se.fit=TRUE,type="surv"))
head(predict(fit,se.fit=TRUE,type="hazard"))
## some plots
plot(fit,newdata=data.frame(hormon=0),type="hazard")
plot(fit,newdata=data.frame(hormon=0),type="surv")
## time-varying coefficient
summary(fit.tvc <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=3,
tvc=list(hormon=3)))
anova(fit,fit.tvc) # compare with and without tvc
## some more plots
plot(fit.tvc,newdata=data.frame(hormon=0),type="hr",var="hormon", ylim=c(0,2))
lines(fit.tvc,newdata=data.frame(hormon=1),type="hr",var="hormon",
col=2)
plot(fit.tvc,newdata=data.frame(hormon=0),type="sdiff",var="hormon")
plot(fit.tvc,newdata=data.frame(hormon=0),type="hdiff",var="hormon")
library(scales)
cols <- c(alpha("red",alpha=0.2), alpha("blue",alpha=0.2))
plot(fit.tvc,newdata=data.frame(hormon=0),type="hazard",ci.col=cols[1])
lines(fit.tvc,newdata=data.frame(hormon=1),type="hazard",lty=2,ci.col=cols[2],
ci=TRUE)
legend("topright",legend=c("No hormonal treatment", "(95
lty=c(1,1,2,1), lwd=c(1,10,1,10), col=c("black",cols[1],"black",cols[2]), bty="n")
## compare number of knots
hormon0 <- data.frame(hormon=0)
plot(fit,type="hazard",newdata=hormon0)
AIC(fit)
for (df in 4:6) {
fit.new <- stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,df=df)
plot(fit.new,type="hazard",newdata=hormon0,add=TRUE,ci=FALSE,line.col=df)
print(AIC(fit.new))
}
## compatibility with Stata's stpm2 using the smooth.formula argument (see Details)
summary(stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,
smooth.formula=~nsx(log(rectime),df=3,stata.stpm2.compatible=TRUE)))
summary(stpm2(Surv(rectime,censrec==1)~hormon,data=brcancer,
smooth.formula=~nsx(log(rectime),df=3,stata=TRUE)+
hormon:nsx(log(rectime),df=3,stata=TRUE)))
## End(Not run)