R: Fit a Joint Model for Longitudinal Data and a Terminal Event

longiPenal {frailtypack}

R Documentation

Fit a Joint Model for Longitudinal Data and a Terminal Event

Description

Fit a joint model for longitudinal data and a terminal event using a semiparametric penalized likelihood estimation or a parametric estimation on the hazard function.

The longitudinal outcomes y_i(t_ik) (k=1,...,n_i, i=1,...,N) for N subjects are described by a linear mixed model and the risk of the terminal event is represented by a proportional hazard risk model. The joint model is constructed assuming that the processes are linked via a latent structure (Wulfsohn and Tsiatsis 1997):

where X_Li(t) and X_Ti are vectors of fixed effects covariates and \beta_L and \beta_T are the associated coefficients. Measurements errors \epsilon_i(t_ik) are iid normally distributed with mean 0 and variance \sigma_\epsilon². The random effects b_i = (b_0i,...,b_qi)^T ~ N(0,B₁) are associated to covariates Z_Li(t) and independent from the measurement error. The relationship between the two processes is explained via h(b_i,\beta_L,Z_Li(t),X_Li(t)) with coefficients \eta_T. Two forms of the function h(.) are available: the random effects b_i and the current biomarker level b_i(t)=X_Li(t_ik)^T \beta_L + Z_i(t_ik)^T b_i.

We consider that the longitudinal outcome can be a subject to a quantification limit, i.e. some observations, below a level of detection s cannot be quantified (left-censoring).

Alternatively, a two-part model is proposed to fit a semicontinuous biomarker. The two-part model decomposes the biomarker's distribution into a binary outcome (zero vs. positive values) and a continuous outcome (positive values). In the conditional form, the continuous part is conditional on a positive value while in the marginal form, the continuous part corresponds to the marginal mean of the biomarker. A logistic mixed effects model fits the binary outcome and a linear or a lognormal mixed effects model fits the continuous outcome.

A mediation analysis is possible to derive the proportion of the treatment effect on the survival outcome due to the treatment effect on the longitudinal outcome. The proportion of treatment effect is derived as the ratio of the indirect effect over the total effect. This proportion is defined on the survival scale and the indirect effect is taken as the difference of survivals at a given time when the treatment is set to 1 for both the survival and longitudinal outcome and when the treatment is only set to 1 for the survival endpoint and 0 for the longitudinal outcome. The total effect is the difference of survivals when the treatment is set to 1 or 0 for both endpoints.

Usage

longiPenal(formula, formula.LongitudinalData, data,  data.Longi,
          formula.Binary=FALSE, random,random.Binary=FALSE, fixed.Binary=FALSE, 
          GLMlog=FALSE, MTP=FALSE, id, intercept = TRUE,link="Random-effects",
          timevar=FALSE,left.censoring=FALSE,n.knots, kappa,maxit=350,
          hazard="Splines",mediation=FALSE,med.center=NULL,med.trt=NULL,init.B,
          init.Random, init.Eta, method.GH = "Standard",seed.MC=1, n.nodes,
          LIMparam=1e-3,LIMlogl=1e-3, LIMderiv=1e-3, print.times=TRUE,med.nmc=500,
          pte.times=NULL,pte.ntimes=NULL,pte.nmc=500,pte.boot=FALSE,pte.nboot=2000)

Arguments

`formula`	a formula object, with the response on the left of a `\sim` operator, and the terms on the right. The response must be a survival object as returned by the 'Surv' function like in survival package. Interactions are possible using * or :.
`formula.LongitudinalData`	a formula object, only requires terms on the right to indicate which variables are modelling the longitudinal outcome. It must follow the standard form used for linear mixed-effects models. Interactions are possible using * or :.
`data`	a 'data.frame' with the variables used in `formula`.
`data.Longi`	a 'data.frame' with the variables used in `formula.LongitudinalData`.
`formula.Binary`	a formula object, only requires terms on the right to indicate which variables are modelling the binary part of the two-part model fitting the longitudinal semicontinuous outcome. It must follow the standard form used for linear mixed-effects models. Interactions are possible using * or :.
`random`	Names of variables for the random effects of the longitudinal outcome. Maximum 3 random effects are possible at the moment. The random intercept is chosen using `"1"`.
`random.Binary`	Names of variables for the random effects of the binary part of the two-part model fitting the longitudinal semicontinuous outcome. The random intercept is chosen using `"1"`.
`fixed.Binary`	Fix the value of the intercept in the binary part of a two-part model.
`GLMlog`	Logical value. Use a lognormal distribution for the biomarker (instead of the default normal distribution).
`MTP`	Logical value. Marginal two-part joint model instead of conditional two-part joint model (only with two-part models).
`id`	Name of the variable representing the individuals.
`intercept`	Logical value. Is the fixed intercept of the biomarker included in the mixed-effects model? The default is `TRUE`.
`link`	Type of link function for the dependence between the biomarker and death: `"Random-effects"` for the association directly via the random effects of the biomarker, `"Current-level"` for the association via the true current level of the biomarker. The option `"Current-level"` can be chosen only if the biomarker random effects are associated with the intercept and time (following this order). `"Two-part"`, this structure is only applicable with conditional two-part models, the effect of the current probability of positive value and the effect of the expected value among positive values on the risk of event is evaluated separately. The default is `"Random-effects"`.
`timevar`	Indicates the time varying variables to take into account this evolution over time in the link with the survival model (useful with 'Current-level' and 'Two-part' links)
`left.censoring`	Is the biomarker left-censored below a threshold `s`? The default is `FALSE`, ie. no left-censoring. In case of a left-censored biomarker, this argument must be equal to the threshold `s`.
`n.knots`	Integer giving the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (n.knots+2) splines functions for the approximation of the hazard or the survival functions. We estimate I or M-splines of order 4. When the user set a number of knots equals to k (n.knots=k) then the number of interior knots is (k-2) and the number of splines is (k-2)+order. Number of knots must be between 4 and 20. (See Note in `frailtyPenal` function)
`kappa`	Positive smoothing parameter in the penalized likelihood estimation. The coefficient kappa of the integral of the squared second derivative of hazard function in the fit (penalized log likelihood). To obtain an initial value for `kappa`, a solution is to fit the corresponding Cox model using cross validation (See `cross.validation` in function `frailtyPenal`). We advise the user to identify several possible tuning parameters, note their defaults and look at the sensitivity of the results to varying them.
`maxit`	Maximum number of iterations for the Marquardt algorithm. The default is 350.
`hazard`	Type of hazard functions: `"Splines"` for semiparametric hazard functions using equidistant intervals or `"Splines-per"` using percentile with the penalized likelihood estimation, `"Weibull"` for the parametric Weibull functions. The default is `"Splines"`.
`mediation`	a logical value indicating if the mediation analysis method is used. Default is FALSE.
`med.center`	For mediation analysis, a vector containing the center indicator for each subject If no center then this argument should be `NULL`. Default is `NULL`.
`med.trt`	For mediation analysis, a vector containing the treatment indicator for each subject.
`init.B`	Vector of initial values for regression coefficients. This vector should be of the same size as the whole vector of covariates with the first elements for the covariates related to the terminal event and then for the covariates related to the biomarker (interactions in the end of each component). Default is 0.5 for each.
`init.Random`	Initial value for variance of the elements of the matrix of the distribution of the random effects. Default is 0.5 for each element.
`init.Eta`	Initial values for regression coefficients for the link function. Default is 0.5 for each.
`method.GH`	Method for the Gauss-Hermite quadrature: `"Standard"` for the standard non-adaptive Gaussian quadrature, `"Pseudo-adaptive"` for the pseudo-adaptive Gaussian quadrature, `"Monte-carlo"` for the Monte-carlo method and `"HRMSYM"` for the algorithm for the multivariate non-adaptive Gaussian quadrature (see Details). The default is `"Standard"`.
`seed.MC`	Monte-carlo integration points selection (1=fixed, 0=random)
`n.nodes`	Number of nodes for the Gauss-Hermite quadrature or the Monte-carlo method. They can be chosen among 5, 7, 9, 12, 15, 20 and 32 for the GH quadrature and any number for the Monte-carlo method. The default is 9.
`LIMparam`	Convergence threshold of the Marquardt algorithm for the parameters (see Details of `frailtyPenal` function), `10^{-3}` by default.
`LIMlogl`	Convergence threshold of the Marquardt algorithm for the log-likelihood (see Details of `frailtyPenal` function), `10^{-3}` by default.
`LIMderiv`	Convergence threshold of the Marquardt algorithm for the gradient (see Details of `frailtyPenal` function), `10^{-3}` by default.
`print.times`	a logical parameter to print iteration process. The default is TRUE.
`med.nmc`	For mediation analysis, the number of Monte Carlo points used for computing the integral over the random effects in the likelihood computation. Default is 500.
`pte.times`	For mediation analysis, a vector of times for which the funtion `PTE(t)` is evaluated. Specified time points must be in the range of the observed event times. The length of the vector should be less than 200.
`pte.ntimes`	For mediation analysis, if the argument `pte.times` is not specified the argument `pte.ntimes` allows the user to only specify a number of time points for which the function `PTE(t)` has to be computed. This argument is only to be used if `pte.times` is not specified. In that case the default value for `pte.ntimes` is 10. Should be less than 200.
`pte.nmc`	For mediation analysis, nn integer indicating how many Monte Carlo simulations are used to integrate over the random effects in the computation of the function `PTE(t)`. Should be less than 10000. Default is 500.
`pte.boot`	For mediation analysis, a logical value indicating if bootstrapped confidence bands needs to be computed for the function `PTE(t)` in the mediation analysis setting. Default is FALSE.
`pte.nboot`	For mediation analysis, an integer indicating how many bootstrapped replicates of PTE(t) needs to be computed to derive confidence bands for PTE(t). Should be less than 10000. Default is 2000.

Details

Typical usage for the joint model

longiPenal(Surv(time,event)~var1+var2, biomarker ~ var1+var2,
data, data.Longi, ...)

The method of the Gauss-Hermite quadrature for approximations of the multidimensional integrals, i.e. length of random is 2, can be chosen among the standard, non-adaptive, pseudo-adaptive in which the quadrature points are transformed using the information from the fitted mixed-effects model for the biomarker (Rizopoulos 2012) or multivariate non-adaptive procedure proposed by Genz et al. 1996 and implemented in FORTRAN subroutine HRMSYM. The choice of the method is important for estimations. The standard non-adaptive Gauss-Hermite quadrature ("Standard") with a specific number of points gives accurate results but can be time consuming. The non-adaptive procedure ("HRMSYM") offers advantageous computational time but in case of datasets in which some individuals have few repeated observations (biomarker measures or recurrent events), this method may be moderately unstable. The pseudo-adaptive quadrature uses transformed quadrature points to center and scale the integrand by utilizing estimates of the random effects from an appropriate linear mixed-effects model. This method enables using less quadrature points while preserving the estimation accuracy and thus lead to a better computational time.The Monte-Carlo method is also proposed for approximations of the multidimensional integrals.

NOTE. Data frames data and data.Longi must be consistent. Names and types of corresponding covariates must be the same, as well as the number and identification of individuals.

Value

The following components are included in a 'longiPenal' object for each model:

`b`	The sequence of the corresponding estimation of the coefficients for the hazard functions (parametric or semiparametric), the random effects variances and the regression coefficients.
`call`	The code used for the model.
`formula`	The formula part of the code used for the terminal event part of the model.
`formula.LongitudinalData`	The formula part of the code used for the longitudinal part of the model.
`formula.Binary`	The formula part of the code used for the binary part of the two-part model.
`coef`	The regression coefficients (first for the terminal event and then for the biomarker.
`groups`	The number of groups used in the fit.
`kappa`	The value of the smoothing parameter in the penalized likelihood estimation corresponding to the baseline hazard function for the terminal event.
`logLikPenal`	The complete marginal penalized log-likelihood in the semiparametric case.
`logLik`	The marginal log-likelihood in the parametric case.
`n.measurements`	The number of biomarker observations used in the fit.
`max_rep`	The maximal number of repeated measurements per individual.
`n.deaths`	The number of events observed in the fit.
`n.iter`	The number of iterations needed to converge.
`n.knots`	The number of knots for estimating the baseline hazard function in the penalized likelihood estimation.
`n.strat`	The number of stratum.
`varH`	The variance matrix of all parameters (before positivity constraint transformation for the variance of the measurement error, for which the delta method is used).
`varHIH`	The robust estimation of the variance matrix of all parameters.
`xD`	The vector of times where both survival and hazard function of the terminal event are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times.
`lamD`	The array (dim=3) of baseline hazard estimates and confidence bands (terminal event).
`survD`	The array (dim=3) of baseline survival estimates and confidence bands (terminal event).
`median`	The value of the median survival and its confidence bands.
`typeof`	The type of the baseline hazard functions (0:"Splines", "2:Weibull").
`npar`	The number of parameters.
`nvar`	The vector of number of explanatory variables for the terminal event and biomarker.
`nvarEnd`	The number of explanatory variables for the terminal event.
`nvarY`	The number of explanatory variables for the biomarker.
`noVarEnd`	The indicator of absence of the explanatory variables for the terminal event.
`noVarY`	The indicator of absence of the explanatory variables for the biomarker.
`LCV`	The approximated likelihood cross-validation criterion in the semiparametric case (with H minus the converged Hessian matrix, and l(.) the full log-likelihood). `LCV=\frac{1}{n}(trace(H^{-1}_{pl}H) - l(.))`
`AIC`	The Akaike information Criterion for the parametric case. `AIC=\frac{1}{n}(np - l(.))`
`n.knots.temp`	The initial value for the number of knots.
`shape.weib`	The shape parameter for the Weibull hazard function.
`scale.weib`	The scale parameter for the Weibull hazard function.
`martingaledeath.res`	The martingale residuals for each individual.
`conditional.res`	The conditional residuals for the biomarker (subject-specific): `\bold{R}_i^{(m)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L-\bold{Z}_i^\top\widehat{\bold{b}}_i`.
`marginal.res`	The marginal residuals for the biomarker (population averaged): `\bold{R}_i^{(c)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L`.
`marginal_chol.res`	The Cholesky marginal residuals for the biomarker: `\bold{R}_i^{(m)}=\widehat{\bold{U}_i^{(m)}}\bold{R}_i^{(m)}`, where `\widehat{\bold{U}_i^{(m)}}` is an upper-triangular matrix obtained by the Cholesky decomposition of the variance matrix `\bold{V}_{\bold{R}_i^{(m)}}=\widehat{\bold{V}_i}-\bold{X}_{Li}(\sum_{i=1}^N\bold{X}_{Li}\widehat{\bold{V}_i}^{-1}\bold{X}_{Li})^{-1}\bold{X}_{Li}^\top`.
`conditional_st.res`	The standardized conditional residuals for the biomarker.
`marginal_st.res`	The standardized marginal residuals for the biomarker.
`random.effects.pred`	The empirical Bayes predictions of the random effects (ie. using conditional posterior distributions).
`pred.y.marg`	The marginal predictions of the longitudinal outcome.
`pred.y.cond`	The conditional (given the random effects) predictions of the longitudinal outcome.
`lineardeath.pred`	The linear predictor for the terminal part.
`global_chisq_d`	The vector with values of each multivariate Wald test for the terminal part.
`dof_chisq_d`	The vector with degrees of freedom for each multivariate Wald test for the terminal part.
`global_chisq.test_d`	The binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the terminal part).
`p.global_chisq_d`	The vector with the p_values for each global multivariate Wald test for the terminal part.
`global_chisq`	The vector with values of each multivariate Wald test for the longitudinal part.
`dof_chisq`	The vector with degrees of freedom for each multivariate Wald test for the longitudinal part.
`global_chisq.test`	The binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the longitudinal part).
`p.global_chisq`	The vector with the p_values for each global multivariate Wald test for the longitudinal part.
`names.factordc`	The names of the "as.factor" variables for the terminal part.
`names.factor`	The names of the "as.factor" variables for the longitudinal part.
`intercept`	The logical value. Is the fixed intercept included in the linear mixed-effects model?
`B1`	The variance matrix of the random effects for the longitudinal outcome.
`ResidualSE`	The standard deviation of the measurement error.
`eta`	The regression coefficients for the link function.
`ne_re`	The number of random effects used in the fit.
`names.re`	The names of variables for the random effects.
`link`	The name of the type of the link function.
`eta_p.value`	p-values of the Wald test for the estimated regression coefficients for the link function.
`beta_p.value`	p-values of the Wald test for the estimated regression coefficients.
`leftCensoring`	The logical value. Is the longitudinal outcome left-censored?
`leftCensoring.threshold`	For the left-censored biomarker, the value of the left-censoring threshold used for the fit.
`prop.censored`	The fraction of observations subjected to the left-censoring.
`methodGH`	The method used for approximations of the multidimensional integrals.
`n.nodes`	The number of integration points.

References

A. Krol, A. Mauguen, Y. Mazroui, A. Laurent, S. Michiels and V. Rondeau (2017). Tutorial in Joint Modeling and Prediction: A Statistical Software for Correlated Longitudinal Outcomes, Recurrent Events and a Terminal Event. Journal of Statistical Software 81(3), 1-52.

A. Krol, L. Ferrer, JP. Pignon, C. Proust-Lima, M. Ducreux, O. Bouche, S. Michiels, V. Rondeau (2016). Joint Model for Left-Censored Longitudinal Data, Recurrent Events and Terminal Event: Predictive Abilities of Tumor Burden for Cancer Evolution with Application to the FFCD 2000-05 Trial. Biometrics 72(3) 907-16.

D. Rizopoulos (2012). Fast fitting of joint models for longitudinal and event time data using a pseudo-adaptive Gaussian quadrature rule. Computational Statistics and Data Analysis 56, 491-501.

M.S. Wulfsohn, A.A. and Tsiatis, A. A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics 53, 330-9.

A. Genz and B. Keister (1996). Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics 71, 299-309.

D. Rustand, L. Briollais, C. Tournigand and V. Rondeau (2020). Two-part joint model for a longitudinal semicontinuous marker and a terminal event with application to metastatic colorectal cancer data. Biostatistics.

Examples



## Not run: 

###--- Joint model for longitudinal data and a terminal event ---###

data(colorectal)
data(colorectalLongi)

# Survival data preparation - only terminal events
colorectalSurv <- subset(colorectal, new.lesions == 0)

# Baseline hazard function approximated with splines
# Random effects as the link function

model.spli.RE <- longiPenal(Surv(time1, state) ~ age + treatment + who.PS
+ prev.resection, tumor.size ~  year * treatment + age + who.PS ,
data=colorectalSurv,	data.Longi = colorectalLongi, random = c("1", "year"),
id = "id", link = "Random-effects", left.censoring = -3.33,
n.knots = 7, kappa = 2)

# Weibull baseline hazard function
# Current level of the biomarker as the link function

model.weib.CL <- longiPenal(Surv(time1, state) ~ age + treatment + who.PS
+ prev.resection, tumor.size ~  year * treatment + age + who.PS , timevar="year",
data=colorectalSurv, data.Longi = colorectalLongi, random = c("1", "year"),
id = "id", link = "Current-level", left.censoring = -3.33, hazard = "Weibull")


###--- Two-part Joint model for semicontinuous
#      longitudinal data and a terminal event ---###

data(colorectal)
data(colorectalLongi)
colorectalSurv <- subset(colorectal, new.lesions == 0)

# Box-cox back transformation (lambda=0.3) and apply logarithm (with a 1 unit shift)
colorectalLongi$Yo <- (colorectalLongi$tumor.size*0.3+1)^(1/0.3)
colorectalLongi$Y <- log(colorectalLongi$Y+1) # log transformation with shift=1

# Conditional two-part joint model - random-effects association structure (~15min)

CTPJM_re <-longiPenal(Surv(time1, state)~age + treatment +
who.PS+ prev.resection, Y~year*treatment, formula.Binary=Y~year*treatment,
data = colorectalSurv, data.Longi = colorectalLongi, random = c("1"),
random.Binary=c("1"), id = "id", link ="Random-effects", left.censoring = F,
n.knots = 7, kappa = 2, hazard="Splines-per")

print(CTPJM_re)

# Conditional two-part joint model - current-level association structure (~15min)
# Simulated dataset (github.com/DenisRustand/TPJM_sim)
data(longDat)
data(survDat)
tte <- frailtyPenal(Surv(deathTimes, d)~trt,n.knots=5,kappa=0, data=survDat,cross.validation = T)
kap <- round(tte$kappa,2);kap # smoothing parameter
  CTPJM_cl <- longiPenal(Surv(deathTimes, d)~trt, Y~timej*trtY,
  data=survDat, data.Longi = longDat,
  random = c("1","timej"), formula.Binary=Y~timej*trtY,
  random.Binary=c("1"), timevar="timej", id = "id",
  link = "Current-level", n.knots = 5, kappa = kap,
  hazard="Splines-per", method.GH="Monte-carlo",
  n.nodes=500)

  print(CTPJM_cl)


# Marginal two-part joint model - random-effects association structure (~10min)
longDat$Yex <- exp(longDat$Y)-1
MTPJM_re <- longiPenal(Surv(deathTimes, d)~trt, Yex~timej*trtY,
                  data=survDat, data.Longi = longDat,MTP=T,GLMlog = T,
                  random = c("1"), formula.Binary=Y~timej*trtY,
                  random.Binary=c("1"), timevar="timej", id = "id",
                  link = "Random-effects", n.knots = 5, kappa = kap,
                  hazard="Splines-per", method.GH="Monte-carlo",
                  n.nodes=500)

print(MTPJM_re)

# Marginal two-part joint model - current-level association structure (~45min)
 MTPJM_cl <- longiPenal(Surv(deathTimes, d)~trt, Yex~timej*trtY,
                  data=survDat, data.Longi = longDat,MTP=T,GLMlog = T,
                  random = c("1","timej"), formula.Binary=Y~timej*trtY,
                  random.Binary=c("1"), timevar="timej", id = "id",
                  link = "Current-level", n.knots = 5, kappa = kap,
                  hazard="Splines-per", method.GH="Monte-carlo",
                  n.nodes=500)

print(MTPJM_cl)

###--- Mediation analysis  
#Takes ~ 10 minutes to run
data(colorectal)
data(colorectalLongi)
colorectalSurv <- subset(colorectal, new.lesions == 0)

colorectalSurv$treatment<-sapply(colorectalSurv$treatment,function(t) ifelse(t=="S",1,0))
colorectalLongi$treatment<-sapply(colorectalLongi$treatment,function(t) ifelse(t=="S",1,0))

mod.col=longiPenal(Surv(time1, state) ~ age+treatment, 
  	       tumor.size ~ age+year*treatment,
           data=colorectalSurv,	data.Longi = colorectalLongi, random = c("1", "year"),
           id = "id", link = "Current-level",timevar="year",method.GH = "Pseudo-adaptive",
           mediation = TRUE,med.trt = colorectalSurv$treatment,
           med.center = NULL,med.nmc = 50,n.knots = 7, kappa = 2,
  	        pte.ntimes = 30,pte.boot = T,pte.nmc = 1000,pte.nboot = 1000)

print(mod.col)
plot(mod.col,plot.mediation='All')

## End(Not run)

[Package frailtypack version 3.6.2 Index]