gformula {CICI} | R Documentation |
Parametric g-formula for continuous multiple time point interventions
Description
Estimation of counterfactual outcomes for multiple values of continuous interventions at different time points using the g-formula.
Usage
gformula(X, Anodes, Ynodes, Lnodes = NULL, Cnodes = NULL,
abar = NULL, cbar = "uncensored",
survivalY = FALSE,
Yform = "GLM", Lform = "GLM", Aform = "GLM", Cform = "GLM",
calc.support = FALSE, B = 0, ret = FALSE, ncores = 1,
verbose = TRUE, seed = NULL, prog = NULL, ...)
Arguments
X |
A data frame, following the time-ordering of the nodes. Categorical variables with k categories should be a factor, with levels 0,...,k-1. Binary variables should be coded 0/1. |
Anodes |
A character string of column names in |
Ynodes |
A character string of column names in |
Lnodes |
A character string of column names in |
Cnodes |
A character string of column names in |
abar |
Numeric vector or matrix of intervention values, or the string "natural". See Details. |
cbar |
Typically either the string "uncensored" or "natural"; a numeric vector or matrix of censoring values is also accepted. See Details. |
survivalY |
Logical. If TRUE, then Y nodes are indicators of an event, and if Y at some time point is 1, then all following Y nodes should be 1 as well. |
Yform |
Either the string "GLM" or "GAM", or model formulas given as strings, of length 'number of Ynodes'. See Details. |
Lform |
Either the string "GLM" or "GAM", or model formulas given as strings, of length 'number of Lnodes'. See Details. |
Aform |
Either the string "GLM" or "GAM", or model formulas given as strings, of length 'number of Anodes'. See Details. |
Cform |
Either the string "GLM" or "GAM", or model formulas given as strings, of length 'number of Cnodes'. See Details. |
calc.support |
Logical. If |
B |
An integer specifying the number of bootstrap samples to be used, if any. |
ret |
Logical. If |
ncores |
An integer for the number of threads/cores to be used. If >1, parallelization will be utilized. |
verbose |
Logical. If |
seed |
An integer specifying the seed to be used to create reproducible results for parallel computing (i.e. when ncores>1). |
prog |
A character specifying a path where progress should be saved (typically, when |
... |
Further arguments to be passed on. |
Details
By default, expected counterfactual outcomes (specified under Ynodes) under the intervention abar are calculated. Other estimands can be specified via custom.measure.
If abar is a vector, then each component defines one intervention that is constant over time, i.e. that value is used as the intervention value at every time point. If abar is a matrix (of size 'number of interventions' x 'number of time points'), then each row, which has the same length as Anodes, refers to a particular time-varying intervention strategy. The natural intervention can be picked by setting abar='natural'.
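The two forms of abar can be sketched as follows. This is a minimal illustration, not taken from the package manual: the node names are those of the EFV example on this page, the values in abar_matrix are made up, and the gformula call is only attempted if CICI is installed.

```r
# Vector form: 11 strategies, each holding the intervention constant over time
abar_vector <- seq(0, 10, 1)

# The same constant strategies written out as a matrix:
# rows = strategies, columns = time points (5 Anodes in the EFV example)
n_timepoints <- 5
abar_constant <- matrix(rep(abar_vector, times = n_timepoints),
                        ncol = n_timepoints)

# Matrix form: time-varying strategies (values chosen for illustration only)
abar_matrix <- rbind(
  c(0, 2, 4, 6, 8),   # increasing over time
  c(5, 5, 5, 5, 5),   # constant over time
  c(10, 8, 6, 4, 2)   # decreasing over time
)

# Only run the estimation step if the CICI package is available
if (requireNamespace("CICI", quietly = TRUE)) {
  data("EFV", package = "CICI")
  est_tv <- CICI::gformula(
    X = EFV,
    Lnodes = c("adherence.1", "weight.1", "adherence.2", "weight.2",
               "adherence.3", "weight.3", "adherence.4", "weight.4"),
    Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
    Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
    abar = abar_matrix
  )
}
```

Each row of abar_matrix has one entry per Anode, so the number of columns must match the length of Anodes.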
The fitted outcome and confounder models are based on generalized additive models (GAMs) as implemented in the mgcv package. Model families are picked automatically and reported in the output if verbose=TRUE (see manual for modifications, though they hardly ever make sense). The model formulas are standard GLMs or GAMs (with penalized splines for continuous covariates), conditional on the past, unless specific formulae are given. It is recommended to use customized formulae to reduce the risk of model mis-specification and to ensure that the models make sense (e.g., not too many splines are used when this is computationally not meaningful). This can be best facilitated by using objects generated through make.model.formulas, followed by model.formulas.update and/or model.update (see examples for those functions).
For survival settings, it is required that i) survivalY=TRUE, ii) the data are in a format where a Ynode stays 1 after it jumps to 1, and iii) after a Cnode/Ynode is 1, every variable thereafter is set to NA (except a Ynode which is already 1). See manual for an example. By default, the package intervenes on Cnodes, i.e. calculates counterfactual outcomes under no censoring.
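The survival data format described above can be checked before fitting. The following base R sketch uses a toy data set with hypothetical column names; the two helper functions are illustrative and not part of CICI, and they only cover the Ynode part of the requirements (Cnodes are not handled).

```r
# Toy data in time order (hypothetical column names): Y.1, L.2, Y.2, L.3, Y.3
toy <- data.frame(
  Y.1 = c(0, 1, 0),
  L.2 = c(3.1, NA, 2.7),
  Y.2 = c(0, 1, 1),
  L.3 = c(2.9, NA, NA),
  Y.3 = c(0, 1, 1)
)
ynodes <- c("Y.1", "Y.2", "Y.3")

# Hypothetical helper: TRUE if every Ynode stays 1 once it has jumped to 1
y_is_monotone <- function(data, ynodes) {
  y <- as.matrix(data[, ynodes])
  all(apply(y, 1, function(row) all(diff(row) >= 0, na.rm = TRUE)))
}

# Hypothetical helper: TRUE if all non-Y columns after the first event are NA
na_after_event <- function(data, ynodes) {
  is_y <- names(data) %in% ynodes
  for (i in seq_len(nrow(data))) {
    event_cols <- which(is_y & unlist(data[i, ]) == 1)
    if (length(event_cols) == 0) next          # no event for this unit
    later <- seq_along(data) > min(event_cols) # columns after the first event
    if (any(!is.na(unlist(data[i, later & !is_y])))) return(FALSE)
  }
  TRUE
}

y_is_monotone(toy, ynodes)
na_after_event(toy, ynodes)
```

Both checks pass on the toy data; setting toy$L.3[2] to a non-missing value would make the second check fail, because unit 2 already had the event at Y.1.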
If calc.support=TRUE, conditional and crude support measures (i.e., diagnostics) are calculated as described in Section 3.3.2 of Schomaker et al. (2023). Another useful diagnostic for multiple time points is the natural course scenario, which can be evaluated under abar='natural' and cbar='natural'.
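Both diagnostics can be requested through the arguments listed under Usage. The sketch below assumes the EFV data and node names from the example on this page; the argument lists are built in base R, and the calls are only attempted if CICI is installed. The structure of the returned diagnostics is not shown here, since it is not described on this page.

```r
# Node specification from the EFV example of this help page
efv_nodes <- list(
  Lnodes = c("adherence.1", "weight.1", "adherence.2", "weight.2",
             "adherence.3", "weight.3", "adherence.4", "weight.4"),
  Ynodes = paste0("VL.", 0:4),
  Anodes = paste0("efv.", 0:4)
)

# Scenario 1: support diagnostics for constant interventions 0, 1, ..., 10
support_args <- c(efv_nodes, list(abar = seq(0, 10, 1), calc.support = TRUE))

# Scenario 2: natural course (no intervention on treatment or censoring)
natural_args <- c(efv_nodes, list(abar = "natural", cbar = "natural"))

# Only run the estimation step if the CICI package is available
if (requireNamespace("CICI", quietly = TRUE)) {
  data("EFV", package = "CICI")
  est_support <- do.call(CICI::gformula, c(list(X = EFV), support_args))
  est_natural <- do.call(CICI::gformula, c(list(X = EFV), natural_args))
  # Top-level overview of the diagnostics component listed under Value
  str(est_support$diagnostics, max.level = 1)
}
```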
To parallelize computations automatically, it is sufficient to set ncores>1, as appropriate. No further customization or setup is needed; everything will be done by the package. To make estimates under parallelization reproducible, use the seed argument. To watch the progress of parallelized computations, set a path in the prog argument: then, a text file reports on the progress, which is particularly useful if lengthy bootstrapping computations are required.
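A bootstrap run with parallelization might be set up as sketched below. The values of B, ncores and seed are arbitrary illustrations, and using tempdir() as the prog path is an assumption about what kind of path is expected; the call is only attempted if CICI is installed.

```r
# Settings for a small, reproducible, parallelized bootstrap (illustrative values)
parallel_args <- list(
  B      = 10,        # number of bootstrap samples
  ncores = 2,         # >1 switches on parallelization
  seed   = 2023,      # makes the parallel results reproducible
  prog   = tempdir()  # assumption: a writable path for the progress text file
)

# Only run the estimation step if the CICI package is available
if (requireNamespace("CICI", quietly = TRUE)) {
  data("EFV", package = "CICI")
  base_args <- list(
    X = EFV,
    Lnodes = c("adherence.1", "weight.1", "adherence.2", "weight.2",
               "adherence.3", "weight.3", "adherence.4", "weight.4"),
    Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
    Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
    abar = c(0, 5, 10)  # few interventions to keep the run short
  )
  est_boot <- do.call(CICI::gformula, c(base_args, parallel_args))
}
```

Keeping the parallel settings in a separate list makes it easy to rerun the same analysis serially by dropping ncores, seed and prog.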
Value
Returns an object of class ‘gformula’:
results |
matrix of results |
diagnostics |
list of diagnostics and weights based on the estimated support (if |
simulated.data |
list of counterfactual data sets related to the interventions defined through option |
observed.data |
list of observed data (and bootstrapped observed data). Will be |
setup |
list of chosen setup parameters |
Author(s)
Michael Schomaker
See Also
plot.gformula for plotting results as (causal) dose response curves, custom.measure for evaluating custom estimands, and mi.boot for using gformula on multiply imputed data.
Examples
# load the example data shipped with the package
data(EFV)

# counterfactual outcomes under constant interventions 0, 1, ..., 10
est <- gformula(X = EFV,
                Lnodes = c("adherence.1", "weight.1",
                           "adherence.2", "weight.2",
                           "adherence.3", "weight.3",
                           "adherence.4", "weight.4"),
                Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                abar = seq(0, 10, 1)
)
est