gformula {CICI} R Documentation

Parametric g-formula for continuous multiple time point interventions

Description

Estimation of counterfactual outcomes for multiple values of continuous interventions at different time points using the g-formula.

Usage

gformula(X, Anodes, Ynodes, Lnodes = NULL, Cnodes = NULL,
         abar = NULL, cbar = "uncensored",
         survivalY = FALSE, 
         Yform = "GLM", Lform = "GLM", Aform = "GLM", Cform = "GLM",
         calc.support = FALSE, B = 0, ret = FALSE, ncores = 1, 
         verbose = TRUE, seed = NULL, prog = NULL, ...)

Arguments

X

A data frame, with columns following the time-ordering of the nodes. Categorical variables with k categories should be factors with levels 0,...,k-1. Binary variables should be coded 0/1.

Anodes

A character vector of column names in X of the intervention variable(s).

Ynodes

A character vector of column names in X of the outcome variable(s).

Lnodes

A character vector of column names in X of the time-dependent (post first treatment) variable(s).

Cnodes

A character vector of column names in X of the censoring variable(s).

abar

Numeric vector or matrix of intervention values, or the string "natural". See Details.

cbar

Typically either the string "uncensored" or "natural", but a numeric vector or matrix of censoring values is also permitted. See Details.

survivalY

Logical. If TRUE, then Y nodes are indicators of an event, and if Y at some time point is 1, then all subsequent Y nodes should be 1 as well.

Yform

Either the string "GLM" or "GAM", or a character vector of length 'number of Ynodes' containing model formulas. See Details.

Lform

Either the string "GLM" or "GAM", or a character vector of length 'number of Lnodes' containing model formulas. See Details.

Aform

Either the string "GLM" or "GAM", or a character vector of length 'number of Anodes' containing model formulas. See Details.

Cform

Either the string "GLM" or "GAM", or a character vector of length 'number of Cnodes' containing model formulas. See Details.

calc.support

Logical. If TRUE, both crude and conditional support are estimated.

B

An integer specifying the number of bootstrap samples to be used, if any.

ret

Logical. If TRUE, the simulated post-intervention data is returned.

ncores

An integer for the number of threads/cores to be used. If >1, parallelization will be utilized.

verbose

Logical. If TRUE, notes and warnings are printed.

seed

An integer specifying the seed to be used to create reproducible results for parallel computing (i.e. when ncores>1).

prog

A character string specifying a path where progress should be saved (typically used when ncores>1).

...

Further arguments to be passed on.

Details

By default, expected counterfactual outcomes (specified under Ynodes) under the intervention abar are calculated. Other estimands can be specified via custom.measure.

If abar is a vector, each component defines one intervention strategy in which that value is assigned at every time point; that is, each strategy is constant over time. If abar is a matrix (of size 'number of interventions' x 'number of time points'), each row, whose length equals the number of Anodes, defines one time-varying intervention strategy. The natural intervention can be selected by setting abar='natural'.
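
The two forms of abar can be sketched as follows. This is a minimal illustration, not a recommended analysis: the strategy values in the matrix are arbitrary assumptions, the node names are those used in the Examples section, and the call to gformula is guarded so that it only runs if CICI is installed.

```r
# Node names as used in the Examples section of this page
Lnodes <- c("adherence.1", "weight.1", "adherence.2", "weight.2",
            "adherence.3", "weight.3", "adherence.4", "weight.4")
Ynodes <- c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4")
Anodes <- c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4")

# Vector: each value defines one strategy that is constant over time
abar_const <- seq(0, 10, 1)

# Matrix: one row per strategy, one column per intervention time point
# (the values below are illustrative assumptions)
abar_tv <- rbind(
  c(1, 1, 1, 1, 1),  # constant at 1
  c(1, 2, 3, 4, 5),  # increasing over time
  c(5, 4, 3, 2, 1)   # decreasing over time
)

# Each row must have one value per intervention node
stopifnot(ncol(abar_tv) == length(Anodes))

if (requireNamespace("CICI", quietly = TRUE)) {
  library(CICI)
  data(EFV)
  est_tv <- gformula(X = EFV, Lnodes = Lnodes, Ynodes = Ynodes,
                     Anodes = Anodes, abar = abar_tv)
  est_tv
}
```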

The fitted outcome and confounder models are based on generalized additive models (GAMs) as implemented in the mgcv package. Model families are picked automatically and reported in the output if verbose=TRUE (see manual for modifications, though they hardly ever make sense). The model formulas are standard GLMs or GAMs (with penalized splines for continuous covariates), conditional on the past, unless specific formulae are given. It is recommended to use customized formulae to reduce the risk of model mis-specification and to ensure that the models make sense (e.g., not too many splines are used when this is computationally not meaningful). This can be best facilitated by using objects generated through make.model.formulas, followed by model.formulas.update and/or model.update (see examples for those functions).

For survival settings, it is required that i) survivalY=TRUE, ii) the data are in a format where a Ynode stays 1 after it jumps to 1, and iii) after a Cnode/Ynode is 1, every variable thereafter is set to NA (except a Ynode which is already 1). See manual for an example. By default, the package intervenes on Cnodes, i.e. calculates counterfactual outcomes under no censoring.
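
Requirement ii) can be checked before calling gformula. The sketch below uses base R only; the helper y_stays_one and the toy data (including its column names and ordering) are hypothetical and not part of CICI. It does not check requirement iii), the NA pattern after an event or censoring.

```r
# Hypothetical helper (not part of CICI): for each row, checks that once a
# Ynode equals 1, the next Ynode in time order also equals 1
y_stays_one <- function(X, Ynodes) {
  Y <- as.matrix(X[, Ynodes, drop = FALSE])
  ok <- rep(TRUE, nrow(Y))
  for (j in seq_len(ncol(Y) - 1)) {
    had_event <- !is.na(Y[, j]) & Y[, j] == 1
    later_one <- !is.na(Y[, j + 1]) & Y[, j + 1] == 1
    ok <- ok & (!had_event | later_one)
  }
  ok
}

# Toy data with hypothetical column names
toy <- data.frame(
  A.0 = c(1, 2, 3, 1),
  C.1 = c(0, 0, 1, 0),
  Y.1 = c(0, 1, NA, 1),
  L.1 = c(5, NA, NA, NA),
  A.1 = c(1, NA, NA, NA),
  C.2 = c(0, NA, NA, NA),
  Y.2 = c(1, 1, NA, 0)   # row 4 violates the rule: Y.1 = 1 but Y.2 = 0
)

valid <- y_stays_one(toy, c("Y.1", "Y.2"))
which(!valid)  # rows that need to be corrected before using survivalY = TRUE
```

Rows 1 to 3 pass (event at the second time point, event at the first time point, and censoring at the first time point, respectively), whereas row 4 is flagged.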

If calc.support=TRUE, conditional and crude support measures (i.e., diagnostics) are calculated as described in Section 3.3.2 of Schomaker et al. (2023). Another useful diagnostic for multiple time points is the natural course scenario, which can be evaluated under abar='natural' and cbar='natural'.
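
A minimal sketch of both diagnostics, assuming the EFV data and node names from the Examples section. The EFV example specifies no Cnodes, so cbar is left at its default here; with censoring nodes present, cbar='natural' would be added as described above. The calls are guarded so that they only run if CICI is installed.

```r
# Node names as used in the Examples section of this page
Lnodes <- c("adherence.1", "weight.1", "adherence.2", "weight.2",
            "adherence.3", "weight.3", "adherence.4", "weight.4")
Ynodes <- c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4")
Anodes <- c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4")

have_cici <- requireNamespace("CICI", quietly = TRUE)
if (have_cici) {
  library(CICI)
  data(EFV)

  # Support diagnostics: stored in the 'diagnostics' element (see Value)
  est_sup <- gformula(X = EFV, Lnodes = Lnodes, Ynodes = Ynodes,
                      Anodes = Anodes, abar = seq(0, 10, 1),
                      calc.support = TRUE)
  str(est_sup$diagnostics, max.level = 1)

  # Natural course scenario: no fixed intervention values are imposed
  est_nat <- gformula(X = EFV, Lnodes = Lnodes, Ynodes = Ynodes,
                      Anodes = Anodes, abar = "natural")
  est_nat
}
```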

To parallelize computations automatically, it is sufficient to set ncores>1, as appropriate. No further customization or setup is needed; everything is handled by the package. To make estimates under parallelization reproducible, use the seed argument. To watch the progress of parallelized computations, set a path in the prog argument: a text file then reports on the progress, which is particularly useful if lengthy bootstrapping computations are required.
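
A parallelized bootstrap run might be set up as sketched below. The number of cores, the seed, and the number of bootstrap samples are arbitrary illustrative choices, and passing a directory (rather than a file name) to prog is an assumption. The call is guarded so that it only runs if CICI is installed.

```r
# Node names as used in the Examples section of this page
Lnodes <- c("adherence.1", "weight.1", "adherence.2", "weight.2",
            "adherence.3", "weight.3", "adherence.4", "weight.4")
Ynodes <- c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4")
Anodes <- c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4")

ncores   <- 2          # illustrative; choose according to the machine
prog_dir <- tempdir()  # assumption: prog is given as a directory path

if (requireNamespace("CICI", quietly = TRUE)) {
  library(CICI)
  data(EFV)
  est_par <- gformula(X = EFV, Lnodes = Lnodes, Ynodes = Ynodes,
                      Anodes = Anodes, abar = seq(0, 10, 2),
                      B = 20,           # small number of bootstrap samples
                      ncores = ncores,  # >1 switches on parallelization
                      seed = 2023,      # makes parallel results reproducible
                      prog = prog_dir)  # progress is reported in a text file
  est_par
}
```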

Value

Returns an object of class ‘gformula’:

results

matrix of results

diagnostics

list of diagnostics and weights based on the estimated support (if calc.support=TRUE)

simulated.data

list of counterfactual data sets related to the interventions defined through option abar (and cbar). Will be NULL if ret=FALSE.

observed.data

list of observed data (and bootstrapped observed data). Will be NULL if ret=FALSE.

setup

list of chosen setup parameters

Author(s)

Michael Schomaker

See Also

plot.gformula for plotting results as (causal) dose response curves, custom.measure for evaluating custom estimands and mi.boot for using gformula on multiply imputed data.

Examples


data(EFV)
est <- gformula(X = EFV,
                Lnodes = c("adherence.1", "weight.1",
                           "adherence.2", "weight.2",
                           "adherence.3", "weight.3",
                           "adherence.4", "weight.4"),
                Ynodes = c("VL.0", "VL.1", "VL.2", "VL.3", "VL.4"),
                Anodes = c("efv.0", "efv.1", "efv.2", "efv.3", "efv.4"),
                abar = seq(0, 10, 1))
est



[Package CICI version 0.9.1 Index]