pointEstimate {riskCommunicator} | R Documentation |
Perform g-computation to estimate difference and ratio effects of an exposure
Description
Generate a point estimate of the outcome difference and ratio using G-computation
Usage
pointEstimate(
data,
outcome.type = c("binary", "count", "count_nb", "rate", "rate_nb", "continuous"),
formula = NULL,
Y = NULL,
X = NULL,
Z = NULL,
subgroup = NULL,
offset = NULL,
rate.multiplier = 1,
exposure.scalar = 1,
exposure.center = TRUE
)
Arguments
data |
(Required) A data.frame containing variables for
|
outcome.type |
(Required) Character argument to describe the outcome
type. Acceptable responses, and the corresponding error distribution and
link function used in the
|
formula |
(Optional) Default NULL. An object of class "formula" (or one that can be coerced to that class) which provides the complete model formula, similar to the formula for the glm function in R (e.g. 'Y ~ X + Z1 + Z2 + Z3'). Can be supplied as a character or formula object. If no formula is provided, Y and X must be provided. |
Y |
(Optional) Default NULL. Character argument which specifies the
outcome variable. Can optionally provide a formula instead of |
X |
(Optional) Default NULL. Character argument which specifies the
exposure variable (or treatment group assignment), which can be binary,
categorical, or continuous. This variable can be supplied as a factor
variable (for binary or categorical exposures) or a continuous variable.
For binary/categorical exposures, |
Z |
(Optional) Default NULL. List or single character vector which
specifies the names of covariates or other variables to adjust for in the
|
subgroup |
(Optional) Default NULL. Character argument that indicates subgroups for stratified analysis. Effects will be reported for each category of the subgroup variable. Variable will be automatically converted to a factor if not already. |
offset |
(Optional, only applicable for rate/count outcomes) Default NULL. Character argument which specifies the variable name to be used as the person-time denominator for rate outcomes to be included as an offset in the Poisson regression model. Numeric variable should be on the linear scale; function will take natural log before including in the model. |
rate.multiplier |
(Optional, only applicable for rate/count outcomes). Default 1. Numeric variable signifying the person-time value to use in predictions; the offset variable will be set to this when predicting under the counterfactual conditions. This value should be set to the person-time denominator desired for the rate difference measure and must be inputted in the units of the original offset variable (e.g. if the offset variable is in days and the desired rate difference is the rate per 100 person-years, rate.multiplier should be inputted as 365.25*100). |
exposure.scalar |
(Optional, only applicable for continuous exposure) Default 1. Numeric value to scale effects with a continuous exposure. This option facilitates reporting effects for an interpretable contrast (i.e. magnitude of difference) within the continuous exposure. For example, if the continuous exposure is age in years, an exposure.scalar of 10 would result in estimates per 10-year increase in age rather than per 1-year increase in age. |
exposure.center |
(Optional, only applicable for continuous exposure) Default TRUE. Logical or numeric value to center a continuous exposure. This option facilitates reporting effects at the mean value of the exposure variable, and allows for a mean value to be provided directly to the function in cases where bootstrap resampling is being conducted and a standardized centering value should be used across all bootstraps. See note below on continuous exposure variables for additional details. |
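To illustrate how Y, X, and Z relate to the formula argument, here is a minimal base R sketch. The variable names are borrowed from the Examples section, and the use of reformulate is an assumption made for illustration; it is not necessarily how the package builds its formula internally.

```r
# Variable names borrowed from the Examples section (illustrative only).
Y <- "cvd_dth"
X <- "DIABETES"
Z <- c("AGE", "SEX", "BMI", "CURSMOKE", "PREVHYP")

# Build Y ~ X + Z1 + Z2 + ... with base R.
f <- reformulate(termlabels = c(X, Z), response = Y)
print(f)

# The same model written as a character string, as the formula argument allows.
f_chr <- "cvd_dth ~ DIABETES + AGE + SEX + BMI + CURSMOKE + PREVHYP"
same_vars <- identical(all.vars(f), all.vars(as.formula(f_chr)))
same_vars
```

Either form names the same outcome, exposure, and covariates, so supplying formula or supplying Y, X, and Z should describe the same model.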
Details
The pointEstimate function executes the following steps on the data:
1. Fit a regression of the outcome on the exposure and relevant covariates, using the provided data set.
2. Using the model fit in step 1, predict counterfactuals (e.g. calculate predicted outcomes for each observation in the data set under each level of the treatment/exposure).
3. Estimate the marginal difference/ratio of treatment effect by taking the difference or ratio of the average predicted outcome across all observations under the treatment and no-treatment regimes.
As counterfactual predictions are generated with random sampling of the distribution, users should set a seed (set.seed) prior to calling the function for reproducible confidence intervals.
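The three steps above can be sketched in base R on simulated data. This is a simplified illustration, not the package's implementation: the data and variable names are hypothetical, and the sketch averages predicted probabilities directly rather than sampling from the predicted distribution.

```r
set.seed(2024)

# Simulated data (hypothetical): binary exposure x, confounder z, binary outcome y.
n <- 2000
z <- rnorm(n)
x <- rbinom(n, 1, plogis(0.5 * z))
y <- rbinom(n, 1, plogis(-1 + 0.8 * x + 0.6 * z))
dat <- data.frame(y = y, x = x, z = z)

# Step 1: fit the outcome regression.
fit <- glm(y ~ x + z, family = binomial(link = "logit"), data = dat)

# Step 2: predict each observation's outcome under both exposure levels.
p1 <- predict(fit, newdata = transform(dat, x = 1), type = "response")
p0 <- predict(fit, newdata = transform(dat, x = 0), type = "response")

# Step 3: average the predictions and contrast them.
risk1 <- mean(p1)
risk0 <- mean(p0)
c(risk_difference = risk1 - risk0,
  risk_ratio      = risk1 / risk0,
  odds_ratio      = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0)))
```

Because every observation contributes a prediction under both exposure levels, the resulting contrasts are marginal over the observed covariate distribution.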
Value
A named list containing the following:
$parameter.estimates |
Point estimates for the risk difference, risk ratio, odds ratio, incidence rate difference, incidence rate ratio, mean difference and/or number needed to treat/harm, depending on the outcome.type |
$formula |
Model formula used to fit the |
$contrast |
Contrast levels compared |
$Y |
The response variable |
$covariates |
Covariates used in the model |
$n |
Number of observations provided to the model |
$family |
Error distribution used in the model |
$predicted.data |
A data.frame with the predicted values for the exposed and unexposed counterfactual predictions for each observation in the original dataset (on the log scale) |
$predicted.outcome |
A data.frame with the marginal mean predicted outcomes for each exposure level |
$glm.result |
The |
formula = formula,
Note
While offsets are used to account for differences in follow-up time between individuals in the glm model, rate differences are calculated assuming equivalent follow-up of all individuals (i.e. predictions for each exposure are based on all observations having the same offset value). The default is 1 (specifying 1 unit of the original offset variable) or the user can specify an offset to be used in the predictions with the rate.multiplier argument.
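A rough sketch of this idea in base R, assuming simulated data with follow-up measured in days (all names here are hypothetical and the code is not taken from the package): the model is fit with each person's own person-time as the offset, while predictions assign every observation the same person-time.

```r
set.seed(1)

# Simulated data (hypothetical): follow-up time in days varies by person.
n <- 1000
x <- rbinom(n, 1, 0.5)
days <- runif(n, 100, 2000)
events <- rpois(n, lambda = days * exp(-7 + 0.5 * x))
dat <- data.frame(events, x, days)

# The model accounts for unequal follow-up through a log person-time offset.
fit <- glm(events ~ x + offset(log(days)), family = poisson, data = dat)

# Predict with every observation given the same person-time:
# 100 person-years expressed in days, the unit of the offset variable.
rate_multiplier <- 365.25 * 100
r1 <- mean(predict(fit, transform(dat, x = 1, days = rate_multiplier), type = "response"))
r0 <- mean(predict(fit, transform(dat, x = 0, days = rate_multiplier), type = "response"))

c(rate_difference_per_100py = r1 - r0, rate_ratio = r1 / r0)
```

The rate ratio does not depend on the common person-time value, whereas the rate difference scales with it, which is why the choice of rate.multiplier matters for the difference measure.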
Note that for a protective exposure (risk difference less than 0), the 'Number needed to treat/harm' is interpreted as the number needed to treat, and for a harmful exposure (risk difference greater than 0), it is interpreted as the number needed to harm.
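As a small worked example, the number needed to treat/harm is conventionally the reciprocal of the absolute risk difference. The helper below is hypothetical and written only to show the sign convention described above; any rounding the package applies is not reproduced here.

```r
# Hypothetical helper: label the reciprocal of the absolute risk difference.
nnt_nnh <- function(risk_difference) {
  stopifnot(is.numeric(risk_difference), risk_difference != 0)
  list(
    value = 1 / abs(risk_difference),
    label = if (risk_difference < 0) "number needed to treat" else "number needed to harm"
  )
}

protective <- nnt_nnh(-0.05)  # risk difference below 0
harmful    <- nnt_nnh(0.02)   # risk difference above 0
str(protective)
str(harmful)
```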
For continuous exposure variables, the default effects are provided for a one unit difference in the exposure at the mean value of the exposure variable. Because the underlying parametric model for a binary outcome is logistic regression, the risks for a continuous exposure will be estimated to be linear on the log-odds (logit) scale, such that the odds ratio for any one unit increase in the continuous variable is constant. However, the risks will not be linear on the linear (risk difference) or log (risk ratio) scales, such that these parameters will not be constant across the range of the continuous exposure. Users should be aware that the risk difference, risk ratio, number needed to treat/harm (for a binary outcome) and the incidence rate difference (for a rate/count outcome) reported with a continuous exposure apply specifically at the mean of the continuous exposure. The effects do not necessarily apply across the entire range of the variable. However, variations in the effect are likely small, especially near the mean.
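The following sketch illustrates the point with a hypothetical logistic model (the coefficients are invented for illustration, and the contrast is conditional rather than marginal): the odds ratio for a one-unit increase is the same at every starting value, while the risk ratio and risk difference change.

```r
# Hypothetical logistic model: logit(risk) = -2 + 0.05 * (age - 50).
risk_at <- function(age) plogis(-2 + 0.05 * (age - 50))
odds <- function(p) p / (1 - p)

# One-unit contrasts at three starting values of the exposure.
age <- c(30, 50, 70)
res <- data.frame(
  age             = age,
  odds_ratio      = odds(risk_at(age + 1)) / odds(risk_at(age)),
  risk_ratio      = risk_at(age + 1) / risk_at(age),
  risk_difference = risk_at(age + 1) - risk_at(age)
)
res
```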
Interaction terms are not allowed in the model formula. The subgroup
argument affords interaction between the exposure variable and a single
covariate (that is forced to categorical if supplied as numeric) to
estimate effects of the exposure within subgroups defined by the
interacting covariate. To include additional interaction terms with
variables other than the exposure, we recommend that users create the
interaction term as a cross-product of the two interaction variables in
a data cleaning step prior to running the model.
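A minimal sketch of that data cleaning step, using a hypothetical data set and hypothetical column names: the cross-product is stored as its own column, so the model formula contains only main-effect terms.

```r
# Hypothetical data with two covariates whose interaction is of interest.
dat <- data.frame(
  y   = c(0, 1, 0, 1, 1, 0),
  x   = c(0, 1, 1, 0, 1, 0),
  age = c(40, 55, 63, 48, 70, 52),
  smk = c(0, 1, 0, 1, 1, 0)
)

# Data cleaning step: create the cross-product as its own column.
dat$age_smk <- dat$age * dat$smk

# The new column can then be listed like any other covariate,
# e.g. Z = c("age", "smk", "age_smk"), or in a formula without `:` or `*`.
f <- y ~ x + age + smk + age_smk
attr(terms(f), "order")
```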
For negative binomial models, MASS::glm.nb is used instead of the standard stats::glm function used for all other models.
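For orientation, the two fitting functions can be compared on simulated overdispersed counts. This sketch assumes the MASS package is installed and uses hypothetical data; it shows the functions themselves, not how pointEstimate calls them.

```r
set.seed(42)

# Simulated overdispersed counts (hypothetical data).
n <- 500
x <- rbinom(n, 1, 0.5)
y <- rnbinom(n, mu = exp(0.5 + 0.4 * x), size = 1.5)
dat <- data.frame(y, x)

# Poisson model via stats::glm.
fit_pois <- stats::glm(y ~ x, family = poisson, data = dat)

# Negative binomial model via MASS::glm.nb, which also estimates
# the dispersion parameter theta.
fit_nb <- MASS::glm.nb(y ~ x, data = dat)

class(fit_pois)
class(fit_nb)
fit_nb$theta
```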
References
Ahern J, Hubbard A, Galea S. Estimating the effects of potential public health interventions on population disease burden: a step-by-step illustration of causal inference methods. Am. J. Epidemiol. 2009;169(9):1140–1147. doi:10.1093/aje/kwp015
Altman DG, Deeks JJ, Sackett DL. Odds ratios should be avoided when events are common. BMJ. 1998;317(7168):1318. doi:10.1136/bmj.317.7168.1318
Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7(9):1393–1512. doi:10.1016/0270-0255(86)90088-6
Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am. J. Epidemiol. 2011;173(7):731–738. doi:10.1093/aje/kwq472
Westreich D, Cole SR, Young JG, et al. The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death. Stat Med. 2012;31(18):2000–2009. doi:10.1002/sim.5316
See Also
Examples
## Obtain the risk difference and risk ratio for cardiovascular disease or death
## between patients with and without diabetes, while controlling for
## age,
## sex,
## BMI,
## whether the individual is currently a smoker, and
## if they have a history of hypertension.
library(riskCommunicator)
data(cvdd)
set.seed(100)
ptEstimate <- pointEstimate(data = cvdd, Y = "cvd_dth", X = "DIABETES",
    Z = c("AGE", "SEX", "BMI", "CURSMOKE", "PREVHYP"), outcome.type = "binary")