CovariateBalance {EpiForsk}    R Documentation
Plots for checking covariate balance in causal forest
Description
Generate plots showing balance in the covariates before and after propensity score weighting with a causal forest object.
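Balance before and after propensity score weighting is commonly summarized by the absolute standardized mean difference (ASMD), the quantity shown in the Love plot. The base R sketch below computes the ASMD of one simulated covariate, first unweighted and then with inverse propensity weights. It assumes weights of the form W / e(X) + (1 - W) / (1 - e(X)) and a pooled standard deviation of the unweighted groups as the denominator. Both are common conventions, not necessarily what CovariateBalance does internally. The helper asmd is hypothetical, and the true propensity score stands in for the estimate a causal forest stores in cf$W.hat.

```r
# Hypothetical helper: absolute standardized mean difference of covariate x
# between treated (w == 1) and untreated (w == 0), optionally weighted.
asmd <- function(x, w, weights = rep(1, length(x))) {
  m1 <- weighted.mean(x[w == 1], weights[w == 1])
  m0 <- weighted.mean(x[w == 0], weights[w == 0])
  # Assumption: denominator is the pooled SD of the unweighted groups
  s <- sqrt((var(x[w == 1]) + var(x[w == 0])) / 2)
  abs(m1 - m0) / s
}

set.seed(1)
n <- 20000
x <- rnorm(n)
e <- 1 / (1 + exp(-0.8 * x))  # true propensity score, depends on x
w <- rbinom(n, 1, e)

# Assumption: inverse propensity weights for both treatment groups
ipw <- w / e + (1 - w) / (1 - e)

before <- asmd(x, w)
after <- asmd(x, w, ipw)
round(c(before = before, after = after), 3)
```

Because treatment depends on x, the unweighted ASMD is large, while the weighted ASMD should be close to zero.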
Usage
CovariateBalance(
  cf,
  plots = c("all", "Love", "density", "ecdf"),
  balance_table = TRUE,
  covariates = NULL,
  names = NULL,
  factor = NULL,
  treatment_name = "W",
  love_breaks = NULL,
  love_xlim = NULL,
  love_scale_color = NULL,
  cd_nrow = NULL,
  cd_ncol = NULL,
  cd_x_scale_width = NULL,
  cd_bar_width = NULL,
  cd_scale_fill = NULL,
  ec_nrow = NULL,
  ec_ncol = NULL,
  ec_x_scale_width = NULL,
  ec_scale_color = NULL
)
Arguments
cf
An object of class causal_forest (and inheriting from class grf).
plots
Character, which plots to create: "all", "Love", "density", or "ecdf".
balance_table
Boolean, TRUE to return a table with balance statistics.
covariates
A vector to select covariates to show in balance plots. If
names
A named character vector. The vector itself should contain covariate names from the causal_forest object, while the names attribute should contain the names to use when plotting. If discrete covariates have been one-hot encoded using DiscreteCovariatesToOneHot, providing just the name of a discrete covariate renames all of its levels for plotting. If the vector is unnamed, its elements are used as the new covariate names, given in the order of the columns in cf$X.orig.
factor
A named list of covariates to convert to factors. One-hot encoded covariates are converted automatically, so they need not be specified in the factor argument. Each component of the list must contain the factor levels; use a named vector to supply custom labels.
treatment_name
Character, name of the treatment.
love_breaks
Numeric, breaks used in the plot of absolute standardized mean differences.
love_xlim
Numeric,
love_scale_color
Function,
cd_nrow, cd_ncol
Numeric, the dimensions of the grid to create in covariate distribution plots. If both are
cd_x_scale_width
Numeric, the distance between major
cd_bar_width
Numeric, the width of the bars in the covariate distribution plots (barplots for categorical variables, histograms for continuous variables). If
cd_scale_fill
Function,
ec_nrow, ec_ncol
Numeric, the dimensions of the grid to create in empirical CDF plots. If both are
ec_x_scale_width
Numeric, the distance between major
ec_scale_color
Function,
Details
If an unnamed character vector is provided in names, it must have length ncol(cf$X.orig). Names of covariates not selected by covariates can be set to NA. If a named character vector is provided in names, all renamed covariates are kept regardless of whether they are selected in covariates. Thus, to select only renamed covariates, use character(0) in covariates.

The plot theme can be adjusted using ggplot2 active theme modifiers; see theme_get.
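The renaming rule and theme adjustment above can be sketched as follows. The example fits a small causal forest on three simulated covariates, passes an unnamed names vector with one entry per column of cf$X.orig, and sets the active ggplot2 theme before calling CovariateBalance. It assumes EpiForsk, grf and ggplot2 are installed; the data and the labels "age", "income" and "distance" are hypothetical.

```r
library(EpiForsk)

set.seed(2)
n <- 500
# Three hypothetical covariates in a data frame
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Treatment depends on x1, so x1 is expected to be imbalanced before weighting
W <- rbinom(n, 1, 1 / (1 + exp(-X$x1)))
Y <- rnorm(n) + W * X$x2
cf <- grf::causal_forest(X, Y, W)

# Set the active ggplot2 theme; theme_set() invisibly returns the old theme
old_theme <- ggplot2::theme_set(ggplot2::theme_bw())

# Unnamed vector: one new name per column of cf$X.orig, in column order
cb <- CovariateBalance(
  cf,
  names = c("age", "income", "distance"),
  treatment_name = "Treated"
)

# Restore the previous theme
ggplot2::theme_set(old_theme)
```

Saving and restoring the previous theme keeps the change local to these plots rather than affecting later ggplot2 output in the session.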
Value
A list with up to five elements:
love_data: data used to plot the absolute standardized mean differences.
love: plot object for absolute standardized mean differences.
cd_data: data used to plot covariate distributions.
cd_unadjusted: plot of unadjusted covariate distributions in the exposure groups.
cd_adjusted: plot of adjusted covariate distributions in the exposure groups.
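A short sketch of working with the returned list follows. It assumes EpiForsk and grf are installed and that plots = "Love" limits the plots to the Love plot and its data. Because balance_table defaults to TRUE, a balance table may also be returned. The simulated data are hypothetical.

```r
library(EpiForsk)

set.seed(3)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
W <- rbinom(n, 1, 1 / (1 + exp(-X$x1)))
Y <- rnorm(n) + W
cf <- grf::causal_forest(X, Y, W)

# Assumption: plots = "Love" restricts the plot output to the Love plot
cb <- CovariateBalance(cf, plots = "Love")

# Elements that were returned
names(cb)
# Data behind the plot of absolute standardized mean differences
head(cb$love_data)
# Draw the Love plot
print(cb$love)
```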
Author(s)
KIJA
Examples
# Simulate five continuous covariates (V1-V5) and two discrete covariates
# (D1 and D2), which are one-hot encoded with DiscreteCovariatesToOneHot
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p) |>
  as.data.frame() |>
  dplyr::bind_cols(
    DiscreteCovariatesToOneHot(
      dplyr::tibble(
        D1 = factor(
          sample(1:3, n, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
          labels = c("first", "second", "third")
        ),
        D2 = factor(
          sample(1:2, n, replace = TRUE, prob = c(0.2, 0.8)),
          labels = c("a", "b")
        )
      )
    )
  ) |>
  dplyr::select(
    V1,
    V2,
    dplyr::starts_with("D1"),
    V3,
    V4,
    dplyr::starts_with("D2"),
    V5
  )
# Treatment assignment depends on several covariates, creating imbalance
# between the treatment groups
expo_prob <- 1 / (1 + exp(0.4 * X[, 1] + 0.2 * X[, 2] - 0.6 * X[, 3] +
  0.4 * X[, 6] + 0.6 * X[, 8] - 0.2 * X[, 9]))
W <- rbinom(n, 1, expo_prob)
# Binary outcome; the treatment effect varies with the first covariate (V1)
event_prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2] +
  X[, 6] + 3 * X[, 9])))
Y <- rbinom(n, 1, event_prob)
cf <- grf::causal_forest(X, Y, W)
# Default settings
cb1 <- CovariateBalance(cf)
# Only renamed covariates; naming D1 or D2 relabels all of their levels
cb2 <- CovariateBalance(
  cf,
  covariates = character(0),
  names = c(
    "medium imbalance" = "V1",
    "low imbalance" = "V2",
    "high imbalance" = "V3",
    "no imbalance" = "V4",
    "discrete 1" = "D1",
    "discrete 2" = "D2"
  )
)
# Custom treatment name, Love plot axis, and covariate distribution layout
cb3 <- CovariateBalance(
  cf,
  covariates = character(0),
  names = c(
    "medium imbalance" = "V1",
    "low imbalance" = "V2",
    "high imbalance" = "V3",
    "no imbalance" = "V4"
  ),
  treatment_name = "Treatment",
  love_breaks = seq(0, 0.5, 0.1),
  love_xlim = c(0, 0.5),
  cd_nrow = 2,
  cd_x_scale_width = 1,
  cd_bar_width = 0.3
)