CovariateBalance {EpiForsk}    R Documentation

Plots for checking covariate balance in causal forest

Description

Generate plots showing balance in the covariates before and after propensity score weighting with a causal forest object.
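As a rough illustration of what "balance before and after weighting" measures, the sketch below computes the absolute standardized mean difference (ASMD) of a single covariate, first unweighted and then with inverse propensity weights. It assumes ATE-style weights w = W / e + (1 - W) / (1 - e) and a pooled standard deviation from the unweighted groups; the helper asmd() is hypothetical, and CovariateBalance may use a different formula. With a fitted grf causal forest, the estimated propensity scores are stored in cf$W.hat.

```r
# Hypothetical helper: ASMD of covariate x between treatment groups W = 1 and
# W = 0, optionally using weights w. The pooled SD is taken from the
# unweighted groups so that before/after values share one denominator.
asmd <- function(x, W, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[W == 1], w[W == 1])
  m0 <- weighted.mean(x[W == 0], w[W == 0])
  s_pooled <- sqrt((var(x[W == 1]) + var(x[W == 0])) / 2)
  abs(m1 - m0) / s_pooled
}

set.seed(1)
n <- 20000
x <- rnorm(n)
e <- 1 / (1 + exp(-x))            # true propensity score, increasing in x
W <- rbinom(n, 1, e)
ipw <- W / e + (1 - W) / (1 - e)  # assumed ATE inverse propensity weights

asmd(x, W)       # substantial imbalance before weighting
asmd(x, W, ipw)  # much smaller after weighting
```

In practice e would be replaced by an estimate such as cf$W.hat rather than the true propensity score.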

Usage

CovariateBalance(
  cf,
  plots = c("all", "Love", "density", "ecdf"),
  balance_table = TRUE,
  covariates = NULL,
  names = NULL,
  factor = NULL,
  treatment_name = "W",
  love_breaks = NULL,
  love_xlim = NULL,
  love_scale_color = NULL,
  cd_nrow = NULL,
  cd_ncol = NULL,
  cd_x_scale_width = NULL,
  cd_bar_width = NULL,
  cd_scale_fill = NULL,
  ec_nrow = NULL,
  ec_ncol = NULL,
  ec_x_scale_width = NULL,
  ec_scale_color = NULL
)

Arguments

cf

An object of class causal_forest (and inheriting from class grf).

plots

Character, "all" returns Love plots, covariate distribution (density) plots and empirical CDF plots; "Love" returns only Love plots, "density" returns only density plots and "ecdf" returns only empirical CDF plots.

balance_table

Logical, TRUE to also return a table of balance statistics.

covariates

A vector selecting the covariates to show in balance plots. If cf$X.orig is an unnamed matrix, use a numeric vector to select variables; otherwise use a character vector. Names provided in the names argument take priority over existing names in cf$X.orig. If discrete covariates have been one-hot encoded using DiscreteCovariatesToOneHot, the name of a discrete covariate can be given in covariates to select all of its levels and collect them into a single bar plot showing its distribution.

names

A named character vector. The vector itself should contain covariate names from the causal_forest object, while the names attribute should contain the names to use when plotting. If discrete covariates have been one-hot encoded using DiscreteCovariatesToOneHot, providing just the name of a discrete covariate will modify the names of all its levels for plotting. If the vector is unnamed, it acts as the new covariate names, given in the order of cf$X.orig. If NULL (the default), the original names are used.

factor

A named list of covariates to convert to factors. One-hot encoded covariates are converted automatically and need not be specified in factor. Each component of the list must contain the factor levels; use a named vector to supply custom labels.
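A minimal sketch of how one component of factor, for example a hypothetical factor = list(V5 = c(no = 0, yes = 1)), could be applied to a covariate, assuming the vector values are the levels and its names (when present) are the labels. The helper to_factor() is hypothetical and not part of EpiForsk.

```r
# Hypothetical helper: convert covariate x to a factor using a levels vector.
# If the vector is named, its names are used as labels; otherwise the levels
# themselves become the labels.
to_factor <- function(x, levels) {
  labels <- if (is.null(names(levels))) levels else names(levels)
  factor(x, levels = levels, labels = labels)
}

to_factor(c(0, 1, 1, 0), c(no = 0, yes = 1))  # no yes yes no
to_factor(c(2, 1, 2), c(1, 2))                # unnamed: labels "1" and "2"
```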

treatment_name

Character, name of treatment.

love_breaks

Numeric, breaks used in the plot of absolute standardized mean differences.

love_xlim

Numeric, x-limits used in the plot of absolute standardized mean differences.

love_scale_color

Function, the scale_color_* function to use in the plot of absolute standardized mean differences.

cd_nrow, cd_ncol

Numeric, the dimensions of the grid to create in covariate distribution plots. If both are NULL, the dimensions are set using the same logic as facet_wrap.
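For reference, a sketch of the grid rule that facet_wrap applies, as we understand ggplot2's internal behaviour: with both nrow and ncol NULL it takes grDevices::n2mfrow() and swaps rows and columns (giving a wide rather than tall grid); with only one given, the other is the panel count divided by it, rounded up. The function wrap_dims_sketch() is a hypothetical re-implementation, not an EpiForsk or ggplot2 export.

```r
# Hypothetical re-implementation of facet_wrap's default grid dimensions
# for n panels.
wrap_dims_sketch <- function(n, nrow = NULL, ncol = NULL) {
  if (is.null(nrow) && is.null(ncol)) {
    rc <- grDevices::n2mfrow(n)  # c(rows, cols) as used for base graphics
    nrow <- rc[2]                # swapped: facet_wrap prefers a wide layout
    ncol <- rc[1]
  } else if (is.null(ncol)) {
    ncol <- ceiling(n / nrow)
  } else if (is.null(nrow)) {
    nrow <- ceiling(n / ncol)
  }
  c(nrow = nrow, ncol = ncol)
}

wrap_dims_sketch(5)            # 2 rows, 3 columns
wrap_dims_sketch(5, nrow = 1)  # 1 row, 5 columns
```

The same rule applies to ec_nrow and ec_ncol for the empirical CDF plots.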

cd_x_scale_width

Numeric, the distance between major x-axis ticks in the covariate distribution plots. If NULL, a width is chosen to display approximately six major ticks. If length 1, the same width is used for all covariate plots. If the same length as the number of covariates included, each number is used as the width for the corresponding covariate, in the order of the covariates after selection with the tidy-select expression in covariates.
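The length rules above can be sketched as follows. The helper resolve_tick_width() is hypothetical, and using base R's pretty() (which aims for about n intervals between breaks) to get roughly six ticks is an assumption, not necessarily how CovariateBalance picks its default.

```r
# Hypothetical helper: resolve the major-tick width for each selected covariate.
# covs: list of numeric covariate vectors, in selection order.
resolve_tick_width <- function(covs, width = NULL) {
  k <- length(covs)
  if (is.null(width)) {
    # Data-driven default: spacing of pretty() breaks over each covariate's range.
    return(vapply(covs, function(x) diff(pretty(range(x), n = 6))[1], numeric(1)))
  }
  if (length(width) == 1) return(rep(width, k))  # one width for every plot
  if (length(width) == k) return(width)          # one width per covariate
  stop("width must have length 1 or one entry per selected covariate")
}

covs <- list(age = c(20, 35, 50, 80), bmi = c(18.5, 22, 31))
resolve_tick_width(covs)            # one data-driven width per covariate
resolve_tick_width(covs, 5)         # same width for both plots
resolve_tick_width(covs, c(10, 2))  # width given per covariate
```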

cd_bar_width

Numeric, the width of the bars in the covariate distribution plots (barplots for categorical variables, histograms for continuous variables). If NULL, a width is chosen to display approximately 50 bars in histograms, while 0.9 times the resolution of the data is used in bar plots. If length 1, the same width is used for all covariate plots. This is not recommended if there are both categorical and continuous covariates. If the same length as the number of covariates included, each number is used as the bar width for different covariates, in the order of the covariates after selection with the tidy-select expression in covariates.
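The two documented defaults can be illustrated with a small sketch. The helper default_bar_width() is hypothetical; it takes the resolution of the data to be the smallest gap between distinct values, which is the idea behind ggplot2::resolution(), although ggplot2's own function handles some special cases differently.

```r
# Hypothetical helper mirroring the documented defaults for cd_bar_width.
# x: numeric covariate (categorical levels coded as numbers).
default_bar_width <- function(x, categorical) {
  if (categorical) {
    u <- sort(unique(x))
    res <- if (length(u) > 1) min(diff(u)) else 1  # smallest gap between values
    0.9 * res                                      # bar plot: 0.9 x resolution
  } else {
    diff(range(x)) / 50                            # histogram: about 50 bins
  }
}

default_bar_width(c(0, 1, 1, 2), categorical = TRUE)                 # 0.9
default_bar_width(seq(0, 10, length.out = 101), categorical = FALSE)  # 0.2
```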

cd_scale_fill

Function, the scale_fill_* function to use in covariate distribution plots.

ec_nrow, ec_ncol

Numeric, the dimensions of the grid to create in empirical CDF plots. If both are NULL, the dimensions are set using the same logic as facet_wrap.

ec_x_scale_width

Numeric, the distance between major x-axis ticks in the empirical CDF plots. If NULL, a width is chosen to display approximately six major ticks. If length 1, the same width is used for all plots. If the same length as the number of covariates included, each number is used as the width for the corresponding covariate, in the order of the covariates after selection with the tidy-select expression in covariates.

ec_scale_color

Function, the scale_color_* function to use in empirical CDF plots.
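The empirical CDF plots compare the cumulative distribution of each covariate between treatment groups. After weighting, each observation contributes its weight rather than 1/n. A base R sketch of a weighted ECDF, with the hypothetical helper weighted_ecdf(); with unit weights it reproduces stats::ecdf().

```r
# Hypothetical helper: weighted empirical CDF of x, returned as a function of q.
weighted_ecdf <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  xs <- x[o]
  cw <- cumsum(w[o]) / sum(w)  # cumulative weight share at each sorted value
  function(q) {
    idx <- findInterval(q, xs)  # largest i with xs[i] <= q (0 if none)
    ifelse(idx == 0, 0, cw[pmax(idx, 1)])
  }
}

Fw <- weighted_ecdf(c(1, 2, 3), w = c(1, 1, 2))
Fw(c(0, 1, 2.5, 3))  # 0.00 0.25 0.50 1.00
```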

Details

If an unnamed character vector is provided in names, it must have length ncol(cf$X.orig); names of covariates not selected by covariates can be set to NA. If a named character vector is provided in names, all renamed covariates are kept regardless of whether they are selected in covariates. Thus, to select only the renamed covariates, use character(0) in covariates. The plot theme can be adjusted using the ggplot2 active theme modifiers; see theme_get.
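The naming and selection rules above can be sketched as follows, assuming covariates = NULL means all covariates are selected. The function resolve_names() is hypothetical and only models the rules as described; it ignores one-hot encoded levels and is not EpiForsk code.

```r
# Hypothetical sketch of the `names` rules (not EpiForsk code).
# orig: column names of cf$X.orig; covariates: character selection or NULL.
resolve_names <- function(orig, new_names, covariates = NULL) {
  sel <- if (is.null(covariates)) orig else covariates
  if (is.null(names(new_names))) {
    # Unnamed: one name per column, in column order; NA for unplotted columns.
    stopifnot(length(new_names) == length(orig))
    return(list(labels = new_names, kept = sel))
  }
  # Named: values are existing names, names() are the plotting labels.
  labels <- orig
  idx <- match(new_names, orig)
  labels[idx[!is.na(idx)]] <- names(new_names)[!is.na(idx)]
  # Renamed covariates are always kept, in addition to the selection.
  list(labels = labels, kept = union(sel, new_names))
}

orig <- c("V1", "V2", "V3")
resolve_names(orig, c("low imbalance" = "V2"), covariates = character(0))
resolve_names(orig, c("age", NA, "sex"), covariates = c("V1", "V3"))
```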

Value

A list with up to five elements:

Author(s)

KIJA

Examples


# Simulate five continuous covariates plus two one-hot encoded factors
n <- 1000
p <- 5
X <- matrix(rnorm(n * p), n, p) |>
  as.data.frame() |>
  dplyr::bind_cols(
    DiscreteCovariatesToOneHot(
      dplyr::tibble(
        D1 = factor(
          sample(1:3, n, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
          labels = c("first", "second", "third")
        ),
        D2 = factor(
          sample(1:2, n, replace = TRUE, prob = c(0.2, 0.8)),
          labels = c("a", "b")
        )
      )
    )
  ) |>
  # Interleave the one-hot columns with the continuous covariates
  dplyr::select(
    V1,
    V2,
    dplyr::starts_with("D1"),
    V3,
    V4,
    dplyr::starts_with("D2"),
    V5
  )
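# Treatment assignment depends on several columns of X, so the treatment
# groups are imbalanced before weighting
expo_prob <- 1 / (1 + exp(0.4 * X[, 1] + 0.2 * X[, 2] - 0.6 * X[, 3] +
                          0.4 * X[, 6] + 0.6 * X[, 8] - 0.2 * X[, 9]))
W <- rbinom(n, 1, expo_prob)
# Binary outcome with a treatment effect that varies with X[, 1]
event_prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2] +
                           X[, 6] + 3 * X[, 9])))
Y <- rbinom(n, 1, event_prob)
cf <- grf::causal_forest(X, Y, W)
# Default call: all plot types plus the balance table
cb1 <- CovariateBalance(cf)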
cb2 <- CovariateBalance(
  cf,
  covariates = character(0),
  names = c(
    "medium imbalance" = "V1",
    "low imbalance" = "V2",
    "high imbalance" = "V3",
    "no imbalance" = "V4",
    "discrete 1" = "D1",
    "discrete 2" = "D2"
  )
)
cb3 <- CovariateBalance(
  cf,
  covariates = character(0),
  names = c(
    "medium imbalance" = "V1",
    "low imbalance" = "V2",
    "high imbalance" = "V3",
    "no imbalance" = "V4"
  ),
  treatment_name = "Treatment",
  love_breaks = seq(0, 0.5, 0.1),
  love_xlim = c(0, 0.5),
  cd_nrow = 2,
  cd_x_scale_width = 1,
  cd_bar_width = 0.3
)



[Package EpiForsk version 0.1.1 Index]