cbl {cbl}    R Documentation

Confounder blanket learner

Description

This function performs the confounder blanket learner (CBL) algorithm for causal discovery.

Usage

cbl(
  x,
  z,
  s = "lasso",
  B = 50,
  gamma = 0.5,
  maxiter = NULL,
  params = NULL,
  parallel = FALSE,
  ...
)

Arguments

x

Matrix or data frame of foreground variables.

z

Matrix or data frame of background variables.

s

Feature selection method. Includes native support for sparse linear regression (s = "lasso") and gradient boosting (s = "boost"). Alternatively, a user-supplied function mapping features x and outcome y to a bit vector indicating which features are selected. See Examples.

B

Number of complementary pairs to draw for stability selection. Following Shah & Samworth (2013), we recommend leaving this fixed at 50.

gamma

Omission threshold. If either of two foreground variables is omitted from the model for the other with frequency gamma or higher, we infer that they are causally disconnected.
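For example, raising the threshold makes inferences of disconnection more conservative (the value below is purely illustrative):

```r
# Require an omission frequency of at least 75% before
# inferring that a pair of foreground variables is disconnected
cbl(x, z, gamma = 0.75)
```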

maxiter

Maximum number of iterations before the algorithm terminates if convergence has not been reached.

params

Optional list to pass to lgb.train if s = "boost". See lightgbm::lgb.train.
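For instance, standard lgb.train training parameters can be passed through this way (the specific values below are illustrative, not recommendations):

```r
# Forward lightgbm training parameters to lgb.train
cbl(x, z, s = "boost",
    params = list(learning_rate = 0.05, num_leaves = 15))
```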

parallel

Run the stability selection subroutine in parallel? If TRUE, a parallel backend must be registered beforehand, e.g. via doMC.
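A minimal sketch, assuming the doMC backend is installed (the core count is illustrative):

```r
# Register a parallel backend, then enable parallel stability selection
library(doMC)
registerDoMC(cores = 2)
cbl(x, z, parallel = TRUE)
```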

...

Extra parameters to be passed to the feature selection subroutine.

Details

The CBL algorithm (Watson & Silva, 2022) learns a partial order over foreground variables x via relations of minimal conditional (in)dependence with respect to a set of background variables z. The method is sound and complete with respect to a so-called "lazy oracle", who only answers independence queries about variable pairs conditioned on the intersection of their respective non-descendants.

For computational tractability, CBL performs conditional independence tests via supervised learning with feature selection. The current implementation includes support for sparse linear models (s = "lasso") and gradient boosting machines (s = "boost"). For statistical inference, CBL uses complementary pairs stability selection (Shah & Samworth, 2013), which bounds the probability of errors of commission.

Value

A square, lower triangular ancestrality matrix. Call this matrix m. If CBL infers that X_i \prec X_j, i.e. that X_i is an ancestor of X_j, then m[j, i] = 1. If CBL infers that X_i \preceq X_j, i.e. that X_i is a non-descendant of X_j, then m[j, i] = 0.5. If CBL infers that X_i \sim X_j, i.e. that X_i and X_j are causally disconnected, then m[j, i] = 0. Otherwise, m[j, i] = NA.
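The entry codes above can be read directly off the returned matrix; a sketch:

```r
# m is the ancestrality matrix returned by cbl()
m <- cbl(x, z)
# m[j, i] == 1    =>  X_i \prec X_j
# m[j, i] == 0.5  =>  X_i \preceq X_j
# m[j, i] == 0    =>  X_i \sim X_j
# is.na(m[j, i])  =>  no inference made
which(m == 1, arr.ind = TRUE)  # list all strict relations X_i \prec X_j
```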

References

Watson, D.S. & Silva, R. (2022). Causal discovery under a confounder blanket. To appear in Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence. arXiv preprint arXiv:2205.05715.

Shah, R. & Samworth, R. (2013). Variable selection with error control: Another look at stability selection. J. R. Statist. Soc. B, 75(1):55–80.

Examples

# Load data
data(bipartite)
x <- bipartite$x
z <- bipartite$z

# Set seed
set.seed(123)

# Run CBL
cbl(x, z)

# With user-supplied feature selection subroutine
s_new <- function(x, y) {
  # Fit full model, then reduce via backward stepwise selection (AIC)
  df <- data.frame(x, y)
  f_full <- lm(y ~ 0 + ., data = df)
  f_reduced <- step(f_full, trace = 0)
  keep <- names(coef(f_reduced))
  # Return bit vector
  out <- ifelse(colnames(x) %in% keep, 1, 0)
  return(out)
}

cbl(x, z, s = s_new)



[Package cbl version 0.1.3 Index]