cbl {cbl}    R Documentation

Confounder blanket learner

Description

This function performs the confounder blanket learner (CBL) algorithm for causal discovery.

Usage

cbl(
  x,
  z,
  s = "lasso",
  B = 50,
  gamma = 0.5,
  maxiter = NULL,
  params = NULL,
  parallel = FALSE,
  ...
)

Arguments

x

Matrix or data frame of foreground variables.

z

Matrix or data frame of background variables.

s

Feature selection method. Includes native support for sparse linear regression (s = "lasso") and gradient boosting (s = "boost"). Alternatively, a user-supplied function mapping features x and outcome y to a bit vector indicating which features are selected. See Examples.

B

Number of complementary pairs to draw for stability selection. Following Shah & Samworth (2013), we recommend leaving this fixed at 50.

gamma

Omission threshold. If either of two foreground variables is omitted from the model for the other with frequency gamma or higher, we infer that they are causally disconnected.
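For example, raising the threshold makes inferences of disconnection more conservative (the value below is purely illustrative):

```r
# Require an omission frequency of at least 75% before
# inferring that a pair of foreground variables is disconnected
cbl(x, z, gamma = 0.75)
```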

maxiter

Maximum number of iterations before the algorithm terminates if convergence has not been reached.

params

Optional list to pass to lgb.train if s = "boost". See lightgbm::lgb.train.
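For instance, standard lgb.train training parameters can be passed through this way (the specific values below are illustrative, not recommendations):

```r
# Forward lightgbm training parameters to lgb.train
cbl(x, z, s = "boost",
    params = list(learning_rate = 0.05, num_leaves = 15))
```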

parallel

Run the stability selection subroutine in parallel? If TRUE, a parallel backend must be registered beforehand, e.g. via doMC.
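A minimal sketch, assuming the doMC backend is installed (the core count is illustrative):

```r
# Register a parallel backend, then enable parallel stability selection
library(doMC)
registerDoMC(cores = 2)
cbl(x, z, parallel = TRUE)
```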

...

Extra parameters to be passed to the feature selection subroutine.

Details

The CBL algorithm (Watson & Silva, 2022) learns a partial order over foreground variables x via relations of minimal conditional (in)dependence with respect to a set of background variables z. The method is sound and complete with respect to a so-called "lazy oracle", who only answers independence queries about variable pairs conditioned on the intersection of their respective non-descendants.

For computational tractability, CBL performs conditional independence tests via supervised learning with feature selection. The current implementation includes support for sparse linear models (s = "lasso") and gradient boosting machines (s = "boost"). For statistical inference, CBL uses complementary pairs stability selection (Shah & Samworth, 2013), which bounds the probability of errors of commission.

Value

A square, lower triangular ancestrality matrix. Call this matrix m. If CBL infers that X_i \prec X_j, i.e. that X_i is an ancestor of X_j, then m[j, i] = 1. If CBL infers that X_i \preceq X_j, i.e. that X_i is a non-descendant of X_j, then m[j, i] = 0.5. If CBL infers that X_i \sim X_j, i.e. that X_i and X_j are causally disconnected, then m[j, i] = 0. Otherwise, m[j, i] = NA.
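The entry codes above can be read directly off the returned matrix; a sketch:

```r
# m is the ancestrality matrix returned by cbl()
m <- cbl(x, z)
# m[j, i] == 1    =>  X_i \prec X_j
# m[j, i] == 0.5  =>  X_i \preceq X_j
# m[j, i] == 0    =>  X_i \sim X_j
# is.na(m[j, i])  =>  no inference made
which(m == 1, arr.ind = TRUE)  # list all strict relations X_i \prec X_j
```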

References

Watson, D.S. & Silva, R. (2022). Causal discovery under a confounder blanket. To appear in Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence. arXiv preprint arXiv:2205.05715.

Shah, R. & Samworth, R. (2013). Variable selection with error control: Another look at stability selection. J. R. Statist. Soc. B, 75(1):55–80.

Examples

# Load data
data(bipartite)
x <- bipartite$x
z <- bipartite$z

# Set seed
set.seed(123)

# Run CBL
cbl(x, z)

# With user-supplied feature selection subroutine
s_new <- function(x, y) {
  # Fit full model, then reduce via backward stepwise selection (AIC)
  df <- data.frame(x, y)
  f_full <- lm(y ~ 0 + ., data = df)
  f_reduced <- step(f_full, trace = 0)
  keep <- names(coef(f_reduced))
  # Return bit vector
  out <- ifelse(colnames(x) %in% keep, 1, 0)
  return(out)
}

cbl(x, z, s = s_new)



[Package cbl version 0.1.3 Index]