cbl {cbl} | R Documentation |
Confounder blanket learner
Description
This function performs the confounder blanket learner (CBL) algorithm for causal discovery.
Usage
cbl(
x,
z,
s = "lasso",
B = 50,
gamma = 0.5,
maxiter = NULL,
params = NULL,
parallel = FALSE,
...
)
Arguments
x |
Matrix or data frame of foreground variables. |
z |
Matrix or data frame of background variables. |
s |
Feature selection method. Includes native support for sparse linear regression (s = "lasso") and gradient boosting (s = "boost"). Alternatively, a user-supplied feature selection function may be passed; see Examples. |
B |
Number of complementary pairs to draw for stability selection. Following Shah & Samworth (2013), we recommend leaving this fixed at 50. |
gamma |
Omission threshold. If either of two foreground variables is omitted from the model for the other with frequency gamma or greater, we infer that neither is an ancestor of the other. |
maxiter |
Maximum number of iterations to loop through if convergence is elusive. |
params |
Optional list of parameters to pass to the gradient boosting subroutine; see lightgbm::lgb.train. |
parallel |
Compute stability selection subroutine in parallel? Must register backend beforehand, e.g. via doMC. |
... |
Extra parameters to be passed to the feature selection subroutine. |
Details
The CBL algorithm (Watson & Silva, 2022) learns a partial order over foreground variables x via relations of minimal conditional (in)dependence with respect to a set of background variables z. The method is sound and complete with respect to a so-called "lazy oracle", who only answers independence queries about variable pairs conditioned on the intersection of their respective non-descendants.
For computational tractability, CBL performs conditional independence tests via supervised learning with feature selection. The current implementation includes support for sparse linear models (s = "lasso") and gradient boosting machines (s = "boost"). For statistical inference, CBL uses complementary pairs stability selection (Shah & Samworth, 2013), which bounds the probability of errors of commission.
Value
A square, lower triangular ancestrality matrix. Call this matrix m. If CBL infers that X_i \prec X_j, then m[j, i] = 1. If CBL infers that X_i \preceq X_j, then m[j, i] = 0.5. If CBL infers that X_i \sim X_j, then m[j, i] = 0. Otherwise, m[j, i] = NA.
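The encoding above can be decoded into readable ancestral relations with base R. The following sketch uses a hypothetical 3-variable matrix with entries chosen for illustration (cbl itself is not run here):

```r
# Hypothetical ancestrality matrix for three foreground variables
# (values chosen for illustration, not produced by cbl)
m <- matrix(NA, 3, 3, dimnames = list(paste0("X", 1:3), paste0("X", 1:3)))
m[2, 1] <- 1    # X1 is a strict ancestor of X2 (X_1 \prec X_2)
m[3, 1] <- 0.5  # X1 is a (possibly non-strict) ancestor of X3
m[3, 2] <- 0    # X2 and X3 are non-ancestors of one another

# Extract strict ancestor pairs: m[j, i] = 1 means X_i precedes X_j
idx <- which(m == 1, arr.ind = TRUE)
ancestors <- data.frame(ancestor   = colnames(m)[idx[, "col"]],
                        descendant = rownames(m)[idx[, "row"]])
print(ancestors)
```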
References
Watson, D.S. & Silva, R. (2022). Causal discovery under a confounder blanket. To appear in Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence. arXiv preprint, 2205.05715.
Shah, R. & Samworth, R. (2013). Variable selection with error control: Another look at stability selection. J. R. Statist. Soc. B, 75(1):55–80.
Examples
# Load data
data(bipartite)
x <- bipartite$x
z <- bipartite$z
# Set seed
set.seed(123)
# Run CBL
cbl(x, z)
# With user-supplied feature selection subroutine
s_new <- function(x, y) {
# Fit model, extract coefficients
df <- data.frame(x, y)
f_full <- lm(y ~ 0 + ., data = df)
f_reduced <- step(f_full, trace = 0)
keep <- names(coef(f_reduced))
# Return bit vector
out <- ifelse(colnames(x) %in% keep, 1, 0)
return(out)
}
cbl(x, z, s = s_new)
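# A parallel run might look like the following (a sketch; it assumes the
# doParallel package as the registered foreach backend, which is one common
# choice, and requires the cbl package and data loaded as above):

```r
# Register a parallel backend, then enable parallel stability selection
library(doParallel)               # assumed backend; doMC also works on Unix
cl <- makeCluster(2)
registerDoParallel(cl)
cbl(x, z, parallel = TRUE)        # stability selection subsamples run in parallel
stopCluster(cl)
```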