dicp {tramicp}    R Documentation

Model-based causal feature selection for general response types

Description

Function 'dicp()' implements invariant causal prediction (ICP) for transformation and generalized linear models, including binary logistic regression, Weibull regression, the Cox model, linear regression and many others. The aim of ICP is to discover the direct causes of a response given data from heterogeneous experimental settings and a potentially large pool of candidate predictors.

Usage

dicp(
  formula,
  data,
  env,
  modFUN,
  verbose = TRUE,
  type = c("residual", "wald", "partial"),
  test = "gcm.test",
  controls = NULL,
  alpha = 0.05,
  baseline_fixed = TRUE,
  greedy = FALSE,
  max_size = NULL,
  mandatory = NULL,
  ...
)

Arguments

formula

A formula specifying the response and the candidate covariates.

data

A data.frame containing response and explanatory variables.

env

A formula specifying the environment variables (see Details).

modFUN

Model function from 'tram' (or other packages), e.g., BoxCox, Colr, Polr, Lm, Coxph, Survreg, or Lehmann. The standard implementations lm, glm, survreg, coxph, and polr are also supported. Each model class has a corresponding alias named <model_name>ICP, e.g., PolrICP; see ?implemented_model_classes. Models from 'lme4', 'tramME', 'glmnet', and 'mgcv' are also supported.

verbose

Logical, whether output should be verbose (default TRUE).

type

Character, type of invariance test: "residual" (the default) or "wald"; see Details.

test

Character, specifies the invariance test used when type = "residual". The default is "gcm.test"; the other implemented tests are "HSIC", "t.test", "var.test", and "combined". Alternatively, a custom function of the form \(r, e, controls) {...} can be supplied; it must return a list with an entry named "p.value".
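A custom test only needs the signature \(r, e, controls) and a "p.value" entry in its return value. The sketch below, named cov_test for illustration only, tests for zero covariance between the residuals r and a single environment variable e using a studentised mean of their products. It assumes that r is a numeric vector of residuals and e a single numeric (e.g., 0/1) environment vector. This page does not specify exactly what dicp() passes in, so check these assumptions before use. The \(...) lambda syntax requires R >= 4.1.0.

```r
# Hypothetical custom invariance test with the signature accepted by the
# `test` argument: \(r, e, controls). Assumes `r` is a numeric residual vector
# and `e` a single numeric (e.g., 0/1) environment variable.
cov_test <- \(r, e, controls) {
  e <- as.numeric(e)
  # Products whose mean estimates the covariance between r and e
  prod_re <- r * (e - mean(e))
  # Studentised mean of the products; approximately N(0, 1) under zero covariance
  stat <- sqrt(length(prod_re)) * mean(prod_re) / sd(prod_re)
  # `controls` is ignored in this sketch; only the "p.value" entry is required
  list(p.value = 2 * pnorm(-abs(stat)), statistic = stat)
}
```

Under these assumptions it could be passed as dicp(..., type = "residual", test = cov_test).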

controls

Controls for the invariance tests and the overall procedure; see dicp_controls.

alpha

Significance level of the invariance tests (default 0.05).

baseline_fixed

Logical (default TRUE), whether the baseline transformation is kept fixed; see dicp_controls.

greedy

Logical, whether to perform a greedy version of ICP (default is FALSE).

max_size

Numeric, maximum support size, i.e., the maximum number of covariates in a candidate subset.

mandatory

A formula containing mandatory covariates, i.e., covariates that, by domain knowledge, are believed to be parents of the response or are otherwise required for the environment or model to be valid (for instance, for conditionally valid environments or random effects in a mixed model).

...

Further arguments passed to modFUN.

Details

TRAMICP iterates over all subsets of the covariates in formula and tests invariance for each subset. With type = "residual", the test is based on the conditional covariance between the score residuals and the environments in env. With type = "wald", it is a Wald test for the presence of main and interaction effects of the environments. The intersection over all non-rejected subsets is returned as an estimate of the causal parents of the response.
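The subset search and the Wald-type invariance test described above can be sketched in base R. This is not tramicp's implementation. It assumes a binary response, numeric covariates, a single numeric environment column, and logistic regression via glm(). The helper names wald_invariance_p and icp_sketch are hypothetical. For each subset S, the helper fits Y ~ (S) * E and jointly tests the environment main effect and all covariate-by-environment interactions. The search then intersects all subsets that are not rejected at level alpha.

```r
# Hypothetical Wald-type invariance test for one subset S (glm sketch only).
wald_invariance_p <- function(S, data, resp = "Y", env = "E") {
  rhs <- if (length(S) == 0) env else paste0("(", paste(S, collapse = " + "), ") * ", env)
  fit <- glm(as.formula(paste(resp, "~", rhs)), family = binomial, data = data)
  # Model-matrix columns belonging to the environment main effect or its interactions
  env_terms <- which(attr(terms(fit), "factors")[env, ] > 0)
  idx <- attr(model.matrix(fit), "assign") %in% env_terms
  b <- coef(fit)[idx]
  V <- vcov(fit)[idx, idx, drop = FALSE]
  # Joint Wald statistic for H0: all environment coefficients are zero
  stat <- drop(crossprod(b, solve(V, b)))
  pchisq(stat, df = length(b), lower.tail = FALSE)
}

# Hypothetical ICP search: test every subset, intersect the non-rejected ones.
icp_sketch <- function(data, covs, alpha = 0.05, resp = "Y", env = "E") {
  # All subsets of the candidate covariates, including the empty set
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(covs), function(k)
                 combn(covs, k, simplify = FALSE)), recursive = FALSE))
  pvals <- vapply(subsets, wald_invariance_p, numeric(1),
                  data = data, resp = resp, env = env)
  names(pvals) <- vapply(subsets, function(S)
    paste0("{", paste(S, collapse = ","), "}"), character(1))
  accepted <- subsets[pvals > alpha]
  # Intersection over all non-rejected subsets (empty if every subset is rejected)
  parents <- if (length(accepted) == 0) character(0) else Reduce(intersect, accepted)
  list(parents = parents, pvals = pvals)
}

# Simulated example: E shifts X1, X1 is the only parent of Y, X2 is noise
set.seed(12)
n <- 2000
E <- rbinom(n, 1, 0.5)
X1 <- 2 * E + rnorm(n)
Y <- rbinom(n, 1, plogis(2 * X1 - 2))
X2 <- rnorm(n)
res <- icp_sketch(data.frame(Y, X1, X2, E), covs = c("X1", "X2"))
res$parents
```

Enumerating all 2^p subsets is exponential in the number of candidate covariates. The max_size and greedy arguments of dicp() exist to limit this cost.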

Value

Object of class "dICP", containing

References

Kook, L., Saengkyongam, S., Lundborg, A. R., Hothorn, T., & Peters, J. (2023). Model-based causal feature selection for general response types. arXiv preprint. doi:10.48550/arXiv.2309.12833

Examples

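library("tramicp")
set.seed(12)
# Simulate example data with a binary response (see ?dgp_dicp)
d <- dgp_dicp(n = 1e3, mod = "binary")
# ICP with logistic regression (glm) and the Wald-type invariance test
dicp(Y ~ X1 + X2 + X3, data = d, env = ~ E, modFUN = "glm",
     family = "binomial", type = "wald")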


[Package tramicp version 0.0-2 Index]