dicp {tramicp}

R Documentation

Model-based causal feature selection for general response types

Description

`dicp()` implements invariant causal prediction (ICP) for transformation and generalized linear models, including binary logistic regression, Weibull regression, the Cox model, linear regression and many others. The aim of ICP is to discover the direct causes of a response given data from heterogeneous experimental settings and a potentially large pool of candidate predictors.

Usage

dicp(
  formula,
  data,
  env,
  modFUN,
  verbose = TRUE,
  type = c("residual", "wald", "partial"),
  test = "gcm.test",
  controls = NULL,
  alpha = 0.05,
  baseline_fixed = TRUE,
  greedy = FALSE,
  max_size = NULL,
  mandatory = NULL,
  ...
)

Arguments

`formula`	A `formula` including response and covariate terms.
`data`	A `data.frame` containing response and explanatory variables.
`env`	A `formula` specifying the environment variables (see details).
`modFUN`	Model function from 'tram' (or other packages), e.g., `BoxCox`, `Colr`, `Polr`, `Lm`, `Coxph`, `Survreg`, `Lehmann`. Standard implementations `lm`, `glm`, `survreg`, `coxph`, and `polr` are also supported, as are models from 'lme4', 'tramME', 'glmnet' and 'mgcv'. See the corresponding alias `<model_name>ICP` (e.g., `PolrICP`) or `?implemented_model_classes`.
`verbose`	Logical, whether output should be verbose (default `TRUE`).
`type`	Character, type of invariance (`"residual"` or `"wald"`); see Details.
`test`	Character, specifies the invariance test to be used when `type = "residual"`. The default is `"gcm.test"`. Other implemented tests are `"HSIC"`, `"t.test"`, `"var.test"`, and `"combined"`. Alternatively, a custom invariance test can be supplied as a function of the form `\(r, e, controls) {...}` that returns a list with an entry `"p.value"`.
`controls`	Controls for the invariance tests and the overall procedure; see `dicp_controls`.
`alpha`	Significance level of the invariance test (default `0.05`).
`baseline_fixed`	Logical, whether to use a fixed baseline transformation (default `TRUE`); see `dicp_controls`.
`greedy`	Logical, whether to perform a greedy version of ICP (default is `FALSE`).
`max_size`	Numeric; maximum support size, i.e., the largest number of covariates in a candidate set.
`mandatory`	A `formula` containing mandatory covariates, i.e., covariates which by domain knowledge are believed to be parents of the response or are otherwise required for the environment or model to be valid (for instance, conditionally valid environments or random effects in a mixed model).
`...`	Further arguments passed to `modFUN`.
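
As an illustration of the custom `test` interface, the following sketch defines a function of the form `function(r, e, controls)` that returns a list with a `"p.value"` entry. It assumes that `r` is a numeric vector of residuals and that `e` holds a single numeric environment variable of the same length; the objects that `dicp()` actually passes may differ. The name `cor_invariance_test` is hypothetical, and the function is exercised on simulated data only.

```r
# Hypothetical custom invariance test: correlation between residuals and environment.
# Assumes r and e can be coerced to numeric vectors of equal length.
cor_invariance_test <- function(r, e, controls = NULL) {
  r <- as.numeric(r)
  e <- as.numeric(unlist(e))
  ct <- stats::cor.test(r, e)   # Pearson correlation test
  list(p.value = ct$p.value)    # entry named "p.value", as required for custom tests
}

# Simulated residuals, for illustration only
set.seed(1)
e <- rbinom(200, 1, 0.5)
r_invariant <- rnorm(200)            # residuals unrelated to the environment
r_shifted   <- rnorm(200) + 2 * e    # residuals shifted by the environment

p_inv   <- cor_invariance_test(r_invariant, e)$p.value
p_shift <- cor_invariance_test(r_shifted, e)$p.value
```

Under these assumptions, such a function could be supplied as `test = cor_invariance_test` together with `type = "residual"`.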

Details

TRAMICP iterates over all subsets of the covariates in `formula` and tests each subset for invariance. With `type = "residual"`, the test is based on the conditional covariance between score residuals and the environments in `env`; with `type = "wald"`, it is a Wald test for the presence of main and interaction effects of the environments. The algorithm outputs the intersection of all non-rejected sets as an estimate of the causal parents.
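
The subset search and intersection step can be sketched in base R. This is a simplified illustration and not the 'tramicp' implementation: it uses `lm` with an F-test for main and interaction effects of the environment as a stand-in for the Wald-type test, and the simulated data, the helper `invariance_pval`, and all variable names are assumptions made for this example.

```r
# Simplified sketch of ICP with a linear model; not the tramicp implementation.
set.seed(1)
n  <- 500
E  <- rbinom(n, 1, 0.5)        # environment indicator
X1 <- 2 * E + rnorm(n)         # cause of Y, shifted by the environment
Y  <- X1 + rnorm(n)            # response
X2 <- Y + 2 * E + rnorm(n)     # effect of Y, also shifted by the environment
d  <- data.frame(Y, X1, X2, E)

covariates <- c("X1", "X2")
# All subsets of the covariates, including the empty set
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(covariates),
                           function(k) combn(covariates, k, simplify = FALSE)),
                    recursive = FALSE))

# Hypothetical helper: p-value for adding main and interaction effects of E
invariance_pval <- function(S, data) {
  if (length(S) == 0) {
    f0 <- "Y ~ 1"
    f1 <- "Y ~ E"
  } else {
    rhs <- paste(S, collapse = " + ")
    f0 <- paste("Y ~", rhs)
    f1 <- paste0("Y ~ (", rhs, ") * E")
  }
  m0 <- lm(as.formula(f0), data = data)
  m1 <- lm(as.formula(f1), data = data)
  anova(m0, m1)[["Pr(>F)"]][2]
}

alpha <- 0.05
set_pvals <- vapply(subsets, invariance_pval, numeric(1), data = d)
names(set_pvals) <- vapply(subsets, function(S) {
  if (length(S) == 0) "Empty" else paste(S, collapse = "+")
}, character(1))

# Sets whose invariance is not rejected at level alpha
accepted <- subsets[set_pvals > alpha]
# Estimate of the causal parents: intersection over all non-rejected sets
parents <- if (length(accepted) == 0) character(0) else Reduce(intersect, accepted)
```

In this simulation only `X1` is a direct cause of `Y`, so sets that omit `X1` or include the descendant `X2` are expected to be rejected. The outcome for any single simulated data set still depends on the random draw.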

Value

Object of class "dICP", containing

candidate_causal_predictors: Character; intersection of all non-rejected sets,
set_pvals: Numeric vector; set-specific p-values of the invariance test,
predictor_pvals: Numeric vector; predictor-specific p-values,
tests: List of invariance tests.
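
A short sketch of how these components might be inspected, reusing the call from the Examples section. It assumes that the returned object supports list-style access with `$`, which is not stated explicitly here, and it only runs if 'tramicp' is installed.

```r
# Assumes the "dICP" object allows list-style access with `$`.
if (requireNamespace("tramicp", quietly = TRUE)) {
  library(tramicp)
  set.seed(12)
  d <- dgp_dicp(n = 1e3, mod = "binary")
  res <- dicp(Y ~ X1 + X2 + X3, data = d, env = ~ E, modFUN = "glm",
              family = "binomial", type = "wald", verbose = FALSE)
  res$candidate_causal_predictors  # intersection of all non-rejected sets
  res$set_pvals                    # set-specific p-values
  res$predictor_pvals              # predictor-specific p-values
}
```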

References

Kook, L., Saengkyongam, S., Lundborg, A. R., Hothorn, T., & Peters, J. (2023). Model-based causal feature selection for general response types. arXiv preprint. doi:10.48550/arXiv.2309.12833

Examples

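library(tramicp)
set.seed(12)
# Simulate 1000 observations with a binary response
d <- dgp_dicp(n = 1e3, mod = "binary")
# Wald-type invariance test with a binomial GLM
dicp(Y ~ X1 + X2 + X3, data = d, env = ~ E, modFUN = "glm",
     family = "binomial", type = "wald")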