R: Unconditional Bernoulli Tree-Based Scan Statistics for R

TreeMineR {TreeMineR}

R Documentation

Unconditional Bernoulli Tree-Based Scan Statistics for R

Description

Unconditional Bernoulli Tree-Based Scan Statistics for R

Usage

TreeMineR(
  data,
  tree,
  p = NULL,
  n_exposed = NULL,
  n_unexposed = NULL,
  dictionary = NULL,
  delimiter = "/",
  n_monte_carlo_sim = 9999,
  random_seed = FALSE,
  future_control = list(strategy = "sequential")
)

Arguments

`data`	The dataset used for the computation. The dataset needs to include the following columns:

	`id`	An integer that is unique to every individual.
	`leaf`	A string identifying the unique diagnoses or leaves for each individual.
	`exposed`	A 0/1 indicator of the individual's exposure status.

	See below for the first and last rows included in the example dataset.

	  id   leaf   exposed
	   1   K251         0
	   2   Q702         0
	   3   G96          0
	   3   S949         0
	   4   S951         0
	 ---
	 999   V539         1
	 999   V625         1
	 999   G823         1
	1000   L42          1
	1000   T524         1
`tree`	A dataset with one variable `pathString` defining the tree structure that you would like to use. This dataset can, e.g., be created using `create_tree`.
`p`	The proportion of exposed individuals in the dataset. It is calculated from `n_exposed` and `n_unexposed` if both are supplied.
`n_exposed`	Number of exposed individuals (Optional).
`n_unexposed`	Number of unexposed individuals (Optional).
`dictionary`	A `data.frame` that includes one `node` column and a `title` column, which are used for labeling the cuts in the output of `TreeMineR`.
`delimiter`	A character defining the delimiter of different tree levels within your `pathString`. The default is `/`.
`n_monte_carlo_sim`	The number of Monte-Carlo simulations to be used for calculating P-values.
`random_seed`	Random seed used for the Monte-Carlo simulations.
`future_control`	A list of arguments passed to `future::plan`. This is useful if one would like to parallelise the Monte-Carlo simulations to decrease the computation time. The default is a sequential run of the Monte-Carlo simulations.
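
As a rough illustration of the inputs described above, the following base-R sketch builds a toy `data` set and a toy `tree` in the documented column layout and derives `p` from the group sizes. The object names, the exposure values, and the path strings (including the `root` level) are made up for illustration. The formula `p = n_exposed / (n_exposed + n_unexposed)` is an assumption about how the proportion follows from the two counts, not a statement about the package internals.

```r
# Toy input in the documented layout: one row per individual and leaf.
# Values are illustrative only.
toy_data <- data.frame(
  id      = c(1L, 2L, 3L, 3L, 4L),
  leaf    = c("K251", "Q702", "G96", "S949", "S951"),
  exposed = c(0L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Toy tree: one path per leaf, levels separated by the default delimiter "/".
# The intermediate levels are hypothetical.
toy_tree <- data.frame(
  pathString = c("root/K/K25/K251", "root/Q/Q70/Q702", "root/G/G96",
                 "root/S/S94/S949", "root/S/S95/S951"),
  stringsAsFactors = FALSE
)

# Count individuals (not rows) in each exposure group.
individuals <- unique(toy_data[, c("id", "exposed")])
n_exposed   <- sum(individuals$exposed == 1)
n_unexposed <- sum(individuals$exposed == 0)

# Assumed relationship between p and the two counts.
p <- n_exposed / (n_exposed + n_unexposed)

# Splitting a path on the delimiter gives the levels of the tree,
# with the leaf as the last element.
path_levels <- strsplit(toy_tree$pathString[1], "/", fixed = TRUE)[[1]]
```

Objects shaped like `toy_data` and `toy_tree` correspond to the `data` and `tree` arguments. Individual 3 contributes two rows because two leaves were recorded for that individual.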

Value

A data.frame with the following columns:

cut: The name of the cut G.
n1: The number of exposed events belonging to cut G.
n0: The number of unexposed events belonging to cut G.
risk1: The absolute risk of getting an event belonging to cut G among the exposed.
risk0: The absolute risk of getting an event belonging to cut G among the unexposed.
RR: The risk ratio of the absolute risk among the exposed over the absolute risk among the unexposed.
llr: The log-likelihood ratio comparing the observed and expected number of exposed events belonging to cut G.
p: The P-value for testing whether cut G is a cluster of events.
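
To make the relationship between these columns more concrete, here is a minimal base-R sketch of how such quantities are commonly computed for an unconditional Bernoulli tree-based scan statistic. The function names `bernoulli_llr`, `risk_ratio`, and `mc_p_value` are hypothetical. The formulas follow the usual textbook form of the statistic and a rank-based Monte-Carlo P-value, and the use of the group sizes as risk denominators is an assumption. None of this is taken from the package source, so treat it as an approximation of what `TreeMineR` reports rather than its implementation.

```r
# Unconditional Bernoulli log-likelihood ratio for one cut (assumed form).
# n1: exposed events in the cut, n0: unexposed events in the cut,
# p:  probability that an event is exposed under the null hypothesis.
bernoulli_llr <- function(n1, n0, p) {
  n <- n1 + n0
  # x * log(y) with the convention that the term is 0 when x is 0
  xlog <- function(x, y) ifelse(x == 0, 0, x * log(y))
  observed <- xlog(n1, n1 / n) + xlog(n0, n0 / n)
  expected <- xlog(n1, p) + xlog(n0, 1 - p)
  # Only an excess of exposed events counts as a signal.
  ifelse(n1 / n > p, observed - expected, 0)
}

# Risk ratio, assuming the risks are events divided by group size.
risk_ratio <- function(n1, n0, n_exposed, n_unexposed) {
  risk1 <- n1 / n_exposed
  risk0 <- n0 / n_unexposed
  risk1 / risk0
}

# Rank-based Monte-Carlo P-value: the observed statistic is compared with
# the maximum statistic from each simulated dataset (assumed form).
mc_p_value <- function(observed_llr, simulated_max_llr) {
  (1 + sum(simulated_max_llr >= observed_llr)) / (length(simulated_max_llr) + 1)
}
```

Under this rank-based form, the smallest attainable P-value with the default `n_monte_carlo_sim = 9999` would be 1/10000.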

References

Kulldorff et al. (2003) A tree-based scan statistic for database disease surveillance. Biometrics 59(2): 323-331. DOI: 10.1111/1541-0420.00039.

Examples

TreeMineR(data = diagnoses,
          tree  = icd_10_se,
          p = 1/11,
          n_monte_carlo_sim = 99,
          random_seed = 1234) |>
  head()
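
The `future_control` argument can be used to run the Monte-Carlo simulations in parallel. The following sketch repeats the call above with a `multisession` plan. The variable name `parallel_control` and the choice of two workers are illustrative, and the call is guarded so that it only runs when both `TreeMineR` and `future` are installed.

```r
# Arguments forwarded to future::plan(); here two background R sessions.
parallel_control <- list(strategy = "multisession", workers = 2)

# Run the analysis only when the required packages are available.
if (requireNamespace("TreeMineR", quietly = TRUE) &&
    requireNamespace("future", quietly = TRUE)) {
  library(TreeMineR)
  res <- TreeMineR(data = diagnoses,
                   tree = icd_10_se,
                   p = 1/11,
                   n_monte_carlo_sim = 99,
                   random_seed = 1234,
                   future_control = parallel_control)
  head(res)
}
```

With only 99 simulations the overhead of starting the workers may outweigh any gain, so a parallel plan is more likely to pay off for larger values of `n_monte_carlo_sim`.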

[Package TreeMineR version 1.0.1 Index]