TreeMineR {TreeMineR}    R Documentation

Unconditional Bernoulli Tree-Based Scan Statistics for R

Description

Unconditional Bernoulli Tree-Based Scan Statistics for R

Usage

TreeMineR(
  data,
  tree,
  p = NULL,
  n_exposed = NULL,
  n_unexposed = NULL,
  dictionary = NULL,
  delimiter = "/",
  n_monte_carlo_sim = 9999,
  random_seed = FALSE,
  future_control = list(strategy = "sequential")
)

Arguments

data

The dataset used for the computation. The dataset needs to include the following columns:

id

An integer that is unique to every individual.

leaf

A string identifying the unique diagnoses or leaves for each individual.

exposed

A 0/1 indicator of the individual's exposure status.

See below for the first and last rows included in the example dataset.

   id leaf exposed
    1 K251       0
    2 Q702       0
    3  G96       0
    3 S949       0
    4 S951       0
 ---
  999 V539       1
  999 V625       1
  999 G823       1
 1000  L42       1
 1000 T524       1

tree

A dataset with one variable pathString defining the tree structure that you would like to use. This dataset can, e.g., be created using create_tree.
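
As a minimal sketch of the two inputs described above, the following builds a small data and tree object by hand. The leaf codes are taken from the example rows shown for data, but the object names (toy_data, toy_tree) and the hierarchy encoded in pathString are assumptions made for illustration, not the structure shipped with the package.

```r
# Hypothetical toy inputs; the hierarchy below is made up for illustration.
toy_data <- data.frame(
  id      = c(1L, 2L, 3L, 3L, 4L),
  leaf    = c("K251", "Q702", "G96", "S949", "S951"),
  exposed = c(0L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One pathString per leaf, with tree levels separated by the default delimiter "/".
toy_tree <- data.frame(
  pathString = c("K/K25/K251", "Q/Q70/Q702", "G/G96", "S/S94/S949", "S/S95/S951"),
  stringsAsFactors = FALSE
)

# Splitting on the delimiter recovers the levels of each path;
# the last level of each path is the leaf itself.
levels_per_path <- strsplit(toy_tree$pathString, split = "/", fixed = TRUE)
leaf_of_path <- vapply(levels_per_path, function(x) x[length(x)], character(1))

# Check that every leaf in the data has a path in the tree.
all(toy_data$leaf %in% leaf_of_path)
```

A check like the last line can help catch leaves in data that have no corresponding path in tree before running the scan.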

p

The proportion of exposed individuals in the dataset. Will be calculated from n_exposed and n_unexposed if both are supplied.

n_exposed

Number of exposed individuals (Optional).

n_unexposed

Number of unexposed individuals (Optional).
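
The relation between p, n_exposed, and n_unexposed can be sketched as follows. The counts are hypothetical, and the formula is the natural reading of "proportion of exposed individuals"; that TreeMineR computes p in exactly this way is an assumption.

```r
# Hypothetical counts, e.g. a cohort with 10 unexposed per exposed individual.
n_exposed   <- 100
n_unexposed <- 1000

# Proportion of exposed individuals among all individuals (assumed formula).
p <- n_exposed / (n_exposed + n_unexposed)
p
```

With these counts p equals 1/11, the value used in the Examples section.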

dictionary

A data.frame that includes one node column and a title column, which are used for labeling the cuts in the output of TreeMineR.

delimiter

A character defining the delimiter of different tree levels within your pathString. The default is /.

n_monte_carlo_sim

The number of Monte-Carlo simulations to be used for calculating P-values.
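
For orientation, a common way to turn Monte-Carlo replicates into a P-value is the rank-based estimate sketched below. The helper mc_p_value and the simulated values are hypothetical, and that TreeMineR uses exactly this estimator is an assumption. The sketch also shows why simulation counts such as 99 or 9999 are convenient: the denominator becomes a round number.

```r
# Rank-based Monte-Carlo P-value: the observed statistic counts as one draw.
mc_p_value <- function(observed, simulated) {
  (1 + sum(simulated >= observed)) / (length(simulated) + 1)
}

# Stand-in for 99 simulated test statistics under the null hypothesis.
simulated_llr <- (1:99) / 10

# Four simulated values (9.6, 9.7, 9.8, 9.9) are at least as large as 9.55,
# so the P-value is (1 + 4) / (99 + 1).
p_value <- mc_p_value(observed = 9.55, simulated = simulated_llr)
p_value
```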

random_seed

Random seed used for the Monte-Carlo simulations.

future_control

A list of arguments passed to future::plan. This is useful if one would like to parallelise the Monte-Carlo simulations to decrease the computation time. The default is a sequential run of the Monte-Carlo simulations.
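
A sketch of a parallel configuration, assuming the list elements are forwarded to future::plan by name. The object name parallel_control and the choice of two workers are illustrative; "multisession" and workers are existing options of the future package.

```r
# Run the simulations in two background R sessions instead of sequentially.
parallel_control <- list(strategy = "multisession", workers = 2)

# The list would then be supplied through future_control, e.g.:
# TreeMineR(data = diagnoses,
#           tree = icd_10_se,
#           p = 1/11,
#           future_control = parallel_control)
```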

Value

A data.frame with the following columns:

cut

The name of the cut G.

n1

The number of exposed events belonging to cut G.

n0

The number of unexposed events belonging to cut G.

risk1

The absolute risk of getting an event belonging to cut G among the exposed.

risk0

The absolute risk of getting an event belonging to cut G among the unexposed.

RR

The risk ratio of the absolute risk among the exposed over the absolute risk among the unexposed.

llr

The log-likelihood ratio comparing the observed and expected number of exposed events belonging to cut G.

p

The P-value that cut G is a cluster of events.
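
To make the RR and llr columns more concrete, the sketch below computes a risk ratio from two given risks and a log-likelihood ratio for a single cut using the usual form of the unconditional Bernoulli scan statistic. The helper bernoulli_llr and all input values are hypothetical, and whether TreeMineR evaluates exactly this expression internally is an assumption.

```r
# Risk ratio from hypothetical absolute risks.
risk1 <- 0.10
risk0 <- 0.02
RR <- risk1 / risk0

# Unconditional Bernoulli log-likelihood ratio for one cut (assumed form):
# n1 exposed events, n0 unexposed events, p the proportion of exposed.
bernoulli_llr <- function(n1, n0, p) {
  n <- n1 + n0
  # Only an excess of exposed events counts as a signal.
  if (n == 0 || n1 / n <= p) return(0)
  q <- n1 / n
  # k * log(obs / expd), taken as 0 when k is 0.
  term <- function(k, obs, expd) if (k == 0) 0 else k * log(obs / expd)
  term(n1, q, p) + term(n0, 1 - q, 1 - p)
}

bernoulli_llr(n1 = 10, n0 = 20, p = 1/11)  # excess of exposed events
bernoulli_llr(n1 = 1,  n0 = 20, p = 1/11)  # no excess, so the result is 0
```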

References

Kulldorff et al. (2003) A tree-based scan statistic for database disease surveillance. Biometrics 59(2): 323-331. DOI: 10.1111/1541-0420.00039.

Examples

TreeMineR(data = diagnoses,
          tree  = icd_10_se,
          p = 1/11,
          n_monte_carlo_sim = 99,
          random_seed = 1234) |>
  head()


[Package TreeMineR version 1.0.1 Index]