CERFIT {CERFIT}    R Documentation

Fits a Random Forest of Interaction Trees

Description

Estimates an observation's individualized treatment effect for RCT and observational data. Treatment can be a binary, categorical, ordered, or continuous variable. Currently, if the response is binary, useRes must be set to TRUE.

Usage

CERFIT(
  formula,
  data,
  ntrees,
  subset = NULL,
  search = c("exhaustive", "sss"),
  method = c("RCT", "observational"),
  PropForm = c("randomForest", "CBPS", "GBM", "HI"),
  split = c("t.test"),
  mtry = NULL,
  nsplit = NULL,
  nsplit.random = FALSE,
  minsplit = 20,
  minbucket = round(minsplit/3),
  maxdepth = 30,
  a = 50,
  sampleMethod = c("bootstrap", "subsample", "subsampleByID", "allData"),
  useRes = TRUE,
  scale.y = FALSE
)

Arguments

formula

Formula used to build CERFIT, with the treatment variable placed after the vertical bar, e.g., Y ~ x1 + x2 | treatment. Categorical predictors must be supplied as factors.

data

Data to grow a tree.

ntrees

Number of trees to grow

subset

A logical vector that controls which observations are used to grow the forest. The default value uses the entire data frame.

search

Method to search through candidate splits

method

For observational study data, method = "observational"; for randomized study data, method = "RCT".

PropForm

Method to estimate propensity score

split

Impurity measure splitting statistic

mtry

Number of variables to consider at each split

nsplit

Number of cut points selected

nsplit.random

Logical: indicates whether the cut points are selected at random

minsplit

Number of observations required to continue growing tree

minbucket

Number of observations required in each child node

maxdepth

Maximum depth of tree

a

Sigmoid approximation variable (for "sss" which is still under development)

sampleMethod

Method used to draw the learning sample for each tree. The default is "bootstrap". "subsample" takes a subsample of the original data. "subsampleByID" samples by an ID column and uses all observations that have a sampled ID. "allData" uses the entire data set for every tree.

useRes

Logical: indicates whether to fit the CERFIT model to the residuals from a linear model

scale.y

Logical: standardize y when creating splits (for "sss", to increase stability)
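
As a rough illustration of the four sampleMethod options described above, the base R sketch below draws row indices for a single tree from a toy data frame. The names `dat`, `id`, and the `*_rows` vectors are hypothetical, and the subsample fraction of 0.632 and the number of sampled IDs are assumptions made for the example; the package's internal sampling may differ.

```r
set.seed(3)
# Toy data (hypothetical): 5 subjects with 2 rows each
dat <- data.frame(id = rep(1:5, each = 2), y = rnorm(10))
n <- nrow(dat)

# "bootstrap": n rows drawn with replacement
boot_rows <- sample(n, n, replace = TRUE)

# "subsample": rows drawn without replacement (the 0.632 fraction is an assumption)
sub_rows <- sample(n, floor(0.632 * n))

# "subsampleByID": draw IDs, then keep every row that carries a drawn ID
ids <- sample(unique(dat$id), 3)
id_rows <- which(dat$id %in% ids)

# "allData": every row is used for every tree
all_rows <- seq_len(n)
```

Sampling by ID keeps all rows of a sampled subject together, which matters when the data contain repeated measurements per subject.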

Details

This function implements the Random Forest of Interaction Trees proposed in Su (2018), a modification of the Random Forest algorithm in which each split is chosen to maximize subgroup treatment heterogeneity rather than prediction accuracy. It chooses the best split by maximizing the test statistic for H_0: \beta_3=0 in the following linear model:

Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \varepsilon_i

where X_{ij} represents the splitting variable and Z = 1 indicates treatment. Thus, by maximizing the test statistic for \beta_3, we maximize the difference in treatment effect between the two child nodes.
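
The split criterion described above can be sketched in base R. The example simulates a small randomized data set, fits the linear model for each candidate cut point, and picks the cut with the largest absolute t statistic on the interaction term. The helper `split_stat` and the variables `x`, `z`, `y`, and `cuts` are hypothetical and are not part of the CERFIT package; the package's internal search may differ.

```r
# Hypothetical helper (not part of CERFIT): t statistic for beta_3 at one cut point
split_stat <- function(y, x, z, cutpoint) {
  left <- as.numeric(x < cutpoint)   # I(X_ij < c)
  fit  <- lm(y ~ left * z)           # main effects plus the left:z interaction
  coef(summary(fit))["left:z", "t value"]
}

set.seed(1)
n <- 200
x <- runif(n)                        # candidate splitting variable
z <- rbinom(n, 1, 0.5)               # randomized binary treatment indicator
# Simulated outcome: treatment only has an effect when x < 0.5
y <- 1 + 2 * z * (x < 0.5) + rnorm(n)

# Candidate cut points: the deciles of x
cuts  <- quantile(x, probs = seq(0.1, 0.9, by = 0.1))
stats <- sapply(cuts, function(cp) split_stat(y, x, z, cp))

# Cut point with the strongest evidence against H_0: beta_3 = 0
best <- cuts[which.max(abs(stats))]
best
```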

The above equation only works when the data come from a randomized controlled trial, but it can be modified to give unbiased estimates of the treatment effect in observational studies (Li et al., 2022). To do so, we add the propensity score to the linear model:

Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \beta_4e_i + \varepsilon_i

where e_i represents the propensity score. The CERFIT function estimates the propensity score automatically when the method argument is set to "observational".
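
A minimal base R sketch of the propensity-adjusted model follows. It estimates the propensity score with a plain logistic regression via `glm`, which is a stand-in chosen for brevity and is not one of the PropForm options. All variable names and the fixed cut point of 0.5 are hypothetical.

```r
set.seed(2)
n  <- 300
x1 <- runif(n)                         # candidate splitting variable
x2 <- rnorm(n)                         # confounder that drives treatment assignment
z  <- rbinom(n, 1, plogis(x2))         # non-randomized binary treatment
y  <- 1 + x2 + 2 * z * (x1 < 0.5) + rnorm(n)

# Stand-in propensity model (assumption): logistic regression of z on the covariates
e_hat <- fitted(glm(z ~ x1 + x2, family = binomial))

left <- as.numeric(x1 < 0.5)           # I(X_ij < c) with c = 0.5
fit  <- lm(y ~ left * z + e_hat)       # adds the beta_4 * e_i term to the model

# t statistic for beta_3, the interaction between the split and treatment
coef(summary(fit))["left:z", "t value"]
```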

To control how the propensity score is estimated, use the PropForm argument, which can take four values: randomForest, CBPS, GBM, and HI. randomForest fits a random forest with the randomForest package; CBPS uses the covariate balancing propensity score; GBM uses generalized boosted regression models; and HI is for continuous treatments and estimates the generalized propensity score. Some of these options only work for certain treatment types. Full list below

Value

Returns a fitted CERFIT object which is a list with the following elements

References

Examples

fit <- CERFIT(Result_of_Treatment ~ sex + age + Number_of_Warts + Area + Time + Type | treatment,
              data = warts,
              ntrees = 30,
              method = "RCT",
              mtry = 2)


[Package CERFIT version 0.1.0 Index]