CERFIT {CERFIT} | R Documentation |
Fits a Random Forest of Interaction Trees
Description
Estimates an observation's individualized treatment effect for RCT and observational data. Treatment can be a binary, categorical, ordered, or continuous variable. Currently, if the response is binary, useRes must be set to TRUE.
Usage
CERFIT(
formula,
data,
ntrees,
subset = NULL,
search = c("exhaustive", "sss"),
method = c("RCT", "observational"),
PropForm = c("randomForest", "CBPS", "GBM", "HI"),
split = c("t.test"),
mtry = NULL,
nsplit = NULL,
nsplit.random = FALSE,
minsplit = 20,
minbucket = round(minsplit/3),
maxdepth = 30,
a = 50,
sampleMethod = c("bootstrap", "subsample", "subsampleByID", "allData"),
useRes = TRUE,
scale.y = FALSE
)
Arguments
formula |
Formula to build CERFIT, e.g., Y ~ x1 + x2 | treatment. Categorical predictors must be coded as factors. |
data |
Data to grow a tree. |
ntrees |
Number of trees to grow |
subset |
A logical vector that controls which observations are used to grow the forest. The default value uses the entire data frame |
search |
Method to search through candidate splits |
method |
For observational study data, method="observational"; for randomized study data, method="RCT". |
PropForm |
Method to estimate propensity score |
split |
Impurity measure splitting statistic |
mtry |
Number of variables to consider at each split |
nsplit |
Number of cut points selected |
nsplit.random |
Logical: indicates whether cut points are selected at random |
minsplit |
Number of observations required to continue growing tree |
minbucket |
Number of observations required in each child node |
maxdepth |
Maximum depth of tree |
a |
Sigmoid approximation variable (for "sss", which is still under development) |
sampleMethod |
Method used to draw the learning sample for each tree. Default is bootstrap. subsample takes a subsample of the original data. subsampleByID samples by an ID column and uses all observations that have a sampled ID. allData uses the entire data set for every tree. |
useRes |
Logical: whether to fit the CERFIT model to the residuals from a linear model |
scale.y |
Logical, standardize y when creating splits (For "sss" to increase stability) |
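The four sampleMethod options described above can be illustrated with base R. This is a minimal sketch, not CERFIT's internal code: the toy data frame, the index variable names, the 0.632 subsample fraction, and the number of sampled IDs are all assumptions made for illustration.

```r
set.seed(3)
# Toy longitudinal data: 5 subjects with 3 rows each (illustrative only)
dat <- data.frame(id = rep(1:5, each = 3), y = rnorm(15))
n <- nrow(dat)

# "bootstrap": n rows drawn with replacement
idx_boot <- sample(n, n, replace = TRUE)

# "subsample": rows drawn without replacement
# (the 0.632 fraction is an assumption, not a documented CERFIT default)
idx_sub <- sample(n, floor(0.632 * n))

# "subsampleByID": sample IDs, then keep every row that has a sampled ID
ids <- sample(unique(dat$id), 3)
idx_id <- which(dat$id %in% ids)

# "allData": every row is used for every tree
idx_all <- seq_len(n)
```

Sampling by ID keeps all rows of a subject together, which matters when observations from the same subject are correlated.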
Details
This function implements the Random Forest of Interaction Trees proposed in Su et al. (2018), a modification of the Random Forest algorithm in which each split is chosen to maximize subgroup treatment heterogeneity rather than prediction accuracy. It chooses the best split by maximizing the test statistic for H_0: \beta_3 = 0 in the following linear model
Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \varepsilon_i
where X_{ij} represents the splitting variable and Z = 1 represents treatment. So, by maximizing the test statistic for \beta_3 we are maximizing the treatment difference between the nodes.
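Under this model the treatment effect is \beta_2 + \beta_3 in the node with X_{ij} < c and \beta_2 in the other node, so \beta_3 is the difference in treatment effects between the two child nodes. The split search can be sketched in base R as follows. This is an illustration on simulated data, not CERFIT's implementation: the helper interaction_t, the grid of candidate cut points, and the use of the absolute t statistic are assumptions.

```r
set.seed(1)
n <- 200
x <- runif(n)                          # candidate splitting variable X_j
z <- rbinom(n, 1, 0.5)                 # randomized binary treatment Z
y <- 1 + 2 * z * (x < 0.5) + rnorm(n)  # treatment only helps when x < 0.5

# Hypothetical helper: t statistic for beta_3 at one cut point
interaction_t <- function(cut, y, x, z) {
  left <- as.numeric(x < cut)          # I(X_ij < c)
  fit <- lm(y ~ left * z)              # beta_3 is the "left:z" coefficient
  coef(summary(fit))["left:z", "t value"]
}

# Candidate cut points between the 10th and 90th percentiles of x
cuts <- quantile(x, probs = seq(0.1, 0.9, by = 0.05))
t_stats <- vapply(cuts, function(cut) interaction_t(cut, y, x, z), numeric(1))

# Keep the cut point with the largest absolute statistic (an assumption)
best_cut <- cuts[which.max(abs(t_stats))]
best_cut
```

Restricting the cut points to interior quantiles keeps both child nodes non-empty, which plays a role similar to the minbucket argument.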
The above equation only applies when the data come from a randomized controlled trial. It can be modified to give unbiased estimates of the treatment effect in observational studies (Li et al., 2022) by adding the propensity score to the linear model:
Y_i = \beta_0 + \beta_1I(X_{ij} < c) + \beta_2I(Z = 1) + \beta_3I(X_{ij} < c)I(Z = 1) + \beta_4e_i + \varepsilon_i
where e_i represents the propensity score. The CERFIT function will estimate the propensity score automatically when the method argument is set to observational.
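The propensity-adjusted model can be sketched in base R on simulated observational data. Note the assumptions: the propensity score is estimated here with plain logistic regression (glm), which is not one of the estimators CERFIT offers, and the candidate split x2 < 0 and all variable names are illustrative.

```r
set.seed(2)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
# Treatment assignment depends on x1, as in an observational study
z <- rbinom(n, 1, plogis(0.8 * x1))
y <- 1 + x1 + 1.5 * z * (x2 < 0) + rnorm(n)

# Propensity score e_i = P(Z = 1 | x1, x2), estimated by logistic regression
# (an assumption for this sketch; CERFIT uses the PropForm estimators instead)
ps_model <- glm(z ~ x1 + x2, family = binomial())
e <- fitted(ps_model)

# Interaction model for one candidate split, with e_i as an extra covariate
left <- as.numeric(x2 < 0)
fit <- lm(y ~ left * z + e)
coef(summary(fit))["left:z", "t value"]  # test statistic for beta_3
```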
To control how this function estimates the propensity score, use the PropForm argument, which can take four possible values: randomForest, CBPS, GBM and HI. randomForest uses the randomForest package to estimate the propensity score with a random forest, CBPS uses the covariate balancing propensity score, GBM uses generalized boosted regression models, and HI is for continuous treatments and estimates the generalized propensity score. Some of these options only work for certain treatment types. The supported combinations are:
binary: GBM, CBPS, randomForest
categorical: GBM, CBPS
ordered: GBM, CBPS
continuous: CBPS, HI
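The compatibility list above can be written as a small lookup table for checking a combination before fitting. This is a sketch: supported_propform and is_supported are hypothetical names and are not exported by CERFIT.

```r
# Supported PropForm values per treatment type, transcribed from the list above
supported_propform <- list(
  binary      = c("GBM", "CBPS", "randomForest"),
  categorical = c("GBM", "CBPS"),
  ordered     = c("GBM", "CBPS"),
  continuous  = c("CBPS", "HI")
)

# Hypothetical helper: TRUE if prop_form is listed for trt_type
is_supported <- function(trt_type, prop_form) {
  prop_form %in% supported_propform[[trt_type]]
}

is_supported("binary", "randomForest")  # TRUE
is_supported("continuous", "GBM")       # FALSE
```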
Value
Returns a fitted CERFIT object, which is a list with the following elements:
RandFor: The random forest of interaction trees
trt.type: A string containing the treatment type of the data used to fit the model. Can be binary, multiple, ordered or continuous.
response.type: A string representing the response type of the data. Can be binary or continuous.
useRes: A logical indicator that is TRUE if the model was fit on the residuals of a linear model
data: The data used to fit the model. Also contains the propensity score if method was set to observational.
References
Li, Luo, et al. Causal Effect Random Forest of Interaction Trees for Learning Individualized Treatment Regimes with Multiple Treatments in Observational Studies. Stat, 2022, https://doi.org/10.1002/sta4.457.
Su, X., Peña, A., Liu, L., & Levine, R. (2018). Random forests of interaction trees for estimating individualized treatment effects in randomized trials. Statistics in Medicine, 37(17), 2547-2560.
G. W. Imbens, The role of the propensity score in estimating dose-response functions, Biometrika, 87 (2000), pp. 706–710.
G. Ridgeway, D. McCaffrey, and A. Morral, The twang package: Toolkit for weighting and analysis of nonequivalent groups, (2006).
A. Liaw and M. Wiener, Classification and regression by randomForest, R News, 2 (2002), pp. 18–22.
Examples
library(CERFIT)
# Fit a 30-tree forest to the warts data, treating it as a randomized trial
fit <- CERFIT(
  Result_of_Treatment ~ sex + age + Number_of_Warts + Area + Time + Type | treatment,
  data = warts,
  ntrees = 30,
  method = "RCT",
  mtry = 2
)