honest.causalTree {htetree} | R Documentation |
Causal Effect Regression and Estimation Trees: One-step honest estimation
Description
Fit a causalTree model to get an honest causal tree, with the tree
structure built on the training sample (including cross-validation)
and the leaf estimates taken from the estimation sample.
Returns an rpart object.
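The honest-estimation idea described above can be illustrated with a minimal base R sketch. It does not call htetree: the simulated data, the fixed partition on x1 (a stand-in for a tree structure learned on the training sample), and the helper names leaf_of and leaf_effect are all assumptions made for illustration. Each leaf effect is taken here as the unweighted difference in mean outcomes between treated and control units of the estimation sample; the package itself may apply weights or other adjustments.

```r
# Simulated data (assumption): the treatment effect is 2 when x1 > 0, else 0.
set.seed(1)
n  <- 400
x1 <- rnorm(n)
w  <- rbinom(n, 1, 0.5)                      # 1 = treated, 0 = control
y  <- 1 + 2 * w * (x1 > 0) + rnorm(n)
d  <- data.frame(y = y, x1 = x1, w = w)

# Hold out half of the rows as the estimation sample.
train_rows <- sample(n, n / 2)
est        <- d[-train_rows, ]

# Hypothetical partition standing in for a tree built on the training sample.
leaf_of <- function(df) ifelse(df$x1 > 0, "x1 > 0", "x1 <= 0")

# Leaf effect: difference in mean outcome, treated minus control.
leaf_effect <- function(df) mean(df$y[df$w == 1]) - mean(df$y[df$w == 0])

# Honest estimates: effects computed only from the estimation sample.
honest_est <- sapply(split(est, leaf_of(est)), leaf_effect)
print(honest_est)
```

Because the rows used to choose the partition are not reused to estimate the effects, the leaf estimates are not biased by the search over splits.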
Usage
honest.causalTree(
formula,
data,
weights,
treatment,
subset,
est_data,
est_weights,
est_treatment,
est_subset,
na.action = na.causalTree,
split.Rule,
split.Honest,
HonestSampleSize,
split.Bucket,
bucketNum = 10,
bucketMax = 40,
cv.option,
cv.Honest,
minsize = 2L,
model = FALSE,
x = FALSE,
y = TRUE,
propensity,
control,
split.alpha = 0.5,
cv.alpha = 0.5,
cv.gamma = 0.5,
split.gamma = 0.5,
cost,
...
)
Arguments
formula |
a formula, with a response and features but
no interaction terms. If this is a data frame, it is taken as
the model frame (see |
data |
an optional data frame that includes the variables named in the formula. |
weights |
optional case weights. |
treatment |
a vector indicating the treatment status of each observation: 1 represents treated and 0 represents control. Only binary treatment is supported in this version. |
subset |
optional expression saying that only a subset of the rows of the data should be used in the fit. |
est_data |
data frame to be used for leaf estimates; the estimation sample. Must contain the variables used in training the tree. |
est_weights |
optional case weights for the estimation sample. |
est_treatment |
treatment vector for the estimation sample. Must be the same length as the estimation data. Indicates the treatment status of each observation: 1 represents treated and 0 represents control. Only binary treatment is supported in this version. |
est_subset |
optional expression saying that only a subset of the rows of the estimation data should be used in the fit of the re-estimated tree. |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
split.Honest |
boolean option, |
HonestSampleSize |
number of observations anticipated to be used in honest re-estimation after building the tree. This enters the risk function used in both splitting and cross-validation. |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
model |
model frame of |
x |
keep a copy of the |
y |
keep a copy of the dependent variable in the result. If
missing and |
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross-validation
risk evaluation function for |
cv.gamma, split.gamma |
optional parameters used in evaluating policies. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
... |
arguments to |
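As a small illustration of the cost argument described above, the following base R sketch divides each variable's split improvement by its cost before choosing a split. The improvement values and variable names are hypothetical, and this is only the arithmetic stated in the argument description, not the package's internal code.

```r
# Hypothetical raw improvements for the best split on each variable.
improvement <- c(x1 = 4.0, x2 = 6.0, x3 = 5.0)

# Costs: one per variable; a cost of 1 leaves the improvement unchanged.
cost <- c(x1 = 1, x2 = 2, x3 = 1)

# The improvement is divided by the cost when comparing candidate splits.
scaled <- improvement / cost

# Variable that would be chosen with and without the cost scaling.
chosen_scaled   <- names(which.max(scaled))
chosen_unscaled <- names(which.max(improvement))
print(c(scaled = chosen_scaled, unscaled = chosen_unscaled))
```

A cost above 1 makes a variable less likely to be selected for a split, which can be used to discourage splits on variables that are expensive to measure.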
Value
An object of class rpart. See rpart.object.
References
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S. and Imbens, G. (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
See Also
causalTree, estimate.causalTree, rpart.object, summary.rpart, rpart.plot
Examples
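library("rpart")
library("rpart.plot")
library("htetree")

# Split simulation.1 in half within each treatment arm, so that the
# training and estimation samples both contain treated and control units.
n <- nrow(simulation.1)
trIdx <- which(simulation.1$treatment == 1)
conIdx <- which(simulation.1$treatment == 0)
train_idx <- c(sample(trIdx, floor(length(trIdx) / 2)),
               sample(conIdx, floor(length(conIdx) / 2)))
train_data <- simulation.1[train_idx, ]
est_data <- simulation.1[-train_idx, ]

# Build the tree on the training sample and take the leaf estimates
# from the estimation sample.
honestTree <- honest.causalTree(y ~ x1 + x2 + x3 + x4, data = train_data,
                                treatment = train_data$treatment,
                                est_data = est_data,
                                est_treatment = est_data$treatment,
                                split.Rule = "CT", split.Honest = TRUE,
                                HonestSampleSize = nrow(est_data),
                                split.Bucket = TRUE, cv.option = "CT")

# Pick the complexity parameter (column 1 of cptable) with the smallest
# cross-validated error (column 4 of cptable), then prune and plot.
opcp <- honestTree$cptable[, 1][which.min(honestTree$cptable[, 4])]
opTree <- prune(honestTree, opcp)
rpart.plot(opTree)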
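The train/estimation split used in the example can also be wrapped in a small reusable helper. The sketch below uses base R only; the function name stratified_half_split and the toy data frame are assumptions made for illustration and are not part of htetree. Setting a seed before sampling makes the split reproducible.

```r
# Hypothetical helper: draw half of the row indices within each treatment arm.
stratified_half_split <- function(treatment) {
  stopifnot(all(treatment %in% c(0, 1)))     # binary treatment only
  idx_by_arm <- split(seq_along(treatment), treatment)
  train_idx <- unlist(lapply(idx_by_arm, function(idx) {
    # Index via sample.int() so that an arm with one row is handled safely.
    idx[sample.int(length(idx), floor(length(idx) / 2))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Toy data (assumption): 12 control rows followed by 8 treated rows.
set.seed(42)
toy <- data.frame(y = rnorm(20), treatment = rep(c(0, 1), times = c(12, 8)))

train_idx <- stratified_half_split(toy$treatment)
train_toy <- toy[train_idx, ]
est_toy   <- toy[-train_idx, ]

# Both samples keep the same treated/control proportions.
print(table(train_toy$treatment))
print(table(est_toy$treatment))
```

The resulting train_toy and est_toy play the same roles as train_data and est_data in the example above.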