init.causalForest {htetree} | R Documentation |
Causal Effect Regression and Estimation Forests (Tree Ensembles)
Description
Build a random causal forest by fitting a user-selected number of
causalTree models to get an ensemble of rpart objects.
Usage
init.causalForest(
formula,
data,
treatment,
weights = FALSE,
cost = FALSE,
num.trees,
ncov_sample
)
## S3 method for class 'causalForest'
predict(object, newdata, predict.all = FALSE, type = "vector", ...)
causalForest(
formula,
data,
treatment,
na.action = na.causalTree,
split.Rule = "CT",
double.Sample = TRUE,
split.Honest = TRUE,
split.Bucket = FALSE,
bucketNum = 5,
bucketMax = 100,
cv.option = "CT",
cv.Honest = TRUE,
minsize = 2L,
propensity,
control,
split.alpha = 0.5,
cv.alpha = 0.5,
sample.size.total = floor(nrow(data)/10),
sample.size.train.frac = 0.5,
mtry = ceiling(ncol(data)/3),
nodesize = 1,
num.trees = nrow(data),
cost = FALSE,
weights = FALSE,
ncolx,
ncov_sample
)
Arguments
formula |
a formula, with a response and features but no
interaction terms. If this is a data frame, it is taken as the model frame
(see |
data |
an optional data frame that includes the variables named in the formula. |
treatment |
a vector that indicates the treatment status of each observation: 1 represents treated and 0 represents control. Only binary treatment is supported in this version. |
weights |
optional case weights. |
cost |
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose. |
num.trees |
Number of trees to be built in the causal forest |
ncov_sample |
Number of covariates randomly sampled to build each tree in the forest |
object |
a |
newdata |
new data to predict |
predict.all |
If TRUE, return the predicted individual effect for each observation. Otherwise, return the average effect. |
type |
the type of returned object |
... |
arguments to |
na.action |
the default action deletes all observations for which
|
split.Rule |
causalTree splitting options, one of |
double.Sample |
boolean option, |
split.Honest |
boolean option, |
split.Bucket |
boolean option, |
bucketNum |
number of observations in each bucket when set
|
bucketMax |
Option to choose maximum number of buckets to use in
splitting when set |
cv.option |
cross validation options, one of |
cv.Honest |
boolean option, |
minsize |
in order to split, each leaf must have at least
|
propensity |
propensity score used in |
control |
a list of options that control details of the
|
split.alpha |
scale parameter between 0 and 1, used in splitting
risk evaluation function for |
cv.alpha |
scale parameter between 0 and 1, used in cross validation
risk evaluation function for |
sample.size.total |
Sample size used to build each tree in the forest (sampled randomly with replacement). |
sample.size.train.frac |
Fraction of sample.size.total used for building each tree (training). For example, if sample.size.total is 1000 and sample.size.train.frac = 0.5, then 500 samples are used to build the tree and the other 500 samples are used to evaluate it. |
mtry |
Number of data features used to build a tree (This variable is not used presently). |
nodesize |
Minimum number of observations for treated and control cases in one leaf node |
ncolx |
Total number of covariates |
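As a rough illustration of how sample.size.total, sample.size.train.frac, ncolx and ncov_sample interact, the base R sketch below draws the rows and covariates for a single tree. The data frame toy and every variable name other than the four argument names are hypothetical, and the package's internal sampling may differ in detail.

```r
# Hypothetical toy data; not part of htetree.
set.seed(1)
ncolx <- 10                       # total number of covariates
toy <- as.data.frame(matrix(rnorm(2000 * ncolx), ncol = ncolx))
names(toy) <- paste0("x", seq_len(ncolx))

sample.size.total <- 1000         # rows drawn for one tree
sample.size.train.frac <- 0.5     # share of those rows used to build the tree
ncov_sample <- 3                  # covariates sampled for one tree

# Draw row indices with replacement, as described for the forest.
full.idx <- sample.int(nrow(toy), sample.size.total, replace = TRUE)

# Split the drawn rows into a building part and an evaluation part.
train.size <- round(sample.size.train.frac * sample.size.total)
train.idx <- full.idx[seq_len(train.size)]
est.idx <- full.idx[-seq_len(train.size)]

# Sample a subset of covariate names for this tree (without replacement).
tree.covs <- sample(names(toy), ncov_sample)

length(train.idx)  # 500
length(est.idx)    # 500
length(tree.covs)  # 3
```

With these values, half of the 1000 drawn rows go to building the tree and the other half to evaluating it, which matches the example given for sample.size.train.frac.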
Details
CausalForest builds an ensemble of CausalTrees (See Athey and Imbens,
Recursive Partitioning for Heterogeneous Causal
Effects (2016)), by repeated random sampling of the data with replacement.
Further, each tree is built using a randomly sampled subset of all available
covariates. A causal forest object is a list of trees. To predict, call R's
predict function with new test data and the causalForest object (estimated
on the training data) obtained after calling the causalForest function.
During the prediction phase, the average value over all tree predictions
is returned as the final prediction by default.
To return the predictions of each tree in the forest for each test
observation, set the flag predict.all=TRUE.
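The default aggregation described above can be sketched in base R. The matrix tree.preds is a hypothetical stand-in for per-tree predictions (rows are test observations, columns are trees); it is not produced by htetree, and the exact structure returned when predict.all=TRUE should be checked against the installed package.

```r
# Hypothetical per-tree predictions: 4 test observations, 3 trees.
tree.preds <- matrix(c( 0.2, 0.4, 0.6,
                        1.0, 1.0, 1.0,
                       -0.5, 0.0, 0.5,
                        0.3, 0.3, 0.9),
                     nrow = 4, byrow = TRUE)

# Default behaviour described above: for each observation,
# average the predictions over all trees.
forest.pred <- rowMeans(tree.preds)
forest.pred  # 0.4 1.0 0.0 0.5
```

Keeping the per-tree matrix alongside the average makes it possible to inspect how much individual trees disagree for a given observation.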
causalTree differs from the rpart function in the rpart package in its
splitting rules and cross-validation methods. See Athey and Imbens,
Recursive Partitioning for Heterogeneous Causal Effects (2016), and
Wager and Athey, Estimation and Inference of Heterogeneous Treatment
Effects using Random Forests, for more details.
Value
An object of class rpart. See rpart.object.
References
Breiman L., Friedman J. H., Olshen R. A., and Stone, C. J. (1984) Classification and Regression Trees. Wadsworth.
Athey, S. and Imbens, G. (2016) Recursive Partitioning for Heterogeneous Causal Effects. http://arxiv.org/abs/1504.01132
Wager, S. and Athey, S. (2015) Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. http://arxiv.org/abs/1510.04342
See Also
causalTree, honest.causalTree, rpart.control, rpart.object,
summary.rpart, rpart.plot
Examples
library(rpart)
library("htetree")

# Fit a small causal forest (5 trees) on the simulated data set.
cf <- causalForest(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
                   data = simulation.1,
                   treatment = simulation.1$treatment,
                   split.Rule = "CT", split.Honest = TRUE,
                   split.Bucket = FALSE, bucketNum = 5, bucketMax = 100,
                   cv.option = "CT", cv.Honest = TRUE, minsize = 2L,
                   split.alpha = 0.5, cv.alpha = 0.5,
                   sample.size.total = floor(nrow(simulation.1) / 2),
                   sample.size.train.frac = 0.5,
                   mtry = ceiling(ncol(simulation.1) / 3),
                   nodesize = 3, num.trees = 5,
                   ncolx = 10, ncov_sample = 3)

# Predict treatment effects for the first 100 rows of the data.
cfpredtest <- predict.causalForest(cf, newdata = simulation.1[1:100, ],
                                   type = "vector")