doubleML {gamlr}    R Documentation
double ML
Description
Double/debiased machine learning for treatment effect estimation.
Usage
doubleML(x, d, y, nfold=2, foldid=NULL, family="gaussian", cl=NULL, ...)
Arguments
x: Covariates; see gamlr.
d: The matrix of treatment variables. Each column is used as a response by gamlr.
y: Response; see gamlr.
nfold: The number of cross-validation folds.
foldid: An optional length-n vector of fold memberships for each observation. If specified, this dictates nfold.
family: Response model type for the treatment prediction; either "gaussian", "poisson", or "binomial". This can be either a single family shared by all columns of d or …
cl: possible …
...: Arguments to all the gamlr regressions.
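If you supply foldid yourself, a balanced random assignment is the usual choice. The sketch below is a minimal base-R illustration that assumes folds are labelled 1..nfold; the helper make_foldid is hypothetical and is not part of gamlr.

```r
# Hypothetical helper (not part of gamlr): balanced random fold labels 1..nfold
make_foldid <- function(n, nfold = 2) {
  stopifnot(n >= nfold, nfold >= 2)
  # repeat 1..nfold out to length n, then shuffle so folds are random but balanced
  sample(rep(seq_len(nfold), length.out = n))
}

set.seed(42)
foldid <- make_foldid(n = 11, nfold = 3)
table(foldid)  # fold sizes differ by at most one
```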
Details
Performs the double ML procedure of Chernozhukov et al. (2017) to produce debiased estimates of the average linear treatment effects of d on y. This procedure uses gamlr to regress y and each column of d onto x. In the cross-fitting routine described in Taddy (2019), these regressions are trained on all but one fold of the data and the out-of-sample residuals are calculated on the left-out fold; rotating through the folds gives every observation an out-of-sample residual. Model selection for these residualization steps is based on the AICc selection rule. The response residuals are then regressed onto the treatment residuals using lm, and under the assumptions of Chernozhukov et al. the resulting estimates and standard errors support valid (asymptotically normal) inference about the treatment effects.
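To make the cross-fitting steps concrete, here is a minimal base-R sketch for a single treatment column. It is not the gamlr implementation: ordinary lm() is swapped in for the gamlr/AICc nuisance regressions (so it is only sensible when x has few columns relative to n), the function name cross_fit_dml is hypothetical, and the final-stage intercept is an assumption. The simulated data has a known treatment effect of 0.5, confounded through x1.

```r
# Hypothetical sketch of cross-fitted partialling-out for one treatment.
# ASSUMPTION: lm() stands in for gamlr + AICc in the nuisance regressions.
cross_fit_dml <- function(x, d, y, nfold = 2, foldid = NULL) {
  x <- as.data.frame(x)
  n <- nrow(x)
  if (is.null(foldid)) foldid <- sample(rep(seq_len(nfold), length.out = n))
  dtil <- ytil <- numeric(n)
  for (k in unique(foldid)) {
    test <- foldid == k
    train <- x[!test, , drop = FALSE]
    # nuisance regressions of d and y on x are fit on the other folds ...
    fit_d <- lm(d ~ ., data = cbind(train, d = d[!test]))
    fit_y <- lm(y ~ ., data = cbind(train, y = y[!test]))
    # ... and residuals are computed on the held-out fold
    dtil[test] <- d[test] - predict(fit_d, newdata = x[test, , drop = FALSE])
    ytil[test] <- y[test] - predict(fit_y, newdata = x[test, , drop = FALSE])
  }
  # final stage: regress response residuals on treatment residuals
  lm(ytil ~ dtil, x = TRUE, y = TRUE)
}

set.seed(1)
n <- 2000
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d <- 0.8 * x$x1 + rnorm(n)                    # treatment depends on x1
y <- 0.5 * d + 1.5 * x$x1 - x$x2 + rnorm(n)   # true effect of d is 0.5
fit <- cross_fit_dml(x, d, y, nfold = 2)
coef(fit)["dtil"]                      # close to the true effect 0.5
naive <- coef(lm(y ~ d))[["d"]]        # ignores x1, so it is confounded
```

The naive slope absorbs the x1 path from d to y, while the residual-on-residual fit removes it; that contrast is the point of the partialling-out step.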
Value
A fitted lm object estimating the treatment effect of d on y. The lm function has been called with x=TRUE, y=TRUE, so that this object contains the residualized d as x and the residualized y as y.
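Because the returned object is an ordinary lm fit, the stored residuals and the effect estimate can be read off with standard base-R accessors. The sketch below uses simulated stand-ins for the residuals rather than a real doubleML fit, and fits with an intercept; the column names of $x in an actual doubleML result may differ.

```r
# Stand-ins for residualized treatment and response (simulated, not from doubleML)
set.seed(7)
dtil <- rnorm(100)
ytil <- 2 * dtil + rnorm(100)

# x = TRUE stores the model matrix in fit$x; y = TRUE stores the response in fit$y
fit <- lm(ytil ~ dtil, x = TRUE, y = TRUE)

head(fit$x)                              # columns "(Intercept)" and "dtil"
head(fit$y)                              # the residualized response
summary(fit)$coefficients["dtil", ]      # estimate, std. error, t value, p-value
confint(fit, "dtil")                     # 95% interval for the effect
```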
Author(s)
Matt Taddy mataddy@gmail.com
References
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2017). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
Taddy, M. (2019). Business Data Science. McGraw-Hill.
See Also
gamlr, hockey, AICc
Examples
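library(gamlr)
data(hockey)  # provides the goal and player objects used below
who <- which(colnames(player)=="SIDNEY_CROSBY")
set.seed(1)  # make the subsample reproducible
s <- sample.int(nrow(player),10000) # subsample for a fast example
fit <- doubleML(x=player[s,-who], d=player[s,who], y=goal$homegoal[s], standardize=FALSE)
summary(fit)  # treatment-effect estimate and standard error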