doubleML {gamlr} R Documentation

Double ML

Description

Double (i.e., debiased) machine learning for treatment effect estimation.

Usage

doubleML(x, d, y, nfold=2, foldid=NULL, family="gaussian", cl=NULL, ...)

Arguments

x

Covariates; see gamlr.

d

The matrix of treatment variables. Each column is used as a response by gamlr during the residualization procedure.

y

Response; see gamlr.

nfold

The number of cross-fitting folds.

foldid

An optional length-n vector of fold memberships for each observation. If specified, this dictates nfold.

family

Response model type for the treatment prediction; either "gaussian", "poisson", or "binomial". This can be either a single family shared by all columns of d or a vector of families of length ncol(d).

cl

An optional cluster object from the parallel library. If this is non-NULL, the cross-fitting folds are executed in parallel. This copies the data nfold times, so make sure you have enough memory.

...

Additional arguments passed to all of the gamlr regressions.
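The foldid and cl arguments can be pictured with the base-R sketch below. It builds a balanced random fold vector (whose number of distinct labels plays the role of nfold), then uses the parallel package to fit one out-of-fold regression per fold on a socket cluster. Each worker receives its own copy of the data, which is why memory use grows with the number of folds. This only mimics the pattern under stated assumptions: lm stands in for gamlr, the simulated data are illustrative, and it is not doubleML's internal code.

```r
library(parallel)

set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Balanced random fold labels; the number of distinct labels sets the fold count
foldid <- sample(rep_len(1:2, n))
nfold <- length(unique(foldid))

cl <- makeCluster(2)
# Each worker gets its own full copy of dat and foldid
clusterExport(cl, c("dat", "foldid"))

# One task per fold: train on the other folds, predict on the held-out fold
resid_list <- parLapply(cl, seq_len(nfold), function(k) {
  train <- foldid != k
  fit <- lm(y ~ x1 + x2, data = dat[train, ])
  held_out <- dat[!train, ]
  held_out$y - predict(fit, newdata = held_out)
})
stopCluster(cl)

# Reassemble the out-of-sample residuals in the original row order
resid <- numeric(n)
for (k in seq_len(nfold)) resid[foldid == k] <- resid_list[[k]]
```

In this sketch there is one task per fold, so a cluster with more workers than folds would leave the extra workers idle.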

Details

Performs the double ML procedure of Chernozhukov et al. (2017) to produce a debiased estimate of the average linear treatment effects of d on y. This procedure uses gamlr to regress y and each column of d onto x. Following the cross-fitting routine described in Taddy (2019), each of these regressions is trained on all but one fold of the data, and out-of-sample residuals are calculated on the left-out fold; this is repeated until every fold has been left out once. Model selection for these residualization steps is based on the AICc selection rule. The response residuals are then regressed onto the treatment residuals using lm, and under the assumptions of Chernozhukov et al. the resulting estimates and standard errors are asymptotically valid for the treatment effects.
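The cross-fitting steps described above can be sketched in base R for a single treatment column. This is an illustrative reimplementation under stated assumptions, not doubleML's source: the helper dml_sketch is hypothetical, lm stands in for gamlr (so there is no lasso path and no AICc selection), and the final regression keeps lm's default intercept.

```r
# Minimal cross-fitted double ML sketch; lm() stands in for gamlr (no AICc selection)
dml_sketch <- function(x, d, y, nfold = 2, foldid = NULL) {
  n <- length(y)
  if (is.null(foldid)) foldid <- sample(rep_len(seq_len(nfold), n))
  X <- as.data.frame(x)
  dtil <- ytil <- numeric(n)
  for (k in unique(foldid)) {
    train <- foldid != k
    # Nuisance fits of d on x and y on x, using every fold except k
    fit_d <- lm(d ~ ., data = cbind(X, d = d)[train, ])
    fit_y <- lm(y ~ ., data = cbind(X, y = y)[train, ])
    # Out-of-sample residuals on the held-out fold k
    newx <- X[!train, , drop = FALSE]
    dtil[!train] <- d[!train] - predict(fit_d, newdata = newx)
    ytil[!train] <- y[!train] - predict(fit_y, newdata = newx)
  }
  # Final stage: regress response residuals on treatment residuals
  lm(ytil ~ dtil, x = TRUE, y = TRUE)
}

# Simulated example with a true treatment effect of 0.5
set.seed(1)
n <- 2000
x <- matrix(rnorm(n * 5), n, 5)
d <- x[, 1] + rnorm(n)
y <- 0.5 * d + 2 * x[, 1] - x[, 2] + rnorm(n)
fit <- dml_sketch(x, d, y, nfold = 2)
coef(summary(fit))["dtil", ]
```

With this simulated design the dtil coefficient should land near the true effect of 0.5, because each residual comes from a model that never saw that observation.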

Value

A fitted lm object estimating the treatment effect of d on y. The lm function is called with x=TRUE, y=TRUE, so this object contains the residualized d as its x component and the residualized y as its y component.
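Because the value is an ordinary lm object, the usual base-R accessors apply. The sketch below fits a stand-in residual regression with x=TRUE, y=TRUE on simulated residuals (the names dtil and ytil are hypothetical) to show where those stored components live.

```r
set.seed(2)
dtil <- rnorm(100)               # stand-in for the residualized treatment
ytil <- 0.3 * dtil + rnorm(100)  # stand-in for the residualized response
fit <- lm(ytil ~ dtil, x = TRUE, y = TRUE)

coef(summary(fit))  # estimates, standard errors, t values, p-values
confint(fit)        # 95% confidence intervals
head(fit$x)         # stored design matrix: "(Intercept)" and "dtil" columns
head(fit$y)         # stored response vector
```

The same calls, such as coef(summary(fit)) and confint(fit), work on the object returned by doubleML.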

Author(s)

Matt Taddy mataddy@gmail.com

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2017). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.

Taddy, M. (2019). Business Data Science. McGraw-Hill.

See Also

gamlr, hockey, AICc

Examples


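library(gamlr)
data(hockey)
set.seed(1) # make the subsample reproducible
who <- which(colnames(player)=="SIDNEY_CROSBY")
s <- sample.int(nrow(player),10000) # subsample for a fast example
## treatment: the SIDNEY_CROSBY column of player; controls: all other player columns
fit <- doubleML(x=player[s,-who], d=player[s,who], y=goal$homegoal[s], standardize=FALSE)
summary(fit) # treatment-effect estimate and standard error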


[Package gamlr version 1.13-8 Index]