causal_ols_rpart {aggTrees} | R Documentation |

## Estimation and Inference about the GATEs with rpart Objects

### Description

Obtains point estimates and standard errors for the group average treatment effects (GATEs), where groups correspond to the leaves of an `rpart` object. Additionally, tests whether the GATEs differ across all pairs of leaves.

### Usage

```
causal_ols_rpart(
tree,
y,
D,
X,
method = "aipw",
scores = NULL,
boot_ci = FALSE,
boot_R = 2000
)
```

### Arguments

`tree` |
An `rpart` object. |

`y` |
Outcome vector. |

`D` |
Treatment assignment vector. |

`X` |
Covariate matrix (no intercept). |

`method` |
Either `"raw"` or `"aipw"`. Selects the regression used to estimate the GATEs (see Details). |

`scores` |
Optional, vector of scores to be used in the regression. Useful to save computational time if scores have already been estimated. Ignored if `method == "raw"`. |

`boot_ci` |
Logical, whether to compute bootstrap confidence intervals. |

`boot_R` |
Number of bootstrap replications. Ignored if `boot_ci == FALSE`. |

### Details

#### Point estimates and standard errors for the GATEs

The GATEs and their standard errors are obtained by fitting an appropriate linear model. If `method == "raw"`, we estimate the following model via OLS:

`Y_i = \sum_{l = 1}^{|T|} L_{i, l} \gamma_l + \sum_{l = 1}^{|T|} L_{i, l} D_i \beta_l + \epsilon_i`

with `L_{i, l}` a dummy variable equal to one if the i-th unit falls in the l-th leaf of `tree`, and |T| the number of groups. If the treatment is randomly assigned, one can show that the betas identify the GATE in each leaf. However, this is not true in observational studies due to selection into treatment. In this case, the user is expected to use `method == "aipw"` to run the following regression:

`score_i = \sum_{l = 1}^{|T|} L_{i, l} \beta_l + \epsilon_i`

where score_i are doubly-robust scores constructed via honest regression forests and 5-fold cross fitting (unless the user specifies the argument `scores`). This way, betas again identify the GATEs.
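The two regressions above can be sketched in base R on simulated data. This is only an illustration, not the package's implementation: the variable names (`leaf`, `x`, `score`) are hypothetical, and the doubly-robust scores use a known propensity of 0.5 with in-sample linear outcome models instead of the cross-fitted honest regression forests described above.

```r
## Simulated stand-ins (all names here are hypothetical, not package objects).
set.seed(1)
n <- 600
x <- rnorm(n)
leaf <- factor(sample(1:3, n, replace = TRUE))  # leaf each unit falls in
D <- rbinom(n, size = 1, prob = 0.5)            # randomly assigned treatment
y <- x + D * c(0, 1, 2)[leaf] + rnorm(n)        # true GATEs: 0, 1, 2

## method = "raw": leaf dummies plus leaf-by-treatment interactions, no intercept.
raw_fit <- lm(y ~ 0 + leaf + leaf:D)
beta_raw <- coef(raw_fit)[grepl(":D$", names(coef(raw_fit)))]

## With a saturated model, each beta is the within-leaf difference in means.
diff_means <- as.numeric(
  tapply(y[D == 1], leaf[D == 1], mean) - tapply(y[D == 0], leaf[D == 0], mean)
)

## method = "aipw": build doubly-robust scores, then regress them on leaf dummies.
## Assumption: known propensity and simple linear outcome models (no cross fitting).
p_hat <- rep(0.5, n)
mu1_hat <- predict(lm(y ~ x, subset = D == 1), newdata = data.frame(x = x))
mu0_hat <- predict(lm(y ~ x, subset = D == 0), newdata = data.frame(x = x))
score <- mu1_hat - mu0_hat +
  D * (y - mu1_hat) / p_hat - (1 - D) * (y - mu0_hat) / (1 - p_hat)
aipw_fit <- lm(score ~ 0 + leaf)

print(round(cbind(raw = unname(beta_raw), aipw = unname(coef(aipw_fit))), 3))
```

In the score regression each coefficient is simply the average score within the corresponding leaf, which is why no treatment interaction is needed there.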

Regardless of `method`, standard errors are estimated via the Eicker-Huber-White estimator.
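For intuition, a heteroskedasticity-robust sandwich variance can be computed by hand in base R. The sketch below uses the classical HC0 form on simulated scores; which small-sample variant the package applies is not stated here, so treat the choice of HC0 and all variable names as assumptions.

```r
## Simulated scores with leaf-specific noise (hypothetical data).
set.seed(2)
n <- 300
leaf <- factor(sample(1:3, n, replace = TRUE))
score <- c(0, 1, 2)[leaf] + rnorm(n, sd = as.numeric(leaf))

fit <- lm(score ~ 0 + leaf)
X <- model.matrix(fit)   # n x 3 matrix of leaf dummies
e <- residuals(fit)

## Sandwich: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}.
bread <- solve(crossprod(X))
meat <- crossprod(X * e)            # each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(vcov_hc0))

## With leaf dummies only, this reduces to sqrt(sum of squared residuals) / leaf size.
se_closed_form <- as.numeric(sqrt(tapply(e^2, leaf, sum))) / as.numeric(table(leaf))
print(round(cbind(sandwich = unname(se_hc0), closed_form = se_closed_form), 4))
```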

If `boot_ci == TRUE`, the routine also computes asymmetric bias-corrected and accelerated 95% confidence intervals using `boot_R` bootstrap samples (2000 by default).
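A bias-corrected and accelerated (BCa) interval can be sketched by hand for a single leaf mean. This follows the textbook construction (bias correction from the bootstrap distribution, acceleration from the jackknife) and is not taken from the package source; the data and names below are hypothetical.

```r
## Hypothetical scores for the units in one leaf (skewed on purpose).
set.seed(3)
n <- 200
score <- rexp(n)
boot_R <- 2000

theta_hat <- mean(score)
theta_star <- replicate(boot_R, mean(sample(score, replace = TRUE)))

## Bias correction: how far the bootstrap distribution is from being
## median-centred on the point estimate.
z0 <- qnorm(mean(theta_star < theta_hat))

## Acceleration: skewness of the leave-one-out (jackknife) estimates.
theta_jack <- vapply(seq_len(n), function(i) mean(score[-i]), numeric(1))
d <- mean(theta_jack) - theta_jack
a <- sum(d^3) / (6 * sum(d^2)^1.5)

## Adjusted percentile levels for a 95% interval.
z <- qnorm(c(0.025, 0.975))
adj_probs <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
bca_ci <- quantile(theta_star, probs = adj_probs, names = FALSE)

print(c(lower = bca_ci[1], estimate = theta_hat, upper = bca_ci[2]))
```

Because the adjusted levels are generally not 0.025 and 0.975, the resulting interval need not be symmetric around the point estimate.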

If `tree` consists of a root only, `causal_ols_rpart` regresses `y` on a constant and `D` if `method == "raw"`, or regresses the doubly-robust scores on a constant if `method == "aipw"`. This way, we get an estimate of the overall average treatment effect.
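As a quick check of the root-only case with `method == "raw"`, regressing the outcome on a constant and the treatment indicator returns the difference in means between treated and control units. The data below are simulated and the names are hypothetical.

```r
## Simulated randomized experiment with a true average effect of 2.
set.seed(4)
n <- 400
D <- rbinom(n, size = 1, prob = 0.5)
y <- 1 + 2 * D + rnorm(n)

## Constant plus treatment indicator: the slope on D estimates the ATE.
root_fit <- lm(y ~ D)
ate_hat <- unname(coef(root_fit)["D"])

## Same number obtained as a plain difference in means.
diff_in_means <- mean(y[D == 1]) - mean(y[D == 0])
print(c(ols = ate_hat, diff_in_means = diff_in_means))
```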

#### Hypothesis testing

`causal_ols_rpart` uses the standard errors obtained by fitting the linear models above to test the hypotheses that the GATEs are different across all pairs of leaves. Here, we adjust p-values to account for multiple hypotheses testing using Holm's procedure.
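The pairwise tests can be sketched with base R's `p.adjust`. The example below is a simplified illustration on simulated scores: it uses a hand-computed HC0 covariance matrix and a normal approximation for the p-values, both of which are assumptions and may differ from what the package does internally.

```r
## Simulated scores: leaves 1 and 2 have similar GATEs, leaf 3 differs.
set.seed(5)
n <- 900
leaf <- factor(sample(1:3, n, replace = TRUE))
score <- c(0, 0.1, 1.5)[leaf] + rnorm(n)

fit <- lm(score ~ 0 + leaf)
b <- coef(fit)
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
V <- bread %*% crossprod(X * e) %*% bread   # robust (HC0) covariance

## All pairs of leaves: columns are (1,2), (1,3), (2,3).
pairs <- combn(length(b), 2)
i <- pairs[1, ]
j <- pairs[2, ]
diffs <- unname(b[j] - b[i])
se <- sqrt(V[cbind(i, i)] + V[cbind(j, j)] - 2 * V[cbind(i, j)])

## Two-sided p-values (normal approximation), then Holm's adjustment.
p_raw <- 2 * pnorm(-abs(diffs / se))
p_holm <- p.adjust(p_raw, method = "holm")
names(p_holm) <- paste0("leaf", j, " - leaf", i)
print(signif(p_holm, 3))
```

Holm's procedure never makes a p-value smaller, so a pair that is not significant before the adjustment stays that way afterwards.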

#### Caution on Inference

"Honesty" is a necessary requirement to get valid inference. Thus, observations in `y`, `D`, and `X` must not have been used to construct the `tree` and the `scores`.

### Value

A list storing:

`model` |
The model fitted to get point estimates and standard errors for the GATEs, as an |

`gates_diff_pairs` |
Results of testing whether GATEs differ across all pairs of leaves. This is a list storing GATEs differences and p-values adjusted using Holm's procedure (check |

`boot_ci` |
Bootstrap confidence intervals (this is an empty list if `boot_ci == FALSE`). |

`scores` |
Vector of doubly robust scores. |

### Author(s)

Riccardo Di Francesco

### References

R Di Francesco (2022). Aggregation Trees. CEIS Research Paper, 546. doi:10.2139/ssrn.4304256.

### See Also

`estimate_rpart`

`avg_characteristics_rpart`

### Examples

```
## Load the package (provides sample_split and causal_ols_rpart).
library(aggTrees)

## Generate data.
set.seed(1986)
n <- 1000
k <- 3
X <- matrix(rnorm(n * k), ncol = k)
colnames(X) <- paste0("x", seq_len(k))
D <- rbinom(n, size = 1, prob = 0.5)
mu0 <- 0.5 * X[, 1]
mu1 <- 0.5 * X[, 1] + X[, 2]
y <- mu0 + D * (mu1 - mu0) + rnorm(n)
## Split the sample.
splits <- sample_split(length(y), training_frac = 0.5)
training_idx <- splits$training_idx
honest_idx <- splits$honest_idx
y_tr <- y[training_idx]
D_tr <- D[training_idx]
X_tr <- X[training_idx, ]
y_hon <- y[honest_idx]
D_hon <- D[honest_idx]
X_hon <- X[honest_idx, ]
## Construct a tree using training sample.
library(rpart)
tree <- rpart(y ~ ., data = data.frame("y" = y_tr, X_tr), maxdepth = 2)
## Estimate GATEs in each leaf using honest sample.
results <- causal_ols_rpart(tree, y_hon, D_hon, X_hon, method = "raw")
summary(results$model) # Coefficient of leafk:D is GATE in k-th leaf.
results$gates_diff_pairs$gates_diff # GATEs differences.
results$gates_diff_pairs$holm_pvalues # leaves 1-2 and 3-4 not statistically different.
```

*aggTrees* version 2.0.2