R: Hierarchically Regularized Entropy Balancing

hbal {hbal}

R Documentation

Hierarchically Regularized Entropy Balancing

Description

hbal performs hierarchically regularized entropy balancing such that the covariate distributions of the control group match those of the treatment group. hbal automatically expands the covariate space to include higher order terms and uses cross-validation to select variable penalties for the balancing conditions.
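
To make the balancing idea concrete, the following is a minimal base-R sketch of plain entropy balancing on covariate means. It is an illustration only, not hbal's implementation: it has no hierarchical penalties, no covariate expansion, and no cross-validation. The simulated data and the names `Z`, `dual`, and `dual_grad` are assumptions introduced for this example.

```r
set.seed(1)
n <- 400
X <- cbind(X1 = rnorm(n), X2 = rnorm(n))
# Treatment is more likely for units with larger X1, so raw means differ.
treat <- rbinom(n, 1, plogis(0.5 * X[, 1]))

Xc <- X[treat == 0, , drop = FALSE]              # control covariates
target <- colMeans(X[treat == 1, , drop = FALSE]) # treated means to match
Z <- sweep(Xc, 2, target)                         # controls centered at treated means

# Dual objective of entropy balancing: log-sum-exp of the centered covariates.
dual <- function(lambda) log(sum(exp(Z %*% lambda)))

# Its gradient is the weighted mean of Z, which is zero at exact balance.
dual_grad <- function(lambda) {
  w <- exp(Z %*% lambda)
  w <- w / sum(w)
  as.vector(crossprod(Z, w))
}

fit <- optim(c(0, 0), dual, dual_grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))

# Normalized control weights implied by the fitted coefficients.
w <- as.vector(exp(Z %*% fit$par))
w <- w / sum(w)

# Remaining difference between weighted control means and treated means.
imbalance <- colSums(w * Xc) - target
print(round(imbalance, 4))
```

The weights take the exponential form `exp(Z %*% lambda)`, so fitting them reduces to a smooth convex problem in `lambda`. In this sketch every balance condition is enforced exactly rather than penalized.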

Usage

hbal(data, Treat, X, Y = NULL, w = NULL, 
     X.expand = NULL, X.keep = NULL, expand.degree = 1,
     coefs = NULL, max.iterations = 200, cv = NULL, folds = 4,
     ds = FALSE, group.exact = NULL, group.alpha = NULL,
     term.alpha = NULL, constraint.tolerance = 1e-3, print.level = 0,
     grouping = NULL, group.labs = NULL, linear.exact = TRUE, shuffle.treat = TRUE,
     exclude = NULL, force = FALSE, seed = 94035)

Arguments

`data`	a data frame that contains the treatment, outcome, and covariates.
`Treat`	a character string of the treatment variable.
`X`	a character vector of covariate names to balance on.
`Y`	a character string of the outcome variable.
`w`	a character string of the weighting variable for base weights.
`X.expand`	a character vector of covariate names for series expansion.
`X.keep`	a character vector of covariate names to keep regardless of whether they are selected in double selection.
`expand.degree`	degree of series expansion. 1 means no expansion. Default is 1.
`coefs`	initial coefficients for the reweighting algorithm (lambdas).
`max.iterations`	maximum number of iterations. Default is 200.
`cv`	whether to use cross validation. Default is `TRUE`.
`folds`	number of folds for cross validation. Only used when cv is `TRUE`.
`ds`	whether to perform double selection prior to balancing. Default is `FALSE`.
`group.exact`	binary indicator of whether each covariate group should be exact balanced.
`group.alpha`	penalty for each covariate group.
`term.alpha`	named vector of ridge penalties, only takes 0 or 1.
`constraint.tolerance`	tolerance level for overall imbalance. Default is 1e-3.
`print.level`	details of printed output.
`grouping`	different groupings of the covariates. Must be specified if expand is `FALSE`.
`group.labs`	labels for user-supplied groups.
`linear.exact`	whether to seek exact balance on the level terms.
`shuffle.treat`	whether to use cross-validation on the treated units. Default is `TRUE`.
`exclude`	list of covariate name pairs or triplets to be excluded.
`force`	binary indicator of whether to expand covariates when there are too many.
`seed`	random seed to be set. Set random seed when cv=`TRUE` for reproducibility.
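
As an illustration of what the `expand.degree` argument refers to, the base-R sketch below builds a degree-1 and a degree-2 design matrix for two covariates. It only shows the idea of series expansion. The exact terms hbal generates and the column names it uses are not documented here, so this formula-based construction is an assumption.

```r
set.seed(2)
dat <- data.frame(X1 = rnorm(5), X2 = rnorm(5))

# Degree 1: the level terms only (no expansion).
deg1 <- model.matrix(~ X1 + X2 - 1, data = dat)

# Degree 2: levels, squares, and the pairwise interaction.
deg2 <- model.matrix(~ X1 + X2 + I(X1^2) + I(X2^2) + X1:X2 - 1, data = dat)

colnames(deg1)
colnames(deg2)
```

Two covariates already produce five terms at degree 2, and the count grows quickly with more covariates or a higher degree, which is why penalizing the higher-order balance conditions becomes useful.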

Details

In the simplest setup, the user can just pass in the treatment, covariates, and outcome (`Treat`, `X`, `Y`). When series expansion (`expand.degree` greater than 1), double selection (`ds = TRUE`), and cross-validation are enabled, hbal expands X to include higher-order terms, hierarchically residualizes these terms, performs double selection to keep only the relevant variables, and uses cross-validation to select penalties for the different groupings of the covariates.
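
The Usage section lists `expand.degree = 1` and `ds = FALSE`, so a call that relies on expansion and cross-validation may need to request them explicitly. The sketch below does this on simulated data. The data-generating process is an assumption made for illustration, and only arguments and return elements listed on this page are used.

```r
library(hbal)

set.seed(1984)
N <- 500
dat <- data.frame(X1 = rnorm(N), X2 = rnorm(N))
# Treatment probability depends on X1, so the groups are imbalanced.
dat$treat <- rbinom(N, 1, plogis(0.5 * dat$X1))
dat$Y <- 0.5 * dat$treat + dat$X1 + dat$X2 + rnorm(N)

# Request a degree-2 expansion and 4-fold cross-validation explicitly.
out <- hbal(data = dat, Treat = "treat", X = c("X1", "X2"), Y = "Y",
            expand.degree = 2, cv = TRUE, folds = 4, seed = 94035)

class(out)            # the returned object is documented as class "hbal"
length(out$weights)   # number of control group weights
```

Setting `seed` keeps the fold assignment, and therefore the selected penalties, reproducible across runs.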

Value

A list object of class hbal with the following elements:

`coefs`	vector that contains coefficients from the reweighting algorithm.
`mat`	matrix of serially expanded covariates if expand=`TRUE`. Otherwise, the original covariate matrix is returned.
`penalty`	vector of ridge penalties used for each covariate.
`weights`	vector that contains the control group weights assigned by hbal.
`W`	vector of treatment status.
`Y`	vector of outcome.

Author(s)

Yiqing Xu <yiqingxu@stanford.edu>, Eddie Yang <z5yang@ucsd.edu>

References

Xu, Y., & Yang, E. (2022). Hierarchically Regularized Entropy Balancing. Political Analysis, 1-8. doi:10.1017/pan.2022.12

Examples

# Example 1
library(hbal)
set.seed(1984)
N <- 500
X1 <- rnorm(N)
X2 <- rbinom(N,size=1,prob=.5)
X <- cbind(X1, X2)
treat <- rbinom(N, 1, prob=0.5) # Treatment indicator
y <- 0.5 * treat + X[,1] + X[,2] + rnorm(N) # Outcome
dat <- data.frame(treat=treat, X, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2'), Y = 'Y', data=dat)
summary(hbal::att(out))

# Example 2
## Simulation from Kang and Schafer (2007).
library(MASS)
set.seed(1984)
n <- 500
X <- mvrnorm(n, mu = rep(0, 4), Sigma = diag(4))
prop <- 1 / (1 + exp(X[,1] - 0.5 * X[,2] + 0.25*X[,3] + 0.1 * X[,4]))
# Treatment indicator
treat <- rbinom(n, 1, prop)
# Outcome
y <- 210 + 27.4*X[,1] + 13.7*X[,2] + 13.7*X[,3] + 13.7*X[,4] + rnorm(n)
# Observed covariates
X.mis <- cbind(exp(X[,1]/2), X[,2]*(1+exp(X[,1]))^(-1)+10, 
    (X[,1]*X[,3]/25+.6)^3, (X[,2]+X[,4]+20)^2)
dat <- data.frame(treat=treat, X.mis, Y=y)
out <- hbal(Treat = 'treat', X = c('X1', 'X2', 'X3', 'X4'), Y='Y', data=dat)
summary(att(out))

[Package hbal version 1.2.12 Index]