R: Honest inference in RD

RDHonest {RDHonest}

R Documentation

Honest inference in RD

Description

Calculate estimators and bias-aware CIs for the sharp or fuzzy RD parameter, or for the value of the conditional mean at a point.

Usage

RDHonest(
  formula,
  data,
  subset,
  weights,
  cutoff = 0,
  M,
  kern = "triangular",
  na.action,
  opt.criterion = "MSE",
  h,
  se.method = "nn",
  alpha = 0.05,
  beta = 0.8,
  J = 3,
  sclass = "H",
  T0 = 0,
  point.inference = FALSE,
  sigmaY2,
  sigmaD2,
  sigmaYD,
  clusterid
)

Arguments

`formula`	an object of class `"formula"` (or one that can be coerced to that class). The formula syntax is `outcome ~ running_variable` for inference at a point. For sharp RD, it is `outcome ~ running_variable` if there are no covariates, or `outcome ~ running_variable | covariates` if covariates are present. For fuzzy RD, it is `outcome | treatment ~ running_variable | covariates`, with `covariates` optional.
`data`	optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the outcome and running variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which the function is called.
`subset`	optional vector specifying a subset of observations to be used in the fitting process.
`weights`	Optional vector of weights to weight the observations (useful for aggregated data). The weights are interpreted as the number of observations that each aggregated data point averages over. Disregarded if optimal kernel is used.
`cutoff`	specifies the RD cutoff in the running variable. For inference at a point, specifies the point `x_0` at which to calculate the conditional mean.
`M`	Bound on second derivative of the conditional mean function, a numeric vector of length one. For fuzzy RD, `M` needs to be a numeric vector of length two, specifying the smoothness of the conditional mean for the outcome and treatment, respectively.
`kern`	specifies the kernel function used in the local regression. It can either be a string equal to `"triangular"` (`k(u)=(1-|u|)_{+}`), `"epanechnikov"` (`k(u)=(3/4)(1-u^2)_{+}`), or `"uniform"` (`k(u)=(|u|<1)/2`), or else a kernel function. If equal to `"optimal"`, use the finite-sample optimal linear estimator under Taylor smoothness class, instead of a local linear estimator.
`na.action`	function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options` (usually `na.omit`). Another possible value is `na.fail`.
`opt.criterion`	Optimality criterion that the bandwidth is designed to optimize. The options are: `"MSE"` (finite-sample maximum MSE); `"FLCI"` (length of fixed-length two-sided confidence intervals); `"OCI"` (given quantile of excess length of one-sided confidence intervals). The methods use the conditional variance given by `sigmaY2`, if supplied. For fuzzy RD, `sigmaD2` and `sigmaYD` also need to be supplied in this case. Otherwise, the methods use preliminary variance estimates based on assuming homoskedasticity on either side of the cutoff.
`h`	bandwidth, a scalar parameter. If not supplied, optimal bandwidth is computed according to criterion given by `opt.criterion`.
`se.method`	method for estimating the standard error of the estimate, one of: `"nn"` (nearest neighbor method); `"EHW"` (Eicker-Huber-White, with residuals from local regression; local polynomial estimators only); `"supplied.var"` (use the conditional variance supplied by `sigmaY2` instead of computing residuals; for fuzzy RD, `sigmaD2` and `sigmaYD` also need to be supplied in this case).
`alpha`	determines confidence level, `1-alpha` for constructing/optimizing confidence intervals.
`beta`	Determines quantile of excess length to optimize, if bandwidth optimizes given quantile of excess length of one-sided confidence intervals (`opt.criterion="OCI"`); otherwise ignored.
`J`	Number of nearest neighbors, if `se.method="nn"` is specified. Otherwise ignored.
`sclass`	Smoothness class, either `"T"` for Taylor or `"H"` for Hölder class.
`T0`	Initial estimate of the treatment effect for calculating the optimal bandwidth. Only relevant for fuzzy RD.
`point.inference`	Do inference at a point determined by `cutoff` instead of RD.
`sigmaY2`	Supply variance of outcome. Ignored when kernel is optimal.
`sigmaD2`	Supply variance of treatment (fuzzy RD only).
`sigmaYD`	Supply covariance of treatment and outcome (fuzzy RD only).
`clusterid`	Vector specifying cluster membership. If supplied, `se.method="EHW"` is required, and standard errors use cluster-robust variance formulas.
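
The three named kernels listed under `kern` can be written out directly from the formulas given there. The following is a minimal base R sketch for illustration; the function names `k_triangular`, `k_epanechnikov` and `k_uniform` are hypothetical and are not part of the package.

```r
# Kernel formulas as stated in the description of the kern argument.
# The function names are illustrative, not package API.
k_triangular   <- function(u) pmax(1 - abs(u), 0)      # (1 - |u|)_+
k_epanechnikov <- function(u) 0.75 * pmax(1 - u^2, 0)  # (3/4)(1 - u^2)_+
k_uniform      <- function(u) (abs(u) < 1) / 2         # (|u| < 1)/2

# Evaluate each kernel on a small grid of scaled distances u = (x - cutoff)/h
u <- c(-1.5, -0.5, 0, 0.5, 1)
kernel_values <- rbind(
  triangular   = k_triangular(u),
  epanechnikov = k_epanechnikov(u),
  uniform      = k_uniform(u)
)
colnames(kernel_values) <- u
print(kernel_values)

# Each kernel has support [-1, 1] and integrates to one
kernel_mass <- sapply(
  list(k_triangular, k_epanechnikov, k_uniform),
  function(k) integrate(k, lower = -1, upper = 1)$value
)
print(kernel_mass)
```

Since `kern` is documented as also accepting a kernel function, a function of this form could presumably be supplied in place of the string; that use is not exercised here.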

Details

The bandwidth is calculated to be optimal for a given performance criterion, as specified by opt.criterion. Alternatively, for local polynomial estimators, the bandwidth can be specified by h. If kern="optimal", the function instead computes the optimal estimator under the second-order Taylor smoothness class (sharp RD only).
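
As an illustration of the two ways of setting the bandwidth, the sketch below fits the same sharp RD with the default criterion, with `opt.criterion = "FLCI"`, and with a user-supplied `h`. It assumes the RDHonest package and its `lee08` data (used in the Examples) are installed, and is skipped otherwise; the object names are hypothetical.

```r
# Sketch only: runs if the RDHonest package (with its lee08 data) is available.
if (requireNamespace("RDHonest", quietly = TRUE)) {
  library(RDHonest)

  # Bandwidth chosen to be optimal for worst-case MSE (the default criterion)
  fit_mse <- RDHonest(voteshare ~ margin, data = lee08, M = 0.1)

  # Bandwidth chosen to minimize the length of the two-sided CI
  fit_flci <- RDHonest(voteshare ~ margin, data = lee08, M = 0.1,
                       opt.criterion = "FLCI")

  # Bandwidth fixed by the user, so opt.criterion plays no role
  fit_h <- RDHonest(voteshare ~ margin, data = lee08, M = 0.1, h = 10)

  # Compare the bandwidths reported in the coefficients data frame
  bandwidths <- c(MSE  = fit_mse$coefficients$bandwidth,
                  FLCI = fit_flci$coefficients$bandwidth,
                  user = fit_h$coefficients$bandwidth)
  print(bandwidths)
}
```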

Value

Returns an object of class "RDResults". The function print can be used to obtain and print a summary of the results. An object of class "RDResults" is a list containing four components. First, a data frame "coefficients" containing the following columns:

term: type of parameter being estimated
estimate: point estimate
std.error: standard error of estimate
maximum.bias: maximum bias of estimate
conf.low, conf.high: lower (upper) end-point of a two-sided CI based on estimate
conf.low.onesided, conf.high.onesided: lower (upper) end-point of one-sided CIs based on estimate
bandwidth: bandwidth used. If kern="optimal", the smoothing parameters bandwidth.m and bandwidth.p on either side of the cutoff are reported instead
eff.obs: number of effective observations
leverage: maximal leverage of estimate
cv: critical value used to compute two-sided CIs
alpha: significance level, as specified by option alpha; the CIs have coverage 1-alpha
method: smoothness class used, as specified by option sclass
M: curvature bound used for worst-case bias calculations. For fuzzy RD, equals (abs(estimate)*M.fs+M.rf)/first.stage
M.rf, M.fs: curvature bound for the outcome (i.e. reduced-form) and first-stage regressions. Fuzzy RD only.
first.stage: estimate of the first-stage coefficient. Fuzzy RD only.
kernel: kernel used
p.value: p-value for testing the null of no effect
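
To show how the columns estimate, std.error, maximum.bias and cv relate to the reported intervals, here is a minimal base R sketch of the usual bias-aware construction from the Armstrong and Kolesár references: the two-sided critical value is the 1-alpha quantile of |N(b, 1)| with b = maximum.bias/std.error. The helper `bias_aware_ci` is hypothetical, and it is an assumption that the package's output matches this construction exactly.

```r
# Hypothetical helper: bias-aware CIs from an estimate, its standard error,
# and a bound on its bias. Not part of the package API.
bias_aware_ci <- function(estimate, std.error, maximum.bias, alpha = 0.05) {
  b <- maximum.bias / std.error
  # 1 - alpha quantile of |N(b, 1)|, via the noncentral chi-squared distribution
  cv <- sqrt(stats::qchisq(1 - alpha, df = 1, ncp = b^2))
  # One-sided intervals subtract (add) the bias bound and a normal quantile
  z <- stats::qnorm(1 - alpha)
  data.frame(
    cv = cv,
    conf.low = estimate - cv * std.error,
    conf.high = estimate + cv * std.error,
    conf.low.onesided = estimate - maximum.bias - z * std.error,
    conf.high.onesided = estimate + maximum.bias + z * std.error
  )
}

# Made-up inputs, for illustration only
ci <- bias_aware_ci(estimate = 5, std.error = 1, maximum.bias = 0.5)
print(ci)

# With no bias, the critical value reduces to the usual normal quantile
ci_nobias <- bias_aware_ci(estimate = 5, std.error = 1, maximum.bias = 0)
print(ci_nobias$cv)
```

The critical value grows with the ratio of maximum bias to standard error, which is why these intervals are wider than conventional ones that ignore bias.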

Second, a list called "data" containing the data used for estimation. This is useful mostly for internal calculations. Third, an object of class "lm" containing the local linear regression estimates. Finally, an element called "call" containing the matched call.

If kern="optimal", the "lm" object is empty, and the numeric vectors "delta" and "omega" are returned in addition. These correspond to the parameters in the modulus problem used to compute the optimal estimation weights.
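
The components described above can be pulled out of the returned list in the usual way. The sketch below assumes the RDHonest package and its `lee08` data are installed and is skipped otherwise; it relies only on the "coefficients" and "call" components named in this section, and the object names are hypothetical.

```r
# Sketch only: runs if the RDHonest package (with its lee08 data) is available.
if (requireNamespace("RDHonest", quietly = TRUE)) {
  library(RDHonest)
  fit <- RDHonest(voteshare ~ margin, data = lee08, kern = "uniform",
                  M = 0.1, h = 10)

  # One-row data frame with the estimate and its bias-aware two-sided CI
  main_cols <- c("estimate", "std.error", "maximum.bias",
                 "conf.low", "conf.high")
  result_summary <- fit$coefficients[, main_cols]
  print(result_summary)

  # The matched call that produced the fit
  print(fit$call)
}
```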

Note

subset is evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.

References

Timothy B. Armstrong and Michal Kolesár. Optimal inference in a class of regression models. Econometrica, 86(2):655–683, March 2018. doi:10.3982/ECTA14434

Timothy B. Armstrong and Michal Kolesár. Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1–39, January 2020.

Michal Kolesár and Christoph Rothe. Inference in regression discontinuity designs with a discrete running variable. American Economic Review, 108(8):2277–2304, August 2018. doi:10.1257/aer.20160945

Examples

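library(RDHonest)
# Sharp RD: uniform kernel with a fixed bandwidth
RDHonest(voteshare ~ margin, data = lee08, kern = "uniform", M = 0.1, h = 10)
# Fuzzy RD: M gives the curvature bounds for the outcome and the treatment
RDHonest(cn | retired ~ elig_year, data = rcp, cutoff = 0, M = c(4, 0.4),
         kern = "triangular", opt.criterion = "MSE", T0 = 0, h = 3)
# Inference at a point, using only observations with margin > 0
RDHonest(voteshare ~ margin, data = lee08, subset = margin > 0,
         kern = "uniform", M = 0.1, h = 10, point.inference = TRUE)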

[Package RDHonest version 1.0.0 Index]