binsregselect {binsreg}

R Documentation

Data-Driven IMSE-Optimal Partitioning/Binning Selection for Binscatter

Description

binsregselect implements data-driven procedures for selecting the number of bins for binscatter estimation. The selected number of bins is optimal in the sense of minimizing the integrated mean squared error (IMSE).
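
To make the IMSE criterion concrete, the following base R sketch minimizes a stylized bias-variance trade-off over the number of bins `J`. It assumes the leading-term form commonly used in the binscatter literature for `deriv=0`, with squared bias of order `J^(-2(p+1))` and variance of order `J/n`. The functions `imse_approx` and `j_opt` and the constants `B` and `V` are hypothetical and chosen for illustration only; they are not part of binsreg and not the constants the package estimates.

```r
# Assumed leading-term IMSE approximation for deriv = 0 (an assumption of this
# sketch, not output of binsregselect): squared bias falls with J, variance rises.
imse_approx <- function(J, n, p, B, V) {
  B * J^(-2 * (p + 1)) + V * J / n
}

# Closed-form minimizer, obtained by setting the derivative in J to zero
j_opt <- function(n, p, B, V) {
  (2 * (p + 1) * B * n / V)^(1 / (2 * p + 3))
}

n <- 500; p <- 0   # piecewise constant fit, as with bins = c(0, 0)
B <- 2; V <- 1     # hypothetical bias and variance constants

j_closed  <- j_opt(n, p, B, V)
j_numeric <- optimize(imse_approx, interval = c(1, 200),
                      n = n, p = p, B = B, V = V)$minimum

# The closed form and the numerical minimizer should agree closely;
# an integer number of bins is obtained by rounding up.
c(closed_form = j_closed, numeric = j_numeric, rounded_up = ceiling(j_closed))
```

Under this assumed form the minimizer grows at rate `n^(1/(2p+3))`, so a higher polynomial degree calls for fewer bins at a given sample size.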

Usage

binsregselect(y, x, w = NULL, data = NULL, deriv = 0, bins = NULL,
  pselect = NULL, sselect = NULL, binspos = "qs", nbins = NULL,
  binsmethod = "dpi", nbinsrot = NULL, simsgrid = 20, savegrid = F,
  vce = "HC1", useeffn = NULL, randcut = NULL, cluster = NULL,
  dfcheck = c(20, 30), masspoints = "on", weights = NULL,
  subset = NULL, norotnorm = F, numdist = NULL, numclust = NULL)

Arguments

`y`	outcome variable. A vector.
`x`	independent variable of interest. A vector.
`w`	control variables. A matrix, a vector or a `formula`.
`data`	an optional data frame containing variables used in the model.
`deriv`	derivative order of the regression function for estimation, testing and plotting. The default is `deriv=0`, which corresponds to the function itself.
`bins`	a vector. `bins=c(p,s)` sets a piecewise polynomial of degree `p` with `s` smoothness constraints for data-driven (IMSE-optimal) selection of the partitioning/binning scheme. By default, the function sets `bins=c(0,0)`, which corresponds to piecewise constant (canonical binscatter).
`pselect`	vector of numbers within which the degree of polynomial `p` for point estimation is selected. Note: To implement the degree or smoothness selection, in addition to `pselect` or `sselect`, `nbins=#` must be specified.
`sselect`	vector of numbers within which the number of smoothness constraints `s` for point estimation is selected. If not specified, for each value `p` supplied in the option `pselect`, only the piecewise polynomial with the maximum smoothness is considered, i.e., `s=p`.
`binspos`	position of binning knots. The default is `binspos="qs"`, which corresponds to quantile-spaced binning (canonical binscatter). The other option is `binspos="es"` for evenly-spaced binning.
`nbins`	number of bins for degree/smoothness selection. If `nbins=T` or `nbins=NULL` (default) is specified, the function selects the number of bins instead, given the specified degree and smoothness. If a vector with more than one number is specified, the command selects the number of bins within this vector.
`binsmethod`	method for data-driven selection of the number of bins. The default is `binsmethod="dpi"`, which corresponds to the IMSE-optimal direct plug-in rule. The other option is `"rot"`, a rule-of-thumb implementation.
`nbinsrot`	initial number of bins used to construct the DPI number-of-bins selector. If not specified, the data-driven ROT selector is used instead.
`simsgrid`	number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the supremum (infimum or Lp metric) operation needed to construct confidence bands and hypothesis testing procedures. The default is `simsgrid=20`, which corresponds to 20 evenly-spaced evaluation points within each bin for approximating the supremum (infimum or Lp metric) operator.
`savegrid`	if true, a data frame containing the grid is produced.
`vce`	procedure to compute the variance-covariance matrix estimator. Options are: `"const"`, homoskedastic variance estimator; `"HC0"`, heteroskedasticity-robust plug-in residuals variance estimator without weights; `"HC1"`, heteroskedasticity-robust plug-in residuals variance estimator with hc1 weights (default); `"HC2"`, heteroskedasticity-robust plug-in residuals variance estimator with hc2 weights; `"HC3"`, heteroskedasticity-robust plug-in residuals variance estimator with hc3 weights.
`useeffn`	effective sample size to be used when computing the (IMSE-optimal) number of bins. This option is useful for extrapolating the optimal number of bins to larger (or smaller) datasets than the one used to compute it.
`randcut`	upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection. Observations for which `runif()<=#` are used. # must be between 0 and 1.
`cluster`	cluster ID. Used to compute cluster-robust standard errors.
`dfcheck`	adjustments for minimum effective sample size checks, which take into account number of unique values of `x` (i.e., number of mass points), number of clusters, and degrees of freedom of the different statistical models considered. The default is `dfcheck=c(20, 30)`. See Cattaneo, Crump, Farrell and Feng (2024c) for more details.
`masspoints`	how mass points in `x` are handled. Available options: `"on"`, all mass point and degrees of freedom checks are implemented (default); `"noadjust"`, mass point checks and the corresponding effective sample size adjustments are omitted; `"nolocalcheck"`, within-bin mass point and degrees of freedom checks are omitted; `"off"`, `"noadjust"` and `"nolocalcheck"` are set simultaneously; `"veryfew"`, forces the function to proceed as if `x` has only a few mass points (i.e., distinct values), that is, as if the mass point and degrees of freedom checks had failed.
`weights`	an optional vector of weights to be used in the fitting process. Should be `NULL` or a numeric vector. For more details, see `lm`.
`subset`	optional rule specifying a subset of observations to be used.
`norotnorm`	if true, a uniform density rather than a normal density is used for ROT selection.
`numdist`	number of distinct values for selection. Used to speed up computation.
`numclust`	number of clusters for selection. Used to speed up computation.
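
The meaning of two of the arguments above, `binspos` and `randcut`, can be illustrated in base R. This is a minimal sketch under stated assumptions: `make_knots` is a hypothetical helper, not a binsreg function, and it uses R's default `quantile` definition, which may differ from the one used inside the package. The `randcut` part simply applies the rule quoted above (keep observations with `runif()<=#`).

```r
# Hypothetical helper, not part of binsreg: J + 1 knots (boundaries included)
make_knots <- function(x, J, binspos = c("qs", "es")) {
  binspos <- match.arg(binspos)
  if (binspos == "qs") {
    # quantile-spaced: knots at the 0, 1/J, ..., 1 sample quantiles
    unname(quantile(x, probs = seq(0, 1, length.out = J + 1)))
  } else {
    # evenly-spaced: knots equally spaced over the observed range of x
    seq(min(x), max(x), length.out = J + 1)
  }
}

set.seed(42)
x <- rexp(500)   # skewed data, so the two schemes differ visibly

knots_qs <- make_knots(x, J = 5, binspos = "qs")
knots_es <- make_knots(x, J = 5, binspos = "es")

# Observations per bin: roughly equal counts under "qs", equal widths under "es"
counts_qs <- table(cut(x, breaks = knots_qs, include.lowest = TRUE))
counts_es <- table(cut(x, breaks = knots_es, include.lowest = TRUE))
counts_qs
counts_es

# randcut rule: keep observations whose uniform draw is at most the cutoff
randcut <- 0.3
keep <- runif(length(x)) <= randcut
x_sub <- x[keep]
length(x_sub)    # about 30% of the sample on average
```

With skewed data, evenly-spaced bins in the right tail contain few observations, which is one reason quantile spacing is the canonical choice.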

Value

`nbinsrot.poly`	ROT number of bins, unregularized.
`nbinsrot.regul`	ROT number of bins, regularized.
`nbinsrot.uknot`	ROT number of bins, unique knots.
`nbinsdpi`	DPI number of bins.
`nbinsdpi.uknot`	DPI number of bins, unique knots.
`prot.poly`	ROT degree of polynomials, unregularized.
`prot.regul`	ROT degree of polynomials, regularized.
`prot.uknot`	ROT degree of polynomials, unique knots.
`pdpi`	DPI degree of polynomials.
`pdpi.uknot`	DPI degree of polynomials, unique knots.
`srot.poly`	ROT number of smoothness constraints, unregularized.
`srot.regul`	ROT number of smoothness constraints, regularized.
`srot.uknot`	ROT number of smoothness constraints, unique knots.
`sdpi`	DPI number of smoothness constraints.
`sdpi.uknot`	DPI number of smoothness constraints, unique knots.
`imse.var.rot`	Variance constant in IMSE expansion, ROT selection.
`imse.bsq.rot`	Bias constant in IMSE expansion, ROT selection.
`imse.var.dpi`	Variance constant in IMSE expansion, DPI selection.
`imse.bsq.dpi`	Bias constant in IMSE expansion, DPI selection.
`int.result`	Intermediate results, including a matrix of degree and smoothness (`deg_mat`), the selected numbers of bins (`vec.nbinsrot.poly`, `vec.nbinsrot.regul`, `vec.nbinsrot.uknot`, `vec.nbinsdpi`, `vec.nbinsdpi.uknot`), and the bias and variance constants in IMSE (`vec.imse.b.rot`, `vec.imse.v.rot`, `vec.imse.b.dpi`, `vec.imse.v.dpi`) under each rule (ROT or DPI), corresponding to each pair of degree and smoothness (each row in `deg_mat`).
`opt`	A list containing options passed to the function, as well as total sample size `n`, number of distinct values `Ndist` in `x`, and number of clusters `Nclust`.
`data.grid`	A data frame containing the grid.
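
The components listed above can be read directly from the returned object. The sketch below assumes the binsreg package is installed (the call is skipped otherwise) and uses simulated data; only component names documented in this section are accessed, and no particular numeric values are implied.

```r
set.seed(123)
x <- runif(500)
y <- sin(x) + rnorm(500)

if (requireNamespace("binsreg", quietly = TRUE)) {
  est <- binsreg::binsregselect(y, x)

  # Selected numbers of bins under the ROT and DPI rules
  print(est$nbinsrot.regul)
  print(est$nbinsdpi)

  # Variance and bias constants in the IMSE expansion, DPI selection
  print(est$imse.var.dpi)
  print(est$imse.bsq.dpi)

  # Total sample size stored with the options
  print(est$opt$n)
}
```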

Author(s)

Matias D. Cattaneo, Princeton University, Princeton, NJ. cattaneo@princeton.edu.

Richard K. Crump, Federal Reserve Bank of New York, New York, NY. richard.crump@ny.frb.org.

Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. mhfarrell@gmail.com.

Yingjie Feng (maintainer), Tsinghua University, Beijing, China. fengyingjiepku@gmail.com.

References

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.

Examples

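 library(binsreg)
 # Simulated data: sine regression function with standard normal noise
 x <- runif(500); y <- sin(x)+rnorm(500)
 # Select the number of bins using the default settings
 est <- binsregselect(y,x)
 summary(est)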
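
A few variations on the call above, using only arguments documented in this page. This is an illustrative sketch with simulated data; it assumes binsreg is installed and skips the calls otherwise. The object names `est_lin` and `est_es_rot` are arbitrary.

```r
set.seed(2024)
x <- runif(500)
y <- sin(x) + rnorm(500)

if (requireNamespace("binsreg", quietly = TRUE)) {
  library(binsreg)

  # Piecewise linear fit with one smoothness constraint (continuity)
  est_lin <- binsregselect(y, x, bins = c(1, 1))
  summary(est_lin)

  # Evenly-spaced bins with the rule-of-thumb selector
  est_es_rot <- binsregselect(y, x, binspos = "es", binsmethod = "rot")
  summary(est_es_rot)
}
```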

[Package binsreg version 1.1 Index]