binstest {binsreg}

R Documentation

Data-Driven Nonparametric Shape Restriction and Parametric Model Specification Testing using Binscatter

Description

binstest implements binscatter-based hypothesis tests of parametric functional forms of, and nonparametric shape restrictions on, the regression function of interest, following the results in Cattaneo, Crump, Farrell and Feng (2024a) and Cattaneo, Crump, Farrell and Feng (2024b). If the user does not set the binning scheme, the companion function binsregselect is used to implement binscatter in a data-driven way, and inference procedures are based on robust bias correction. Binned scatter plots based on different methods can be constructed using the companion functions binsreg, binsqreg or binsglm.

Usage

binstest(y, x, w = NULL, data = NULL, estmethod = "reg",
  family = gaussian(), quantile = NULL, deriv = 0, at = NULL,
  nolink = F, testmodel = NULL, testmodelparfit = NULL,
  testmodelpoly = NULL, testshape = NULL, testshapel = NULL,
  testshaper = NULL, testshape2 = NULL, lp = Inf, bins = NULL,
  nbins = NULL, pselect = NULL, sselect = NULL, binspos = "qs",
  binsmethod = "dpi", nbinsrot = NULL, randcut = NULL, nsims = 500,
  simsgrid = 20, simsseed = NULL, vce = NULL, cluster = NULL,
  asyvar = F, dfcheck = c(20, 30), masspoints = "on", weights = NULL,
  subset = NULL, numdist = NULL, numclust = NULL, estmethodopt = NULL,
  ...)

Arguments

`y`	outcome variable. A vector.
`x`	independent variable of interest. A vector.
`w`	control variables. A matrix, a vector or a `formula`.
`data`	an optional data frame containing variables used in the model.
`estmethod`	estimation method. The default is `estmethod="reg"` for tests based on binscatter least squares regression. Other options are `"qreg"` for quantile regression and `"glm"` for generalized linear regression. If `estmethod="glm"`, the option `family` must be specified.
`family`	a description of the error distribution and link function to be used in the generalized linear model when `estmethod="glm"`. (See `family` for details of family functions.)
`quantile`	the quantile to be estimated. A number strictly between 0 and 1.
`deriv`	derivative order of the regression function for estimation, testing and plotting. The default is `deriv=0`, which corresponds to the function itself.
`at`	value of `w` at which the estimated function is evaluated. The default is `at="mean"`, which corresponds to the mean of `w`. Other options are: `at="median"` for the median of `w`, `at="zero"` for a vector of zeros. `at` can also be a vector of the same length as the number of columns of `w` (if `w` is a matrix) or a data frame containing the same variables as specified in `w` (when `data` is specified). Note that when `at="mean"` or `at="median"`, all factor variables (if specified) are excluded from the evaluation (set as zero).
`nolink`	if true, the function within the inverse link function is reported instead of the conditional mean function for the outcome.
`testmodel`	a vector or a logical value. It sets the degree of polynomial and the number of smoothness constraints for parametric model specification testing. If `testmodel=c(p,s)` is specified, a piecewise polynomial of degree `p` with `s` smoothness constraints is used. If `testmodel=T` or `testmodel=NULL` (default) is specified, `testmodel=c(1,1)` is used unless the degree `p` or the smoothness `s` selection is requested via the option `pselect` or `sselect` (see more details in the explanation of `pselect` and `sselect`).
`testmodelparfit`	a data frame or matrix which contains the evaluation grid and fitted values of the model(s) to be tested against. The first column contains a series of evaluation points at which the binscatter model and the parametric model of interest are compared with each other. Each parametric model is represented by one of the other columns, which must contain the fitted values at the corresponding evaluation points.
`testmodelpoly`	degree of a global polynomial model to be tested against.
`testshape`	a vector or a logical value. It sets the degree of polynomial and the number of smoothness constraints for nonparametric shape restriction testing. If `testshape=c(p,s)` is specified, a piecewise polynomial of degree `p` with `s` smoothness constraints is used. If `testshape=T` or `testshape=NULL` (default) is specified, `testshape=c(1,1)` is used unless the degree `p` or smoothness `s` selection is requested via the option `pselect` or `sselect` (see more details in the explanation of `pselect` and `sselect`).
`testshapel`	a vector of null boundary values for hypothesis testing. Each number `a` in the vector corresponds to one boundary of a one-sided hypothesis test to the left of the form `H0: sup_x mu(x)<=a`.
`testshaper`	a vector of null boundary values for hypothesis testing. Each number `a` in the vector corresponds to one boundary of a one-sided hypothesis test to the right of the form `H0: inf_x mu(x)>=a`.
`testshape2`	a vector of null boundary values for hypothesis testing. Each number `a` in the vector corresponds to one boundary of a two-sided hypothesis test of the form `H0: sup_x |mu(x)-a|=0`.
`lp`	an Lp metric used for parametric model specification testing and/or shape restriction testing. The default is `lp=Inf`, which corresponds to the sup-norm of the t-statistic. Other options are `lp=q` for a positive number `q>=1`. Note that `lp=Inf` ("sup-norm") has to be used for testing one-sided shape restrictions.
`bins`	a vector. If `bins=c(p,s)`, it sets the piecewise polynomial of degree `p` with `s` smoothness constraints for data-driven (IMSE-optimal) selection of the partitioning/binning scheme. The default is `bins=c(0,0)`, which corresponds to the piecewise constant.
`nbins`	number of bins for partitioning/binning of `x`. If `nbins=T` or `nbins=NULL` (default) is specified, the number of bins is selected via the companion command `binsregselect` in a data-driven, optimal way whenever possible. If a vector with more than one number is specified, the number of bins is selected within this vector via the companion command `binsregselect`.
`pselect`	vector of numbers within which the degree of polynomial `p` for point estimation is selected. If the selected optimal degree is `p`, then piecewise polynomials of degree `p+1` are used to conduct testing for nonparametric shape restrictions or parametric model specifications. Note: To implement the degree or smoothness selection, in addition to `pselect` or `sselect`, `nbins=#` must be specified.
`sselect`	vector of numbers within which the number of smoothness constraints `s` for point estimation is selected. If the selected optimal smoothness is `s`, then piecewise polynomials of `s+1` smoothness constraints are used to conduct testing for nonparametric shape restrictions or parametric model specifications. If not specified, for each value `p` supplied in the option `pselect`, only the piecewise polynomial with the maximum smoothness is considered, i.e., `s=p`.
`binspos`	position of binning knots. The default is `binspos="qs"`, which corresponds to quantile-spaced binning (canonical binscatter). The other options are `"es"` for evenly-spaced binning, or a vector for manual specification of the positions of inner knots (which must be within the range of `x`).
`binsmethod`	method for data-driven selection of the number of bins. The default is `binsmethod="dpi"`, which corresponds to the IMSE-optimal direct plug-in rule. The other option is `"rot"` for the rule-of-thumb implementation.
`nbinsrot`	initial number of bins value used to construct the DPI number of bins selector. If not specified, the data-driven ROT selector is used instead.
`randcut`	upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection. Observations for which `runif()<=#` are used. # must be between 0 and 1. By default, `max(5000, 0.01n)` observations are used if the sample size `n>5000`.
`nsims`	number of random draws for hypothesis testing. The default is `nsims=500`, which corresponds to 500 draws from a standard Gaussian random vector of size `[(p+1)J - (J-1)s]`. Setting at least `nsims=2000` is recommended to obtain the final results.
`simsgrid`	number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the supremum (infimum or Lp metric) operation needed to construct hypothesis testing procedures. The default is `simsgrid=20`, which corresponds to 20 evenly-spaced evaluation points within each bin for approximating the supremum (infimum or Lp metric) operator. Setting at least `simsgrid=50` is recommended to obtain the final results.
`simsseed`	seed for simulation.
`vce`	procedure to compute the variance-covariance matrix estimator. For least squares regression and generalized linear regression, the allowed options are the same as those for `binsreg` or `binsglm`. For quantile regression, the allowed options are the same as those for `binsqreg`.
`cluster`	cluster ID. Used to compute cluster-robust standard errors.
`asyvar`	if true, the standard error of the nonparametric component is computed and the uncertainty related to control variables is omitted. Default is `asyvar=FALSE`, that is, the uncertainty related to control variables is taken into account.
`dfcheck`	adjustments for minimum effective sample size checks, which take into account the number of unique values of `x` (i.e., number of mass points), the number of clusters, and the degrees of freedom of the different statistical models considered. The default is `dfcheck=c(20, 30)`. See Cattaneo, Crump, Farrell and Feng (2024c) for more details.
`masspoints`	how mass points in `x` are handled. Available options: `"on"` (default), all mass point and degrees of freedom checks are implemented; `"noadjust"`, mass point checks and the corresponding effective sample size adjustments are omitted; `"nolocalcheck"`, within-bin mass point and degrees of freedom checks are omitted; `"off"`, `"noadjust"` and `"nolocalcheck"` are set simultaneously; `"veryfew"`, forces the function to proceed as if `x` has only a few mass points (i.e., distinct values), that is, as if the mass point and degrees of freedom checks had failed.
`weights`	an optional vector of weights to be used in the fitting process. Should be `NULL` or a numeric vector. For more details, see `lm`.
`subset`	optional rule specifying a subset of observations to be used.
`numdist`	number of distinct values for selection. Used to speed up computation.
`numclust`	number of clusters for selection. Used to speed up computation.
`estmethodopt`	a list of optional arguments used by `rq` (for quantile regression) or `glm` (for fitting generalized linear models).
`...`	optional arguments to control bootstrapping if `estmethod="qreg"` and `vce="boot"`. See `boot.rq`.
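
The layout described for `testmodelparfit` can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the evaluation grid sits in the first column and each further column holds the fitted values of one parametric model. The names `grid`, `parfit`, `fit_lin` and `fit_quad`, the simulated data and the 50-point grid are hypothetical choices, and the call to `binstest` only runs if the binsreg package is installed.

```r
# Simulated data (illustrative only)
set.seed(42)
n <- 500
x <- runif(n)
y <- sin(x) + rnorm(n)

# Two candidate parametric models, fitted with base R
fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# Evaluation grid inside the range of x (hypothetical choice of 50 points)
grid <- seq(min(x), max(x), length.out = 50)

# Assumed layout: grid first, then one column of fitted values per model
parfit <- data.frame(
  x        = grid,
  fit_lin  = predict(fit_lin,  newdata = data.frame(x = grid)),
  fit_quad = predict(fit_quad, newdata = data.frame(x = grid))
)

# Run the specification tests only when binsreg is available
if (requireNamespace("binsreg", quietly = TRUE)) {
  est <- binsreg::binstest(y, x, testmodelparfit = parfit)
  summary(est)
}
```

Under this assumed layout, each model column would be compared with the binscatter estimate at the grid points, so the grid should cover the region of `x` where the parametric fit is of interest.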

Value

`testshapeL`	Results for `testshapel`, including: `testvalL`, null boundary values; `stat.shapeL`, test statistics; and `pval.shapeL`, p-values.
`testshapeR`	Results for `testshaper`, including: `testvalR`, null boundary values; `stat.shapeR`, test statistics; and `pval.shapeR`, p-values.
`testshape2`	Results for `testshape2`, including: `testval2`, null boundary values; `stat.shape2`, test statistics; and `pval.shape2`, p-values.
`testpoly`	Results for `testmodelpoly`, including: `testpoly`, the degree of global polynomial; `stat.poly`, test statistic; `pval.poly`, p-value.
`testmodel`	Results for `testmodelparfit`, including: `stat.model`, test statistics; `pval.model`, p-values.
`imse.var.rot`	Variance constant in IMSE, ROT selection.
`imse.bsq.rot`	Bias constant in IMSE, ROT selection.
`imse.var.dpi`	Variance constant in IMSE, DPI selection.
`imse.bsq.dpi`	Bias constant in IMSE, DPI selection.
`opt`	A list containing options passed to the function, as well as total sample size `n`, number of distinct values `Ndist` in `x`, number of clusters `Nclust`, and number of bins `nbins`.
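
The components listed above can be read from the returned object with `$`. The sketch below assumes the return value is a list-like object whose elements are named exactly as documented here; the helper `extract_poly_test` is hypothetical and not part of binsreg.

```r
# Hypothetical helper: collect the global-polynomial test results from a
# binstest result, assuming the element names documented above.
extract_poly_test <- function(est) {
  list(
    degree = est$testpoly$testpoly,   # degree of the global polynomial
    stat   = est$testpoly$stat.poly,  # test statistic
    pval   = est$testpoly$pval.poly   # p-value
  )
}

# Use it on a real fit only when binsreg is available
if (requireNamespace("binsreg", quietly = TRUE)) {
  set.seed(1)
  x <- runif(500); y <- sin(x) + rnorm(500)
  est <- binsreg::binstest(y, x, testmodelpoly = 1)
  print(extract_poly_test(est))
  print(est$opt$nbins)                # number of bins recorded in `opt`
}
```

The same pattern would apply to the other components, for example `est$testshapeR$pval.shapeR` after a call with `testshaper`.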

Author(s)

Matias D. Cattaneo, Princeton University, Princeton, NJ. cattaneo@princeton.edu.

Richard K. Crump, Federal Reserve Bank of New York, New York, NY. richard.crump@ny.frb.org.

Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. mhfarrell@gmail.com.

Yingjie Feng (maintainer), Tsinghua University, Beijing, China. fengyingjiepku@gmail.com.

References

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.

Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.

Examples

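 library(binsreg)
 x <- runif(500); y <- sin(x) + rnorm(500)
 # Test the null hypothesis that the regression function is a global polynomial of degree 1
 est <- binstest(y, x, testmodelpoly = 1)
 summary(est)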
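
A shape restriction test follows the same pattern. The sketch below uses simulated data and hypothetical settings: it tests whether the regression function is nondecreasing by applying `testshaper=0` to the first derivative (`deriv=1`), fixes the number of bins at an arbitrary `nbins=20`, and raises `nsims` and `simsgrid` to the values recommended in the argument descriptions for final results. The call only runs if the binsreg package is installed.

```r
# Simulated data whose true regression function x^2 is increasing on [0, 1]
set.seed(123)
n <- 1000
x <- runif(n)
y <- x^2 + rnorm(n)

if (requireNamespace("binsreg", quietly = TRUE)) {
  # H0: inf_x mu'(x) >= 0, i.e. the regression function is nondecreasing
  est <- binsreg::binstest(y, x, deriv = 1, nbins = 20, testshaper = 0,
                           nsims = 2000, simsgrid = 50, simsseed = 123)
  summary(est)
}
```

Because the one-sided restriction is tested here, the default `lp=Inf` is left unchanged, as required for one-sided shape restrictions.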

[Package binsreg version 1.1 Index]