| binsglm {binsreg} | R Documentation | 
Data-Driven Binscatter Generalized Linear Regression with Robust Inference Procedures and Plots
Description
binsglm implements binscatter generalized linear regression with robust inference procedures and plots, following the
results in Cattaneo, Crump, Farrell and Feng (2024a) and
Cattaneo, Crump, Farrell and Feng (2024b).
Binscatter provides a flexible way to describe the relationship between two variables, after
possibly adjusting for other covariates, based on partitioning/binning of the independent variable of interest.
The main purpose of this function is to generate binned scatter plots with curve estimation, robust pointwise confidence intervals, and a
uniform confidence band. If the binning scheme is not set by the user, the companion function
binsregselect is used to implement binscatter in a data-driven way. Hypothesis tests about the function of interest can be conducted via the companion
function binstest.
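The partitioning/binning idea described above can be sketched in base R. This is only a rough illustration of canonical binscatter (within-bin means over quantile-spaced bins), not the estimator that binsglm implements; the data-generating process, the seed, and the number of bins J are all hypothetical choices made for the example.

```r
# A minimal base-R sketch of canonical binscatter (no binsreg needed).
set.seed(42)                                  # hypothetical seed
n <- 500
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)     # hypothetical data-generating process

J <- 10                                       # hypothetical number of bins
# Quantile-spaced knots: each bin holds roughly n / J observations.
knots <- quantile(x, probs = seq(0, 1, length.out = J + 1))
bin <- cut(x, breaks = knots, include.lowest = TRUE, labels = FALSE)

# One "dot" per bin: the within-bin mean of x and of y, plus the bin count.
dots <- data.frame(
  x = as.vector(tapply(x, bin, mean)),
  y = as.vector(tapply(y, bin, mean)),
  n = as.vector(table(bin))
)
print(dots)
```

Plotting dots$y against dots$x gives the familiar binned scatter plot; binsglm adds the generalized linear model fit, the data-driven choice of the number of bins, and the inference procedures on top of this basic construction.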
Usage
binsglm(y, x, w = NULL, data = NULL, at = NULL, family = gaussian(),
  deriv = 0, nolink = F, dots = NULL, dotsgrid = 0, dotsgridmean = T,
  line = NULL, linegrid = 20, ci = NULL, cigrid = 0, cigridmean = T,
  cb = NULL, cbgrid = 20, polyreg = NULL, polyreggrid = 20,
  polyregcigrid = 0, by = NULL, bycolors = NULL, bysymbols = NULL,
  bylpatterns = NULL, legendTitle = NULL, legendoff = F, nbins = NULL,
  binspos = "qs", binsmethod = "dpi", nbinsrot = NULL, pselect = NULL,
  sselect = NULL, samebinsby = F, randcut = NULL, nsims = 500,
  simsgrid = 20, simsseed = NULL, vce = "HC1", cluster = NULL,
  asyvar = F, level = 95, noplot = F, dfcheck = c(20, 30),
  masspoints = "on", weights = NULL, subset = NULL, plotxrange = NULL,
  plotyrange = NULL, ...)
Arguments
| y | outcome variable. A vector. | 
| x | independent variable of interest. A vector. | 
| w | control variables. A matrix, a vector or a  | 
| data | an optional data frame containing variables in the model. | 
| at | value of  | 
| family | a description of the error distribution and link function to be used in the generalized linear model. (See  | 
| deriv | derivative order of the regression function for estimation, testing and plotting.
The default is 0 (the function itself). | 
| nolink | if true, the function within the inverse link function is reported instead of the conditional mean function for the outcome. | 
| dots | a vector or a logical value. If  | 
| dotsgrid | number of dots within each bin to be plotted. Given the choice, these dots are point estimates
evaluated over an evenly-spaced grid within each bin. The default is 0. | 
| dotsgridmean | If true, the dots corresponding to the point estimates evaluated at the mean of  | 
| line | a vector or a logical value. If  | 
| linegrid | number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the  | 
| ci | a vector or a logical value. If  | 
| cigrid | number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the  | 
| cigridmean | If true, the confidence intervals corresponding to the point estimates evaluated at the mean of  | 
| cb | a vector or a logical value. If  | 
| cbgrid | number of evaluation points of an evenly-spaced grid within each bin used for evaluation of the point
estimate set by the  | 
| polyreg | degree of a global polynomial regression model for plotting. By default, this fit is not included
in the plot unless explicitly specified. Recommended specification is  | 
| polyreggrid | number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the point estimate set by the  | 
| polyregcigrid | number of evaluation points of an evenly-spaced grid within each bin used for constructing
confidence intervals based on polynomial regression set by the  | 
| by | a vector containing the group indicator for subgroup analysis; both numeric and string variables
are supported. When  | 
| bycolors | an ordered list of colors for plotting each subgroup series defined by the option  | 
| bysymbols | an ordered list of symbols for plotting each subgroup series defined by the option  | 
| bylpatterns | an ordered list of line patterns for plotting each subgroup series defined by the option  | 
| legendTitle | String, title of legend. | 
| legendoff | If true, no legend is added. | 
| nbins | number of bins for partitioning/binning of  | 
| binspos | position of binning knots. The default is "qs". | 
| binsmethod | method for data-driven selection of the number of bins. The default is "dpi". | 
| nbinsrot | initial number of bins used to construct the DPI number-of-bins selector. If not specified, the data-driven ROT selector is used instead. | 
| pselect | vector of numbers within which the degree of polynomial  | 
| sselect | vector of numbers within which the number of smoothness constraints  | 
| samebinsby | if true, a common partitioning/binning structure across all subgroups specified by the option  | 
| randcut | upper bound on a uniformly distributed variable used to draw a subsample for bins/degree/smoothness selection.
Observations for which  | 
| nsims | number of random draws for constructing confidence bands. The default is 500. | 
| simsgrid | number of evaluation points of an evenly-spaced grid within each bin used for evaluation of
the supremum operation needed to construct confidence bands. The default is 20. | 
| simsseed | seed for simulation. | 
| vce | Procedure to compute the variance-covariance matrix estimator. Options are 
 | 
| cluster | cluster ID. Used to compute cluster-robust standard errors. | 
| asyvar | if true, the standard error of the nonparametric component is computed and the uncertainty related to control
variables is omitted. Default is FALSE. | 
| level | nominal confidence level for confidence interval and confidence band estimation. Default is 95. | 
| noplot | if true, no plot is produced. | 
| dfcheck | adjustments for minimum effective sample size checks, which take into account number of unique
values of  | 
| masspoints | how mass points in  
 | 
| weights | an optional vector of weights to be used in the fitting process. Should be  | 
| subset | optional rule specifying a subset of observations to be used. | 
| plotxrange | a vector.  | 
| plotyrange | a vector.  | 
| ... | optional arguments used by  | 
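The distinction drawn by the nolink argument (the function inside the inverse link versus the conditional mean of the outcome) can be illustrated with a plain parametric glm fit in base R. This is an analogy only: it uses a global logistic regression rather than binscatter, and the seed and evaluation grid are hypothetical.

```r
# Base-R illustration of link scale vs. response scale (no binsreg needed).
set.seed(7)                                   # hypothetical seed
x <- runif(500)
d <- 1 * (runif(500) <= x)                    # binary outcome, P(d = 1 | x) = x

fit <- glm(d ~ x, family = binomial())
grid <- data.frame(x = c(0.25, 0.5, 0.75))    # hypothetical evaluation points

eta <- predict(fit, newdata = grid, type = "link")      # linear index (log-odds)
mu  <- predict(fit, newdata = grid, type = "response")  # conditional mean P(d = 1 | x)

# The two scales are connected by the inverse link function of the family.
print(all.equal(unname(mu), unname(binomial()$linkinv(eta))))
```

In this analogy, eta plays the role of the quantity reported when nolink is true, and mu the role of the default conditional mean function.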
Value
| bins_plot | A  | 
| data.plot | A list containing data for plotting. Each item is a sublist of data frames for each group. Each sublist may contain the following data frames: 
 | 
| imse.var.rot | Variance constant in IMSE, ROT selection. | 
| imse.bsq.rot | Bias constant in IMSE, ROT selection. | 
| imse.var.dpi | Variance constant in IMSE, DPI selection. | 
| imse.bsq.dpi | Bias constant in IMSE, DPI selection. | 
| cval.by | A vector of critical values for constructing confidence band for each group. | 
| opt |  A list containing options passed to the function, as well as  | 
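The returned object can be inspected without drawing a plot. The sketch below assumes the binsreg package is installed (the call is skipped otherwise) and uses a hypothetical seed and simulated data; it only lists the components rather than assuming anything about their contents.

```r
set.seed(1)                                   # hypothetical seed
x <- runif(500)
d <- 1 * (runif(500) <= x)                    # binary outcome, P(d = 1 | x) = x

if (requireNamespace("binsreg", quietly = TRUE)) {
  # noplot = TRUE suppresses the plot; the estimates are still returned.
  est <- binsreg::binsglm(d, x, family = binomial(), noplot = TRUE)
  print(names(est))                           # components listed under Value
  str(est$data.plot, max.level = 2)           # per-group data frames for plotting
}
```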
Author(s)
Matias D. Cattaneo, Princeton University, Princeton, NJ. cattaneo@princeton.edu.
Richard K. Crump, Federal Reserve Bank of New York, New York, NY. richard.crump@ny.frb.org.
Max H. Farrell, UC Santa Barbara, Santa Barbara, CA. mhfarrell@gmail.com.
Yingjie Feng (maintainer), Tsinghua University, Beijing, China. fengyingjiepku@gmail.com.
References
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024a: On Binscatter. American Economic Review 114(5): 1488-1514.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024b: Nonlinear Binscatter Methods. Working Paper.
Cattaneo, M. D., R. K. Crump, M. H. Farrell, and Y. Feng. 2024c: Binscatter Regressions. Working Paper.
See Also
Examples
 library(binsreg)
 ## Simulated binary outcome with P(d = 1 | x) = x
 x <- runif(500)
 d <- 1 * (runif(500) <= x)
 ## Binned scatterplot
 binsglm(d, x, family = binomial())
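A fuller call might add a fitted line, pointwise confidence intervals and a uniform confidence band. The sketch below assumes that line, ci and cb accept a vector c(p, s) giving the polynomial degree and the number of smoothness constraints; that convention is not spelled out in the argument table above, so treat it as an assumption. The seed values are hypothetical, and the call is skipped when binsreg is not installed.

```r
set.seed(2)                                   # hypothetical seed
x <- runif(500)
d <- 1 * (runif(500) <= x)                    # binary outcome, P(d = 1 | x) = x

if (requireNamespace("binsreg", quietly = TRUE)) {
  # Assumed convention: c(p, s) = (polynomial degree, smoothness constraints).
  est <- binsreg::binsglm(d, x, family = binomial(),
                          line = c(3, 3), ci = c(3, 3), cb = c(3, 3),
                          nsims = 500, simsgrid = 20,
                          simsseed = 123)     # hypothetical simulation seed
  print(est$cval.by)                          # critical value(s) used for the band
}
```

Setting simsseed makes the simulated critical value for the confidence band reproducible across runs.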