
lpcde {lpcde}

R Documentation

Local polynomial conditional density estimation

Description

lpcde implements the local polynomial regression-based conditional density estimator (and estimators of its derivatives) proposed in Cattaneo et al. (2024). Robust bias-corrected inference methods, both pointwise (confidence intervals) and uniform (confidence bands), are also implemented.

Usage

lpcde(
  x_data,
  y_data,
  y_grid = NULL,
  x = NULL,
  bw = NULL,
  p = NULL,
  q = NULL,
  p_RBC = NULL,
  q_RBC = NULL,
  mu = NULL,
  nu = NULL,
  rbc = TRUE,
  ng = NULL,
  normalize = FALSE,
  nonneg = FALSE,
  grid_spacing = "",
  kernel_type = c("epanechnikov", "triangular", "uniform"),
  bw_type = NULL
)

Arguments

`x_data`	Numeric matrix/data frame, the raw data of the covariates.
`y_data`	Numeric matrix/data frame, the raw data of the dependent (response) variable.
`y_grid`	Numeric, specifies the grid of evaluation points in the y-direction. When set to default, the grid points are the 0.05 to 0.95 quantiles of the y data, in steps of 0.05.
`x`	Numeric, specifies the grid of evaluation points in the x-direction. When set to default, the evaluation point is the median of the x data.
`bw`	Numeric, specifies the bandwidth used for estimation. Can be (1) a positive scalar (common bandwidth for all grid points); or (2) a positive numeric vector/matrix specifying bandwidths for each grid point (should have the same dimension as `y_grid`).
`p`	Nonnegative integer, specifies the order of the local polynomial for `Y` used to construct point estimates. (Default is `2`.)
`q`	Nonnegative integer, specifies the order of the local polynomial for `X` used to construct point estimates. (Default is `1`.)
`p_RBC`	Nonnegative integer, specifies the order of the local polynomial for `Y` used to construct bias-corrected point estimates. (Default is `p+1`.)
`q_RBC`	Nonnegative integer, specifies the order of the local polynomial for `X` used to construct bias-corrected point estimates. (Default is `q+1`.)
`mu`	Nonnegative integer, specifies the derivative with respect to `Y` of the distribution function to be estimated. `0` for the distribution function, `1` (default) for the density function, etc.
`nu`	Nonnegative integer, specifies the derivative with respect to `X` of the distribution function to be estimated. Default value is `0`.
`rbc`	Boolean. TRUE (default) performs robust bias-corrected (RBC) calculations, which are required for valid uniform inference.
`ng`	Integer, number of grid points to be used. Generates evenly spaced points over the support of the data.
`normalize`	Boolean. FALSE (default) returns the original estimator; TRUE normalizes the estimates to integrate to 1.
`nonneg`	Boolean. FALSE (default) returns the original estimator; TRUE returns the maximum of the estimate and 0.
`grid_spacing`	String. If equal to "quantile", quantile-spaced grid evaluation points are generated; otherwise, equally spaced points are generated.
`kernel_type`	String, specifies the kernel function, should be one of `"triangular"`, `"uniform"`, and `"epanechnikov"` (default).
`bw_type`	String, specifies the method for data-driven bandwidth selection. This option is ignored if `bw` is provided. Currently implemented: `"mse-dpi"` (default; mean squared error-optimal bandwidth selected for each grid point).
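The grid and post-processing options above can be illustrated with a short sketch. The helpers `make_y_grid()` and `postprocess_density()` are hypothetical and are not exported by lpcde. Two assumptions are made beyond the argument descriptions: "the support of the data" is taken to be [min(y), max(y)], and `normalize = TRUE` is taken to rescale the estimates so that their trapezoidal-rule integral over the grid equals 1. The package's internal choices may differ.

```r
# Hypothetical helpers (NOT part of lpcde) sketching the documented options.

# Default y_grid: the 0.05, 0.10, ..., 0.95 quantiles of the y data.
# With ng supplied: ng points, either equally spaced over [min(y), max(y)]
# (assumed "support") or at evenly spaced quantile levels in [0, 1]
# when grid_spacing = "quantile" (assumed levels).
make_y_grid <- function(y, ng = NULL, grid_spacing = "") {
  y <- as.numeric(y)
  if (is.null(ng)) {
    return(unname(quantile(y, probs = seq(0.05, 0.95, by = 0.05))))
  }
  if (grid_spacing == "quantile") {
    unname(quantile(y, probs = seq(0, 1, length.out = ng)))
  } else {
    seq(min(y), max(y), length.out = ng)
  }
}

# Post-processing in the spirit of nonneg = TRUE and normalize = TRUE.
# Normalization uses the trapezoidal rule over the grid (an assumption).
postprocess_density <- function(y_grid, est, nonneg = FALSE, normalize = FALSE) {
  if (nonneg) est <- pmax(est, 0)  # truncate negative estimates at 0
  if (normalize) {
    dy <- diff(y_grid)
    area <- sum(dy * (head(est, -1) + tail(est, -1)) / 2)
    est <- est / area  # rescale so the estimates integrate to 1 over the grid
  }
  est
}

set.seed(1)
yg <- make_y_grid(rnorm(500))
length(yg)  # 19 grid points: 0.05, 0.10, ..., 0.95
```

Because the default grid spans only the 5th to 95th percentiles, a trapezoidal normalization over that grid forces mass 1 onto a range that holds roughly 90% of the data; this is one reason the option is off by default in the sketch above.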

Details

Bias correction is used only for the construction of confidence intervals/bands, not for point estimation. The point estimates, denoted by est, are constructed using local polynomial estimates of order p and q, while the centering of the confidence intervals/bands, denoted by est_RBC, is constructed using local polynomial estimates of order p_RBC and q_RBC. The confidence intervals/bands take the form [est_RBC - cv * SE(est_RBC), est_RBC + cv * SE(est_RBC)], where cv denotes the appropriate critical value and SE(est_RBC) denotes a standard error estimate for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be centered at the point estimates, because they have been bias-corrected.

Setting p_RBC equal to p and q_RBC equal to q results in confidence intervals/bands centered at the point estimates, but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator cannot be used). Hence, in that case the bandwidth must be specified manually, and the point estimates will not be (I)MSE-optimal. See Cattaneo, Jansson and Ma (2020a, 2020b) for details, and also Calonico, Cattaneo, and Farrell (2018, 2020) for robust bias correction methods.
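The interval formula above can be sketched in a few lines of R, working directly from the `est_RBC` and `se_RBC` columns described under Value. For pointwise intervals, the sketch assumes the usual Gaussian critical value qnorm(1 - alpha/2). For uniform bands, it shows one generic way to obtain a larger critical value: simulating the maximum of absolute standardized Gaussian deviates (a "sup-t" critical value) from a covariance matrix across grid points. The helper names `rbc_ci()` and `sup_t_cv()` are hypothetical, and lpcde's own band construction may differ. Note also that Value describes `CovMat` as corresponding to `est`, not `est_RBC`.

```r
# Hypothetical helpers (NOT part of lpcde).

# Pointwise RBC intervals: [est_RBC - cv * se_RBC, est_RBC + cv * se_RBC],
# with the Gaussian critical value cv = qnorm(1 - alpha / 2) (assumed).
rbc_ci <- function(est_RBC, se_RBC, level = 0.95) {
  cv <- qnorm(1 - (1 - level) / 2)
  data.frame(lower = est_RBC - cv * se_RBC,
             upper = est_RBC + cv * se_RBC)
}

# Generic simulated sup-t critical value for a uniform band, given a
# covariance matrix of the estimates across grid points. chol() requires
# the implied correlation matrix to be positive definite.
sup_t_cv <- function(cov_mat, level = 0.95, nsim = 10000) {
  s <- sqrt(diag(cov_mat))
  corr <- cov_mat / outer(s, s)          # correlation matrix
  k <- nrow(corr)
  # Rows of z are draws from N(0, corr): e %*% U with t(U) %*% U = corr
  z <- matrix(rnorm(nsim * k), nsim, k) %*% chol(corr)
  unname(quantile(apply(abs(z), 1, max), probs = level))
}
```

A band then replaces the pointwise critical value with `sup_t_cv(...)`; with more than one weakly correlated grid point, this value exceeds qnorm(0.975), so the band is wider than the pointwise intervals.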

Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this case is to increase the polynomial order p or to employ a smaller bandwidth.
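To see where this occurs, one can compare `est` against interval bounds computed from `est_RBC` and `se_RBC`. The helper `flag_outside()` below is hypothetical and not part of lpcde. It simply returns the grid points at which the point estimate falls outside the supplied bounds, which are then candidates for a higher polynomial order p or a smaller bandwidth.

```r
# Hypothetical helper (NOT part of lpcde): list grid points where the point
# estimate lies outside the (bias-corrected) interval/band bounds.
flag_outside <- function(grid, est, lower, upper) {
  out <- est < lower | est > upper
  data.frame(grid = grid[out], est = est[out],
             lower = lower[out], upper = upper[out])
}
```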

Value

`Estimate`	A matrix containing (1) `grid` (grid points), (2) `bw` (bandwidths), (3) `est` (point estimates using local polynomials of order p in y and q in x), (4) `est_RBC` (point estimates using local polynomials of order p_RBC in y and q_RBC in x), (5) `se` (standard error corresponding to `est`), and (6) `se_RBC` (standard error corresponding to `est_RBC`).
`CovMat`	The variance-covariance matrix corresponding to `est`.
`opt`	A list containing options passed to the function.

Author(s)

Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.

Rajita Chandak (maintainer), Princeton University. rchandak@princeton.edu.

Michael Jansson, University of California, Berkeley. mjansson@econ.berkeley.edu.

Xinwei Ma, University of California, San Diego. x1ma@ucsd.edu.

References

Cattaneo MD, Chandak R, Jansson M, Ma X (2024). “Local Polynomial Conditional Density Estimators.” Bernoulli.
Calonico S, Cattaneo MD, Farrell MH (2018). “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference.” Journal of the American Statistical Association, 113(522), 767–779.
Calonico S, Cattaneo MD, Farrell MH (2022). “Coverage Error Optimal Confidence Intervals for Local Polynomial Regression.” Bernoulli, 28(4), 2998–3022.
Cattaneo MD, Jansson M, Ma X (2020). “Simple Local Polynomial Density Estimators.” Journal of the American Statistical Association, 115(531), 1449–1455.

Examples

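# Conditional density estimation example
set.seed(42)
n = 500
x_data = matrix(rnorm(n, mean = 0, sd = 1))
# Y given X = x is normal with mean x and standard deviation 1
y_data = matrix(rnorm(n, mean = x_data, sd = 1))
y_grid = seq(from = -1, to = 1, length.out = 5)
# Estimate f(y | x = 0) at 5 grid points with a fixed bandwidth of 0.5
model1 = lpcde::lpcde(x_data = x_data, y_data = y_data, y_grid = y_grid, x = 0, bw = 0.5)
# Summary of estimation
summary(model1)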

[Package lpcde version 0.1.4 Index]