lpdensity {lpdensity} | R Documentation |
Local Polynomial Density Estimation and Inference
Description
lpdensity implements the local polynomial regression based density (and derivatives)
estimator proposed in Cattaneo, Jansson and Ma (2020). Robust bias-corrected inference methods,
both pointwise (confidence intervals) and uniform (confidence bands), are also implemented
following the results in Cattaneo, Jansson and Ma (2020, 2023).
See Cattaneo, Jansson and Ma (2022) for more implementation details and illustrations.
Companion command: lpbwdensity for bandwidth selection.
Related Stata and R packages useful for nonparametric estimation and inference are
available at https://nppackages.github.io/.
Usage
lpdensity(
data,
grid = NULL,
bw = NULL,
p = NULL,
q = NULL,
v = NULL,
kernel = c("triangular", "uniform", "epanechnikov"),
scale = NULL,
massPoints = TRUE,
bwselect = c("mse-dpi", "imse-dpi", "mse-rot", "imse-rot"),
stdVar = TRUE,
regularize = TRUE,
nLocalMin = NULL,
nUniqueMin = NULL,
Cweights = NULL,
Pweights = NULL
)
Arguments
data |
Numeric vector or one dimensional matrix/data frame, the raw data. |
grid |
Numeric, specifies the grid of evaluation points. When set to default, grid points will be chosen as 0.05-0.95 percentiles of the data, with a step size of 0.05. |
bw |
Numeric, specifies the bandwidth
used for estimation. Can be (1) a positive scalar (common
bandwidth for all grid points); or (2) a positive numeric vector specifying bandwidths for
each grid point (should be the same length as |
p |
Nonnegative integer, specifies the order of the local polynomial used to construct point
estimates. (Default is |
q |
Nonnegative integer, specifies the order of the local polynomial used to construct
confidence intervals/bands (a.k.a. the bias correction order). Default is |
v |
Nonnegative integer, specifies the derivative of the distribution function to be estimated. |
kernel |
String, specifies the kernel function, should be one of |
scale |
Numeric, specifies how
estimates are scaled. For example, setting this parameter to 0.5 will scale down both the
point estimates and standard errors by half. Default is |
massPoints |
|
bwselect |
String, specifies the method for data-driven bandwidth selection. This option will be
ignored if |
stdVar |
|
regularize |
|
nLocalMin |
Nonnegative integer, specifies the minimum number of observations in each local neighborhood. This option
will be ignored if |
nUniqueMin |
Nonnegative integer, specifies the minimum number of unique observations in each local neighborhood. This option
will be ignored if |
Cweights |
Numeric, specifies the weights used for counterfactual distribution construction. Should have the same length as the data. |
Pweights |
Numeric, specifies the weights used in sampling. Should have the same length as the data. |
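The grid and bw arguments can be illustrated with a short sketch. The default grid is approximated here with base R's quantile(), which is an assumption about how the percentiles are computed, and the names default_grid, my_grid and my_bw, as well as the bandwidth value 0.6, are hypothetical. The calls to lpdensity run only if the package is installed.

```r
# Illustrative sample (same construction as in the Examples section)
set.seed(42)
X <- rnorm(2000)

# Documented default grid: the 0.05, 0.10, ..., 0.95 percentiles of the data.
# Assumption: base R's default quantile type stands in for the package's own rule.
default_grid <- quantile(X, probs = seq(0.05, 0.95, by = 0.05), names = FALSE)
length(default_grid)  # number of evaluation points in the default grid

# A custom grid with one bandwidth per grid point (form 2 of bw)
my_grid <- seq(-1, 1, by = 0.5)
my_bw   <- rep(0.6, length(my_grid))  # hypothetical bandwidth values

# Run the estimator only if the package is installed
if (requireNamespace("lpdensity", quietly = TRUE)) {
  # Form 1: a common scalar bandwidth for all grid points
  est_scalar <- lpdensity::lpdensity(data = X, grid = my_grid, bw = 0.6)
  # Form 2: a vector of bandwidths, one per grid point
  est_vector <- lpdensity::lpdensity(data = X, grid = my_grid, bw = my_bw)
  summary(est_vector)
}
```

Supplying bw in either form means the data-driven selector named in bwselect is not needed for that call.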
Details
Bias correction is only used for the construction of confidence intervals/bands, not for point
estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates
of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed
using local polynomial estimates of order q. The confidence intervals/bands take the form
[f_q - cv * SE(f_q) , f_q + cv * SE(f_q)], where cv denotes the appropriate critical value and
SE(f_q) denotes a standard error estimate for the centering of the confidence interval/band. As a
result, the confidence intervals/bands may not be centered at the point estimates because they have
been bias-corrected.
Setting q and p to be equal results in confidence intervals/bands centered at the point estimates,
but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density
point estimator cannot be used). Hence the bandwidth would need to be specified manually when q=p, and the
point estimates will not be (I)MSE optimal. See Cattaneo, Jansson and Ma (2020, 2023) for details, and also
Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.
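The interval formula above can be worked through with a few lines of base R. The numbers below are hypothetical values at a single evaluation point, not output from lpdensity, and the use of a standard normal critical value is an assumption that applies only to a pointwise interval.

```r
# Hypothetical estimates at a single evaluation point (not output from lpdensity)
f_p  <- 0.395   # point estimate from the order-p fit
f_q  <- 0.402   # bias-corrected centering from the order-q fit
se_q <- 0.012   # standard error estimate for f_q

# Assumption: pointwise interval with a standard normal critical value
alpha <- 0.05
cv <- qnorm(1 - alpha / 2)

# [f_q - cv * SE(f_q), f_q + cv * SE(f_q)]
ci <- c(lower = f_q - cv * se_q, upper = f_q + cv * se_q)
ci

# The interval is symmetric around f_q, not around f_p
mean(ci) - f_q   # zero up to floating-point rounding
mean(ci) - f_p   # equals f_q - f_p, the offset from the point estimate
```

In this sketch the point estimate still lies inside the interval, but it is not the midpoint, which is the sense in which the interval is not centered at f_p.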
Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen
if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution
in this case is to increase the polynomial order p or to employ a smaller bandwidth.
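A rough sketch of how such grid points might be detected, and how the two remedies could be tried, is given below. The vectors f_p, f_q and se_q hold hypothetical values rather than package output, the normal critical value is an assumption for pointwise intervals, and the choices p = 3, q = 4 and bw = 0.3 are arbitrary illustrative settings.

```r
# Hypothetical values at three grid points (not output from lpdensity)
f_p  <- c(0.24, 0.40, 0.24)     # order-p point estimates
f_q  <- c(0.25, 0.37, 0.25)     # order-q centering of the intervals
se_q <- c(0.010, 0.012, 0.010)  # standard error estimates for f_q

# Assumption: pointwise 95% intervals with a standard normal critical value
cv <- qnorm(0.975)
outside <- f_p < f_q - cv * se_q | f_p > f_q + cv * se_q
which(outside)  # indices of grid points whose point estimate falls outside

# Possible remedies from the text, run only if the package is installed
if (requireNamespace("lpdensity", quietly = TRUE)) {
  set.seed(42)
  X <- rnorm(2000)
  # Higher polynomial order for the point estimates (q raised to stay above p)
  est_p3 <- lpdensity::lpdensity(data = X, p = 3, q = 4)
  # Manually chosen smaller bandwidth; 0.3 is an arbitrary illustrative value
  est_bw <- lpdensity::lpdensity(data = X, bw = 0.3)
}
```

Either refit can then be inspected with summary() or plot() to see whether the point estimates now fall inside the intervals.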
Value
Estimate |
A matrix containing (1) |
CovMat_p |
The variance-covariance matrix corresponding to |
CovMat_q |
The variance-covariance matrix corresponding to |
opt |
A list containing options passed to the function. |
Author(s)
Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.
Michael Jansson, University of California Berkeley. mjansson@econ.berkeley.edu.
Xinwei Ma (maintainer), University of California San Diego. x1ma@ucsd.edu.
References
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association, 113(522): 767-779. doi:10.1080/01621459.2017.1285776
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli, 28(4): 2998-3022. doi:10.3150/21-BEJ1445
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2), 1–25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, forthcoming. doi:10.1016/j.jeconom.2021.01.006
See Also
Supported methods: coef.lpdensity, confint.lpdensity, plot.lpdensity, print.lpdensity, summary.lpdensity, vcov.lpdensity.
Examples
# Load the package and generate a random sample
library(lpdensity)
set.seed(42); X <- rnorm(2000)
# Estimate density and report results
est1 <- lpdensity(data = X, bwselect = "imse-dpi")
summary(est1)
# Report results for a subset of grid points
summary(est1, grid=est1$Estimate[4:10, "grid"])
summary(est1, gridIndex=4:10)
# Report the 99% uniform confidence band
set.seed(42) # fix the seed for simulating critical values
summary(est1, alpha=0.01, CIuniform=TRUE)
# Plot the estimates and confidence intervals
plot(est1, legendTitle="My Plot", legendGroups=c("X"))
# Plot the estimates and the 99% uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot(est1, alpha=0.01, CIuniform=TRUE, legendTitle="My Plot", legendGroups=c("X"))
# Adding a histogram to the background
plot(est1, legendTitle="My Plot", legendGroups=c("X"),
     hist=TRUE, histData=X, histBreaks=seq(-1.5, 1.5, 0.25))