npqreg {np} | R Documentation |
Kernel Quantile Regression with Mixed Data Types
Description
npqreg computes a kernel quantile regression estimate of a one (1)
dimensional dependent variable on p-variate explanatory data, given a
set of evaluation points, training points (consisting of explanatory
data and dependent data), and a bandwidth specification, using the
methods of Li and Racine (2008) and Li, Lin and Racine (2013). A
bandwidth specification can be a condbandwidth object, or a bandwidth
vector, bandwidth type and kernel type.
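To give a feel for what a kernel quantile regression estimate is, the following base R sketch computes a conditional quantile by inverting a kernel-weighted empirical CDF. It is a toy illustration only: the helper `toy_cond_quantile`, the Gaussian kernel, the fixed bandwidth `h`, and the simulated data are all assumptions made here, and the code does not reproduce the estimator, bandwidth selection, or numerical search used by npqreg.

```r
# Hypothetical helper, not part of the np package: a toy conditional
# quantile at a single point x0, obtained by inverting a kernel-weighted
# step-function CDF of y.
toy_cond_quantile <- function(x, y, x0, tau = 0.5, h = 0.5) {
  w <- dnorm((x - x0) / h)        # Gaussian kernel weights around x0
  w <- w / sum(w)                 # normalise the weights to sum to one
  ord <- order(y)                 # sort the responses
  cdf <- cumsum(w[ord])           # weighted CDF of y given x = x0
  y[ord][which(cdf >= tau)[1]]    # smallest y whose CDF value reaches tau
}

# Simulated data (made up for illustration): y = 2x + noise
set.seed(42)
x <- runif(500)
y <- 2 * x + rnorm(500, sd = 0.2)

q25 <- toy_cond_quantile(x, y, x0 = 0.5, tau = 0.25, h = 0.1)
q50 <- toy_cond_quantile(x, y, x0 = 0.5, tau = 0.50, h = 0.1)
q75 <- toy_cond_quantile(x, y, x0 = 0.5, tau = 0.75, h = 0.1)
c(q25, q50, q75)
```

Because the weighted CDF is nondecreasing, the three quantiles come out ordered by construction. In practice the bandwidth would be chosen by cross-validation (for example with npcdistbw) rather than fixed by hand.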
Usage
npqreg(bws, ...)
## S3 method for class 'formula'
npqreg(bws, data = NULL, newdata = NULL, ...)
## S3 method for class 'call'
npqreg(bws, ...)
## S3 method for class 'condbandwidth'
npqreg(bws,
txdat = stop("training data 'txdat' missing"),
tydat = stop("training data 'tydat' missing"),
exdat,
tau = 0.5,
gradients = FALSE,
ftol = 1.490116e-07,
tol = 1.490116e-04,
small = 1.490116e-05,
itmax = 10000,
lbc.dir = 0.5,
dfc.dir = 3,
cfac.dir = 2.5*(3.0-sqrt(5)),
initc.dir = 1.0,
lbd.dir = 0.1,
hbd.dir = 1,
dfac.dir = 0.25*(3.0-sqrt(5)),
initd.dir = 1.0,
...)
## Default S3 method:
npqreg(bws, txdat, tydat, ...)
Arguments
bws |
a bandwidth specification. This can be set as a |
tau |
a numeric value specifying the |
... |
additional arguments supplied to specify the regression type,
bandwidth type, kernel types, training data, and so on.
To do this,
you may specify any of |
data |
an optional data frame, list or environment (or object
coercible to a data frame by |
newdata |
An optional data frame in which to look for evaluation data. If omitted, the training data are used. |
txdat |
a |
tydat |
a one (1) dimensional numeric or integer vector of dependent data, each
element |
exdat |
a |
gradients |
[currently not supported] a logical value indicating that you want
gradients computed and returned in the resulting |
itmax |
integer number of iterations before failure in the numerical
optimization routine. Defaults to 10000. |
ftol |
fractional tolerance on the value of the cross-validation function
evaluated at located minima (of order the machine precision or
perhaps slightly larger so as not to be affected by
roundoff). Defaults to 1.490116e-07. |
tol |
tolerance on the position of located minima of the cross-validation
function (tol should generally be no smaller than the square root of
your machine's floating point precision). Defaults to 1.490116e-04. |
small |
a small number used to bracket a minimum (it is hopeless to ask for
a bracketing interval of width less than sqrt(epsilon) times its
central value, a fractional width of only about 10^-4 (single
precision) or 3x10^-8 (double precision)). Defaults to 1.490116e-05. |
lbc.dir , dfc.dir , cfac.dir , initc.dir |
lower bound, chi-square degrees of freedom, stretch factor, and initial non-random values for direction set search for Powell's algorithm for numeric variables. See Details |
lbd.dir , hbd.dir , dfac.dir , initd.dir |
lower bound, upper bound, stretch factor, and initial non-random values for direction set search for Powell's algorithm for categorical variables. See Details |
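The defaults for ftol, tol and small shown under Usage can be related to machine precision with a quick base R check. The sketch below is an observation about the numbers, not a statement from the package: on platforms with IEEE double precision, each default matches a power-of-ten multiple of the square root of machine epsilon to the seven significant digits printed in Usage.

```r
# Square root of double-precision machine epsilon (about 1.490116e-08)
eps_root <- sqrt(.Machine$double.eps)

# Candidate multiples, compared with the defaults listed under Usage
ftol_check  <- 10  * eps_root   # compare with ftol  = 1.490116e-07
tol_check   <- 1e4 * eps_root   # compare with tol   = 1.490116e-04
small_check <- 1e3 * eps_root   # compare with small = 1.490116e-05

signif(c(ftol = ftol_check, tol = tol_check, small = small_check), 7)
```

This is consistent with the advice in the tol entry that the tolerance should be no smaller than the square root of the floating point precision.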
Details
The optimizer invoked for search is Powell's conjugate direction
method, which requires the setting of (non-random) initial values and
search directions for bandwidths, and, when restarting, random values
for successive invocations. Bandwidths for numeric variables are
scaled by robust measures of spread, the sample size, and the number
of numeric variables where appropriate. Two sets of parameters for
bandwidths for numeric variables can be modified: those for initial
values for the parameters themselves, and those for the directions
taken (Powell's algorithm does not involve explicit computation of
the function's gradient). The default values are set by considering
search performance for a variety of difficult test cases and
simulated cases. We highly recommend restarting search a large number
of times to reduce the risk of settling on a local minimum (achieved
by modifying nmulti). Further refinement for difficult cases can be
achieved by modifying these sets of parameters. However, these
parameters are intended more for the authors of the package to enable
‘tuning’ for various methods rather than for the user themselves.
Value
npqreg returns an npqregression object. The generic functions fitted
(or quantile), se, predict, and gradients extract (or generate)
estimated values, asymptotic standard errors on estimates,
predictions, and gradients, respectively, from the returned object.
When using predict, you must add the argument tau= to generate
predictions other than the median. Furthermore, the functions summary
and plot support objects of this type. The returned object has the
following components:
eval |
evaluation points |
quantile |
estimation of the quantile regression function (conditional quantile) at the evaluation points |
quanterr |
standard errors of the quantile regression estimates |
quantgrad |
gradients at each evaluation point |
tau |
the |
Usage Issues
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not cbind, since
cbind will typically not work as intended on mixed data types and
will coerce the data to the same type.
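The difference can be seen with a small base R check. The values below are made up for illustration and are not taken from the Italy data.

```r
# An ordered factor and a numeric vector (illustrative values only)
year <- ordered(c(1951, 1952, 1953))
gdp  <- c(8.5, 9.1, 9.8)

m  <- cbind(year, gdp)        # matrix: the factor is reduced to integer codes
df <- data.frame(year, gdp)   # data frame: each column keeps its own class

is.numeric(m)       # the matrix holds a single numeric type
m[, "year"]         # the level codes 1, 2, 3, not the years
class(df$year)      # the ordered factor class is preserved
```

With cbind the ordering information and the year labels are lost, so the variable would be treated as an ordinary numeric column; data.frame keeps it as an ordered factor.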
Author(s)
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
References
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Koenker, R. W. and G.W. Bassett (1978), “Regression quantiles,” Econometrica, 46, 33-50.
Koenker, R. (2005), Quantile Regression, Econometric Society Monograph Series, Cambridge University Press.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Li, Q. and J.S. Racine (2008), “Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data,” Journal of Business and Economic Statistics, 26, 423-434.
Li, Q. and J. Lin and J.S. Racine (2013), “Optimal bandwidth selection for nonparametric conditional distribution and quantile functions,” Journal of Business and Economic Statistics, 31, 57-65.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
See Also
quantreg
Examples
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we compute a
# bivariate nonparametric quantile regression estimate for Giovanni
# Baiocchi's Italian income panel (see Italy for details)
data("Italy")
attach(Italy)
# First, compute the cross-validated bandwidths. Note - this may take a
# few minutes depending on the speed of your computer...
bw <- npcdistbw(formula=gdp~ordered(year))
# Note - numerical search for computing the quantiles may take a minute
# or so...
model.q0.25 <- npqreg(bws=bw, tau=0.25)
model.q0.50 <- npqreg(bws=bw, tau=0.50)
model.q0.75 <- npqreg(bws=bw, tau=0.75)
# Plot the resulting quantiles manually...
plot(ordered(year), gdp,
     main="CDF Quantile Estimates for the Italian Income Panel",
     xlab="Year",
     ylab="GDP Quantiles")
lines(ordered(year), model.q0.25$quantile, col="red", lty=2)
lines(ordered(year), model.q0.50$quantile, col="blue", lty=3)
lines(ordered(year), model.q0.75$quantile, col="red", lty=2)
legend(ordered(1951), 32, c("tau = 0.25", "tau = 0.50", "tau = 0.75"),
       lty=c(2, 3, 2), col=c("red", "blue", "red"))
detach(Italy)
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we compute a
# bivariate nonparametric quantile regression estimate for Giovanni
# Baiocchi's Italian income panel (see Italy for details)
data("Italy")
attach(Italy)
data <- data.frame(ordered(year), gdp)
# First, compute the likelihood cross-validation bandwidths (default).
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npcdistbw(xdat=ordered(year), ydat=gdp)
# Note - numerical search for computing the quantiles will take a
# minute or so...
model.q0.25 <- npqreg(bws=bw, tau=0.25)
model.q0.50 <- npqreg(bws=bw, tau=0.50)
model.q0.75 <- npqreg(bws=bw, tau=0.75)
# Plot the resulting quantiles manually...
plot(ordered(year), gdp,
     main="CDF Quantile Estimates for the Italian Income Panel",
     xlab="Year",
     ylab="GDP Quantiles")
lines(ordered(year), model.q0.25$quantile, col="red", lty=2)
lines(ordered(year), model.q0.50$quantile, col="blue", lty=3)
lines(ordered(year), model.q0.75$quantile, col="red", lty=2)
legend(ordered(1951), 32, c("tau = 0.25", "tau = 0.50", "tau = 0.75"),
       lty=c(2, 3, 2), col=c("red", "blue", "red"))
detach(Italy)
## End(Not run)