krscv {crs}    R Documentation
Categorical Kernel Regression Spline Cross-Validation
Description
krscv computes an exhaustive cross-validation directed search for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors.
Usage
krscv(xz,
y,
degree.max = 10,
segments.max = 10,
degree.min = 0,
segments.min = 1,
restarts = 0,
complexity = c("degree-knots","degree","knots"),
knots = c("quantiles","uniform","auto"),
basis = c("additive","tensor","glp","auto"),
cv.func = c("cv.ls","cv.gcv","cv.aic"),
degree = degree,
segments = segments,
tau = NULL,
weights = NULL,
singular.ok = FALSE)
Arguments
y: continuous univariate vector

xz: continuous and/or nominal/ordinal (factor/ordered) predictors

degree.max: the maximum degree of the B-spline basis for each of the continuous predictors (default degree.max=10)

segments.max: the maximum segments of the B-spline basis for each of the continuous predictors (default segments.max=10)

degree.min: the minimum degree of the B-spline basis for each of the continuous predictors (default degree.min=0)

segments.min: the minimum segments of the B-spline basis for each of the continuous predictors (default segments.min=1)

restarts: number of times to restart the optim search from different initial values (default restarts=0)

complexity: a character string (default complexity="degree-knots") indicating whether model complexity is determined by the degree of the spline ("degree"), by the number of segments/knots ("knots"), or by both ("degree-knots")

knots: a character string (default knots="quantiles") specifying where knots are to be placed; "quantiles" places knots at equally spaced quantiles so that an equal number of observations lies in each segment, "uniform" places knots at equally spaced intervals, and "auto" selects the knot placement via cross-validation

basis: a character string (default basis="additive") indicating whether the additive, tensor product, or generalized polynomial ("glp") B-spline basis should be used; "auto" selects the basis via cross-validation

cv.func: a character string (default cv.func="cv.ls") indicating the cross-validation criterion: "cv.ls" for least-squares cross-validation, "cv.gcv" for generalized cross-validation (Craven and Wahba (1979)), and "cv.aic" for expected Kullback-Leibler cross-validation (Hurvich, Simonoff and Tsai (1998))

degree: integer/vector specifying the degree of the B-spline basis for each dimension of the continuous predictors (held fixed at this value when complexity="knots")

segments: integer/vector specifying the number of segments of the B-spline basis for each dimension of the continuous predictors, i.e. the number of knots minus one (held fixed at this value when complexity="degree")

tau: if non-null, a number in (0,1) denoting the quantile for which a quantile regression spline is to be estimated rather than estimating the conditional mean (default tau=NULL)

weights: an optional vector of weights to be used in the fitting process; should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights ‘weights’ (that is, minimizing ‘sum(w*e^2)’); otherwise ordinary least squares is used

singular.ok: a logical value (default singular.ok=FALSE); if FALSE, bases found to be singular during cross-validation are discarded
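As a rough sketch of how complexity interacts with degree and segments (toy data kept small so the exhaustive search runs quickly; the argument values here are for exposition only, not recommendations):

library(crs)
set.seed(1)
n <- 100
x <- runif(n)
z <- factor(rbinom(n, 1, 0.5))
y <- cos(2*pi*x) + as.integer(z) + rnorm(n, sd = 0.1)
xdata <- data.frame(x, z)
## Cross-validate the number of knots only (spline degree held fixed at 3):
cv.knots <- krscv(xz = xdata, y = y, complexity = "knots",
                  degree = 3, segments.max = 5)
## Cross-validate the spline degree only (segments held fixed at 2):
cv.degree <- krscv(xz = xdata, y = y, complexity = "degree",
                   segments = 2, degree.max = 5)
summary(cv.knots)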
Details
krscv computes exhaustive cross-validation for a regression spline estimate of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and nominal/ordinal (factor/ordered) predictors. The optimal K/lambda combination is returned along with other results (see below for return values). The method uses kernel functions appropriate for categorical (ordinal/nominal) predictors, which avoids the loss in efficiency associated with the sample-splitting procedures that are typically used when faced with a mix of continuous and nominal/ordinal (factor/ordered) predictors.

For the continuous predictors the regression spline model employs either the additive or tensor product B-spline basis matrix for a multivariate polynomial spline, via the B-spline routines in the GNU Scientific Library (https://www.gnu.org/software/gsl/) and the tensor.prod.model.matrix function.
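These building blocks are exported, so the basis construction can be examined directly. A minimal sketch, assuming the crs functions gsl.bs and tensor.prod.model.matrix behave as documented:

library(crs)
x1 <- runif(10)
x2 <- runif(10)
B1 <- gsl.bs(x1, degree = 3, nbreak = 2)  # univariate B-spline basis for x1
B2 <- gsl.bs(x2, degree = 3, nbreak = 2)  # univariate B-spline basis for x2
## Row-wise tensor product of the two univariate bases:
B <- tensor.prod.model.matrix(list(B1, B2))
dim(B)  # the column counts multiply: ncol(B1) * ncol(B2)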
For the discrete predictors the product kernel function is of the ‘Li-Racine’ type (see Li and Racine (2007) for details).
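For intuition, a minimal sketch of Li-Racine kernel weights (illustrative only, not the package's internal code):

## Unordered (nominal) predictor: weight 1 for matching categories,
## lambda otherwise; lambda = 0 reproduces sample splitting, while
## lambda = 1 smooths the predictor out entirely.
kernel.unordered <- function(z, z0, lambda) ifelse(z == z0, 1, lambda)
## Ordered predictor: geometric decay in the category distance.
kernel.ordered <- function(z, z0, lambda) lambda^abs(as.numeric(z) - as.numeric(z0))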
For each unique combination of degree and segment, a numerical search for the bandwidth vector lambda is undertaken using optim and the box-constrained L-BFGS-B method (see optim for details). The user may restart the optim algorithm as many times as desired via the restarts argument. The search ascends from K=0 through degree.max/segments.max and, for each value of K, locates the optimal bandwidths for that value of K. After the most complex model has been searched, the optimal K/lambda combination is selected. If any element of the optimal K vector coincides with degree.max/segments.max, a warning is produced and the user ought to restart the search with a larger value of degree.max/segments.max.
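Schematically, the search proceeds as below. This is a self-contained toy where the hypothetical toy.cv stands in for the actual cross-validation criterion; the real routine searches over vectors of degrees/segments and one bandwidth per categorical predictor:

toy.cv <- function(K, lambda) (K - 3)^2 + (lambda - 0.25)^2  # hypothetical criterion
best <- list(cv = Inf, K = NA, lambda = NA)
for(K in 0:10) {  # ascend from the simplest to the most complex model
  ## Box-constrained search over lambda in [0,1] for this K:
  opt <- optim(par = 0.5, fn = function(lambda) toy.cv(K, lambda),
               method = "L-BFGS-B", lower = 0, upper = 1)
  if(opt$value < best$cv)
    best <- list(cv = opt$value, K = K, lambda = opt$par)
}
best  # the toy criterion is minimized at K = 3, lambda = 0.25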
Value
krscv returns a crscv object. Furthermore, the function summary supports objects of this type. The returned objects have the following components:

K: scalar/vector containing the optimal degree(s) of spline or number of segments

K.mat: vector/matrix of the values of K evaluated during the search

restarts: number of restarts during the search, if any

lambda: optimal bandwidths for the categorical predictors

lambda.mat: vector/matrix of optimal bandwidths for each degree of spline

cv.func: objective function value at the optimum

cv.func.vec: vector of objective function values at each degree of spline or number of segments in K.mat
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca
References
Craven, P. and G. Wahba (1979), “Smoothing Noisy Data With Spline Functions,” Numerische Mathematik, 31, 377-403.
Hurvich, C.M. and J.S. Simonoff and C.L. Tsai (1998), “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion,” Journal of the Royal Statistical Society B, 60, 271-293.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Ma, S. and J.S. Racine and L. Yang (2015), “Spline Regression in the Presence of Categorical Predictors,” Journal of Applied Econometrics, Volume 30, 705-717.
Ma, S. and J.S. Racine (2013), “Additive Regression Splines with Irrelevant Categorical and Continuous Regressors,” Statistica Sinica, Volume 23, 515-541.
See Also
Examples
library(crs)
set.seed(42)
## Simulated data
n <- 1000
x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
zz <- ind == ind.vals[i]
dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)
xdata <- data.frame(x,z=factor(z))
## Compute the optimal K and lambda, cross-validating the number of
## knots with the spline degree for x held fixed at 3
cv <- krscv(xz=xdata,y=y,complexity="knots",degree=c(3))
summary(cv)
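## The components listed under Value can be inspected directly from
## the returned crscv object, e.g.:
cv$K        # optimal number of segments (spline degree fixed at 3)
cv$lambda   # optimal bandwidth for the factor z
cv$cv.func  # objective function value at the optimum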