krscvNOMAD {crs} | R Documentation |
Categorical Kernel Regression Spline Cross-Validation
Description
krscvNOMAD computes NOMAD-based (Nonsmooth Optimization by Mesh
Adaptive Direct Search, Abramson, Audet, Couture and Le Digabel
(2011)) cross-validation directed search for a regression spline
estimate of a one (1) dimensional dependent variable on an
r-dimensional vector of continuous and nominal/ordinal
(factor/ordered) predictors.
Usage
krscvNOMAD(xz,
y,
degree.max = 10,
segments.max = 10,
degree.min = 0,
segments.min = 1,
cv.df.min = 1,
complexity = c("degree-knots","degree","knots"),
knots = c("quantiles","uniform","auto"),
basis = c("additive","tensor","glp","auto"),
cv.func = c("cv.ls","cv.gcv","cv.aic"),
degree = degree,
segments = segments,
lambda = lambda,
lambda.discrete = FALSE,
lambda.discrete.num = 100,
random.seed = 42,
max.bb.eval = 10000,
initial.mesh.size.real = "r0.1",
initial.mesh.size.integer = "1",
min.mesh.size.real = paste("r",sqrt(.Machine$double.eps),sep=""),
min.mesh.size.integer = "1",
min.poll.size.real = "1",
min.poll.size.integer = "1",
opts=list(),
nmulti = 0,
tau = NULL,
weights = NULL,
singular.ok = FALSE)
Arguments
y |
continuous univariate vector |
xz |
continuous and/or nominal/ordinal
(factor/ordered) predictors |
degree.max |
the maximum degree of the B-spline basis for
each of the continuous predictors (default degree.max=10) |
segments.max |
the maximum segments of the B-spline basis for
each of the continuous predictors (default segments.max=10) |
degree.min |
the minimum degree of the B-spline basis for
each of the continuous predictors (default degree.min=0) |
segments.min |
the minimum segments of the B-spline basis for
each of the continuous predictors (default segments.min=1) |
cv.df.min |
the minimum degrees of freedom to allow when
conducting cross-validation (default cv.df.min=1) |
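complexity |
a character string (default complexity="degree-knots"), one of
"degree-knots", "degree" or "knots" |
knots |
a character string (default knots="quantiles"), one of "quantiles", "uniform" or "auto" |
basis |
a character string (default basis="additive"), one of "additive", "tensor", "glp" or "auto" |
cv.func |
a character string (default cv.func="cv.ls"), one of "cv.ls", "cv.gcv" or "cv.aic" |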
degree |
integer/vector specifying the degree of the B-spline
basis for each dimension of the continuous predictors |
segments |
integer/vector specifying the number of segments of
the B-spline basis for each dimension of the continuous predictors |
lambda |
real/vector for the categorical predictors. If it is not NULL, it will be the starting value(s) for lambda |
lambda.discrete |
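if lambda.discrete=TRUE, lambda is restricted to a discrete set of values (see lambda.discrete.num); default lambda.discrete=FALSE |
lambda.discrete.num |
a positive integer indicating the number of
discrete values that lambda can assume - this parameter will only be
used when lambda.discrete=TRUE |
random.seed |
when it is not missing and not equal to 0, the initial points will
be generated using this seed when random starting points are used (see nmulti) |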
max.bb.eval |
argument passed to the NOMAD solver (see snomadr) |
initial.mesh.size.real |
argument passed to the NOMAD solver (see snomadr) |
initial.mesh.size.integer |
argument passed to the NOMAD solver (see snomadr) |
min.mesh.size.real |
argument passed to the NOMAD solver (see snomadr) |
min.mesh.size.integer |
argument passed to the NOMAD solver (see snomadr) |
min.poll.size.real |
argument passed to the NOMAD solver (see snomadr) |
min.poll.size.integer |
argument passed to the NOMAD solver (see snomadr) |
opts |
list of optional arguments to be passed to
snomadr |
nmulti |
integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default nmulti=0) |
tau |
if non-null a number in (0,1) denoting the quantile for which a quantile
regression spline is to be estimated rather than estimating the
conditional mean (default tau=NULL) |
weights |
an optional vector of weights to be used in the fitting process. Should be ‘NULL’ or a numeric vector. If non-NULL, weighted least squares is used with weights ‘weights’ (that is, minimizing ‘sum(w*e^2)’); otherwise ordinary least squares is used. |
singular.ok |
a logical value (default singular.ok=FALSE) |
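The three cv.func choices can be illustrated with a small base R sketch. It assumes, based on the option names and the Craven and Wahba (1979) and Hurvich, Simonoff and Tsai (1998) references, that "cv.ls" denotes leave-one-out least-squares cross-validation, "cv.gcv" generalized cross-validation and "cv.aic" the corrected AIC. The cubic polynomial fit and the object names are illustrative only and are not the internal code of krscvNOMAD.

```r
## Illustrative only: three criteria for a linear smoother with hat matrix H
set.seed(42)
n <- 100
x <- runif(n)
y <- cos(2*pi*x) + rnorm(n, sd = 0.1)

## Any least-squares fit is a linear smoother; a cubic polynomial stands in
## for the regression spline here (an assumption made for brevity)
fit <- lm(y ~ x + I(x^2) + I(x^3))
e   <- residuals(fit)   # in-sample residuals
h   <- hatvalues(fit)   # diagonal of the hat matrix H
trH <- sum(h)           # trace of H (effective number of parameters)

## Leave-one-out least-squares cross-validation, no refitting required
cv.ls  <- mean((e / (1 - h))^2)

## Generalized cross-validation
cv.gcv <- mean(e^2) / (1 - trH / n)^2

## Corrected AIC of Hurvich, Simonoff and Tsai
cv.aic <- log(mean(e^2)) + (1 + trH / n) / (1 - (trH + 2) / n)

c(cv.ls = cv.ls, cv.gcv = cv.gcv, cv.aic = cv.aic)
```

All three criteria penalize model complexity through the hat matrix, so each can be evaluated from a single fit per candidate degree/segments/lambda combination.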
Details
krscvNOMAD computes NOMAD-based cross-validation for a
regression spline estimate of a one (1) dimensional dependent variable
on an r-dimensional vector of continuous and nominal/ordinal
(factor/ordered) predictors. Numerical search for the optimal
degree/segments/lambda is undertaken using snomadr.
The optimal K/lambda combination is returned along with
other results (see below for return values). The method uses kernel
functions appropriate for categorical (ordinal/nominal) predictors,
which avoids the loss in efficiency associated with sample-splitting
procedures that are typically used when faced with a mix of continuous
and nominal/ordinal (factor/ordered) predictors.
For the continuous predictors the regression spline model employs
either the additive or tensor product B-spline basis matrix for a
multivariate polynomial spline via the B-spline routines in the GNU
Scientific Library (https://www.gnu.org/software/gsl/) and the
tensor.prod.model.matrix function.
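The difference between the additive and tensor product bases can be seen in a short sketch with two continuous predictors. It uses splines::bs from base R instead of the GSL routines and builds the tensor product by hand rather than with tensor.prod.model.matrix, so the column counts (which depend on how the intercept is handled) are those of this sketch and may differ from the package's. The helper interior.knots is hypothetical.

```r
## Illustrative only: additive versus tensor product B-spline bases
library(splines)

set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
degree   <- 3
segments <- 4

## Interior knots at quantiles: segments - 1 knots split the range into
## `segments` pieces
interior.knots <- function(x, segments) {
  quantile(x, seq(0, 1, length.out = segments + 1))[-c(1, segments + 1)]
}

## Univariate bases stored as plain matrices (degree + segments - 1 columns)
B1 <- matrix(as.vector(bs(x1, degree = degree,
                          knots = interior.knots(x1, segments))), nrow = n)
B2 <- matrix(as.vector(bs(x2, degree = degree,
                          knots = interior.knots(x2, segments))), nrow = n)
p1 <- ncol(B1)
p2 <- ncol(B2)

## Additive basis: columns placed side by side
B.additive <- cbind(B1, B2)

## Tensor product basis: every pairwise product of columns, row by row
B.tensor <- B1[, rep(seq_len(p1), each = p2)] *
            B2[, rep(seq_len(p2), times = p1)]

c(additive = ncol(B.additive), tensor = ncol(B.tensor))
```

The additive basis grows with the sum of the univariate dimensions while the tensor basis grows with their product, so the tensor basis captures interactions at a much higher cost in degrees of freedom.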
For the discrete predictors the product kernel function is of the ‘Li-Racine’ type (see Li and Racine (2007) for details).
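A minimal sketch of a product kernel for categorical predictors follows. It assumes the common forms of these kernels: for an unordered predictor the weight is 1 when the values match and lambda otherwise, and for an ordered predictor the weight is lambda raised to the absolute distance between values, with lambda in [0,1]. The function names are hypothetical and the exact form used internally by the package should be checked against Li and Racine (2007).

```r
## Illustrative only: product kernel weights for categorical predictors

## Unordered (nominal) kernel: 1 on a match, lambda otherwise
kernel.unordered <- function(z, z0, lambda) ifelse(z == z0, 1, lambda)

## Ordered (ordinal) kernel: weight decays with distance between levels
kernel.ordered <- function(z, z0, lambda) lambda^abs(z - z0)

## Product kernel: multiply the per-predictor weights for each observation
product.kernel <- function(Z, z0, lambda, ordered) {
  W <- rep(1, nrow(Z))
  for (j in seq_len(ncol(Z))) {
    k <- if (ordered[j]) {
      kernel.ordered(Z[, j], z0[j], lambda[j])
    } else {
      kernel.unordered(Z[, j], z0[j], lambda[j])
    }
    W <- W * k
  }
  W
}

## Four observations on one unordered and one ordered predictor
Z <- cbind(c(0, 1, 1, 0), c(1, 2, 3, 3))
w <- product.kernel(Z, z0 = c(1, 3), lambda = c(0.5, 0.2),
                    ordered = c(FALSE, TRUE))
w
```

With lambda equal to 0 the weights reduce to an indicator for an exact match, which corresponds to splitting the sample by cell, while lambda equal to 1 gives every observation full weight and effectively ignores the predictor. Intermediate values borrow information across cells.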
Value
krscvNOMAD returns a crscv object. Furthermore, the
function summary supports objects of this type. The
returned objects have the following components:
K |
scalar/vector containing optimal degree(s) of spline or number of segments |
K.mat |
vector/matrix of values of K |
degree.max |
the maximum degree of the B-spline basis for
each of the continuous predictors (default degree.max=10) |
segments.max |
the maximum segments of the B-spline basis for
each of the continuous predictors (default segments.max=10) |
degree.min |
the minimum degree of the B-spline basis for
each of the continuous predictors (default degree.min=0) |
segments.min |
the minimum segments of the B-spline basis for
each of the continuous predictors (default segments.min=1) |
restarts |
number of restarts during search, if any |
lambda |
optimal bandwidths for categorical predictors |
lambda.mat |
vector/matrix of optimal bandwidths for each degree of spline |
cv.func |
objective function value at optimum |
cv.func.vec |
vector of objective function values at each degree
of spline or number of segments in K.mat |
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca and Zhenghua Nie niez@mcmaster.ca
References
Abramson, M.A. and C. Audet and G. Couture and J.E. Dennis Jr. and S. Le Digabel (2011), “The NOMAD project”. Software available at https://www.gerad.ca/nomad.
Craven, P. and G. Wahba (1979), “Smoothing Noisy Data With Spline Functions,” Numerische Mathematik, 31, 377-403.
Hurvich, C.M. and J.S. Simonoff and C.L. Tsai (1998), “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion,” Journal of the Royal Statistical Society B, 60, 271-293.
Le Digabel, S. (2011), “Algorithm 909: NOMAD: Nonlinear Optimization With The MADS Algorithm”. ACM Transactions on Mathematical Software, 37(4):44:1-44:15.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Ma, S. and J.S. Racine and L. Yang (2015), “Spline Regression in the Presence of Categorical Predictors,” Journal of Applied Econometrics, Volume 30, 705-717.
Ma, S. and J.S. Racine (2013), “Additive Regression Splines with Irrelevant Categorical and Continuous Regressors,” Statistica Sinica, Volume 23, 515-541.
Examples
library(crs)
set.seed(42)
## Simulated data
n <- 1000
x <- runif(n)
z <- round(runif(n,min=-0.5,max=1.5))
z.unique <- uniquecombs(as.matrix(z))
ind <- attr(z.unique,"index")
ind.vals <- sort(unique(ind))
dgp <- numeric(length=n)
for(i in 1:nrow(z.unique)) {
zz <- ind == ind.vals[i]
dgp[zz] <- z[zz]+cos(2*pi*x[zz])
}
y <- dgp + rnorm(n,sd=.1)
xdata <- data.frame(x,z=factor(z))
## Compute the optimal K and lambda: cross-validate the number of knots
## while holding the spline degree for x fixed at 3
cv <- krscvNOMAD(xz=xdata,y=y,complexity="knots",degree=c(3),segments=c(5))
summary(cv)