cv_resmooth {drape} | R Documentation |
K-fold cross-validation for resmoothing bandwidth.
Description
Selects the largest resmoothing bandwidth whose cross-validation score is within a specified tolerance of that of the original (unsmoothed) regression.
Usage
cv_resmooth(
X,
y,
d = 1,
regression,
tol = 2,
prefit = FALSE,
foldid = NULL,
bw = exp(seq(-5, 2, 0.2))/(2 * sqrt(3)) * stats::sd(X[, d]),
nfolds = 5L,
n_points = 101,
sd_trim = 5
)
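The default bw is a geometric grid scaled by the standard deviation of the covariate being smoothed. As a minimal sketch, the snippet below evaluates the default expression from the signature on simulated data; the simulated X and the name bw_default are illustrative assumptions, not part of the package.

```r
# Simulated covariates, for illustration only.
set.seed(1)
X <- matrix(stats::rnorm(200), ncol = 2)
d <- 1  # column to smooth along (the default)

# Default bandwidth grid, copied from the cv_resmooth() signature.
bw_default <- exp(seq(-5, 2, 0.2)) / (2 * sqrt(3)) * stats::sd(X[, d])

length(bw_default)                     # number of candidate bandwidths
range(bw_default / stats::sd(X[, d]))  # grid expressed in units of sd(X[, d])
```

Consecutive bandwidths differ by the constant factor exp(0.2), so the grid is evenly spaced on the log scale.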
Arguments
X |
matrix of covariates. |
y |
vector of responses. |
d |
integer index of covariate to be smoothed along. |
regression |
If prefit = FALSE, a function that takes input data of the form (X, y) and returns a prediction function. This prediction function accepts a matrix with the same number of columns as X and returns a vector of y-predictions, and optionally a vector of derivative predictions. If prefit = TRUE, a list of length nfolds in which each entry contains a component "fit": a prediction function that takes matrix input and returns a vector. |
tol |
vector of tolerances controlling the degree of permissible cross-validation error increase. Larger values lead to a larger amount of smoothing being selected. |
prefit |
logical indicating whether the regressions have already been fitted to the training data for each fold. |
foldid |
optional vector with entries in 1:nfolds indicating the fold to which each observation belongs. Overrides nfolds. |
bw |
vector of bandwidths for the Gaussian resmoothing kernel. |
nfolds |
integer number of cross-validation folds. |
n_points |
integer number of gridpoints to be used for convolution. |
sd_trim |
float number of standard deviations at which to trim the Gaussian distribution. |
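The roles of bw, n_points and sd_trim can be illustrated with a small sketch of Gaussian resmoothing along one covariate. This is an assumption about how such a convolution could be approximated on a grid, not a description of the package internals; resmooth_sketch and toy_fit are hypothetical names introduced here.

```r
# Hypothetical helper, not part of drape: approximate the convolution of a
# prediction function with a trimmed Gaussian kernel along column d.
resmooth_sketch <- function(fit, X, d, bw, n_points = 101, sd_trim = 5) {
  # Grid of standard-normal offsets, trimmed at +/- sd_trim.
  w <- seq(-sd_trim, sd_trim, length.out = n_points)
  # Gaussian weights, renormalised so that they sum to one on the grid.
  wt <- stats::dnorm(w)
  wt <- wt / sum(wt)
  out <- numeric(nrow(X))
  for (j in seq_len(n_points)) {
    X_shift <- X
    X_shift[, d] <- X[, d] + bw * w[j]   # shift only the smoothed covariate
    out <- out + wt[j] * fit(X_shift)    # weighted average of predictions
  }
  out
}

# Toy prediction function and inputs, for illustration only.
toy_fit <- function(M) sin(M[, 2])
X_toy <- cbind(seq(-1, 1, length.out = 5), seq(0, 2, length.out = 5))
resmooth_sketch(toy_fit, X_toy, d = 2, bw = 1)
```

Under these assumptions a larger bw averages predictions over a wider window, while n_points and sd_trim control how finely and how far out the kernel is evaluated.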
Value
A list with the following components: "bw", the vector of bandwidths used; "cv", the vector of cross-validation scores; "cv_unsm", the numeric cross-validation score without any smoothing; "bw_opt_inds", the vector of indices of the bandwidths selected for each tolerance in tol; and "bw_opt", the vector of corresponding bandwidths.
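The relationship between "cv", "cv_unsm", "bw_opt_inds" and "bw_opt" can be sketched as follows. How tol is converted into an allowed increase in cross-validation error is not specified on this page, so the additive margin below is an assumption (the package may scale it differently), pick_largest_bw is a hypothetical helper, and the scores are made up for illustration.

```r
# Hypothetical helper, not part of drape: for each margin, return the index of
# the largest bandwidth whose CV score is at most cv_unsm + margin.
pick_largest_bw <- function(bw, cv, cv_unsm, margin) {
  vapply(margin, function(m) {
    ok <- which(cv <= cv_unsm + m)
    if (length(ok) == 0L) NA_integer_ else ok[which.max(bw[ok])]
  }, integer(1))
}

# Made-up bandwidths and scores, for illustration only.
bw      <- c(0.1, 0.2, 0.4, 0.8)
cv      <- c(1.00, 1.02, 1.10, 1.50)
cv_unsm <- 1.00

inds <- pick_largest_bw(bw, cv, cv_unsm, margin = c(0.05, 0.2))
inds      # indices of the selected bandwidths, one per margin
bw[inds]  # the corresponding bandwidths
```

With this rule a larger margin can only select the same or a larger bandwidth, which matches the documented behaviour that larger tol values lead to more smoothing.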
Examples
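library(drape)

# Simulate two covariates and a response that is nonlinear in the second.
X <- matrix(stats::rnorm(200), ncol = 2)
y <- X[, 1] + sin(X[, 2]) + 0.5 * stats::rnorm(nrow(X))

# Regression method: fits a linear model and returns a list whose "fit"
# component is the prediction function.
reg <- function(X, y) {
  df <- data.frame(y, X)
  colnames(df) <- c("y", "X1", "X2")
  lm1 <- stats::lm(y ~ X1 + sin(X2), data = df)
  fit <- function(newX) {
    newdf <- data.frame(newX)
    colnames(newdf) <- c("X1", "X2")
    return(as.vector(stats::predict(lm1, newdata = newdf)))
  }
  return(list("fit" = fit))
}

# Smooth along the second covariate, selecting a bandwidth for each tolerance.
cv_resmooth(X = X, y = y, d = 2, regression = reg, tol = c(0.5, 1, 2))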
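The prefit = TRUE and foldid arguments can be prepared along the following lines. This is a sketch under stated assumptions: it assumes that entry k of the list should be fitted with fold k held out, which this page does not state explicitly, and fit_fold and prefit_list are hypothetical names. The final cv_resmooth() call is left as a comment so that the block runs without the package installed.

```r
set.seed(1)
n <- 100
nfolds <- 5L
X <- matrix(stats::rnorm(2 * n), ncol = 2)
y <- X[, 1] + sin(X[, 2]) + 0.5 * stats::rnorm(n)

# Balanced random fold labels with entries in 1:nfolds.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Hypothetical helper: fit on all rows outside fold k (an assumption) and
# return a list with a "fit" component, as described for prefit = TRUE.
fit_fold <- function(k) {
  train <- foldid != k
  df <- data.frame(y = y[train], X1 = X[train, 1], X2 = X[train, 2])
  model <- stats::lm(y ~ X1 + sin(X2), data = df)
  fit <- function(newX) {
    newdf <- data.frame(X1 = newX[, 1], X2 = newX[, 2])
    as.vector(stats::predict(model, newdata = newdf))
  }
  list(fit = fit)
}
prefit_list <- lapply(seq_len(nfolds), fit_fold)

# With drape installed, the prefit regressions could then be supplied as:
# cv_resmooth(X = X, y = y, d = 2, regression = prefit_list,
#             prefit = TRUE, foldid = foldid)
```

Fitting the regressions once and passing them in this way may be useful when the regression method is expensive and several calls share the same folds.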