cv.boss {BOSSreg} | R Documentation |
Cross-validation for Best Orthogonalized Subset Selection (BOSS) and Forward Stepwise Selection (FS).
Description
Cross-validation for Best Orthogonalized Subset Selection (BOSS) and Forward Stepwise Selection (FS).
Usage
cv.boss(
x,
y,
maxstep = min(nrow(x) - intercept - 1, ncol(x)),
intercept = TRUE,
n.folds = 10,
n.rep = 1,
show.warning = TRUE,
...
)
Arguments
x | A matrix of predictors, see boss. |
y | A vector of the response variable, see boss. |
maxstep | Maximum number of steps performed. Default is min(nrow(x) - intercept - 1, ncol(x)), i.e. min(n - 2, p) with an intercept and min(n - 1, p) without one. |
intercept | Logical, whether to fit an intercept term. Default is TRUE. |
n.folds | The number of cross-validation folds. Default is 10. |
n.rep | The number of replications of cross-validation. Default is 1. |
show.warning | Logical, whether to display a warning if CV is performed for only a subset of the candidates, e.g. when n < p and 10-fold CV is used. Default is TRUE. |
... | Additional arguments to boss. |
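A small worked example of why CV may cover only some of the candidates. It assumes, without confirmation from this page, that each training fold is limited by the same maxstep rule as the full-data fit; the values n = 20 and p = 50 and the variable names are hypothetical.

```r
# Hypothetical setting: n = 20 observations, p = 50 predictors, 10-fold CV.
n <- 20; p <- 50; intercept <- TRUE; n.folds <- 10

# Default maxstep on the full data, from the Usage section:
# min(nrow(x) - intercept - 1, ncol(x)); TRUE counts as 1 here.
maxstep_full <- min(n - intercept - 1, p)

# Assumption: each training fold uses the same rule, applied to the
# smallest training-set size obtained by splitting n rows into n.folds folds.
n_train_min <- n - ceiling(n / n.folds)
maxstep_fold <- min(n_train_min - intercept - 1, p)

c(full = maxstep_full, per_fold = maxstep_fold)
```

Under that assumption the full-data fit yields candidates up to 18 steps, while every training fold (18 observations) stops at 16, so the last two candidates would have no OOS estimate.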
Details
This function fits BOSS and FS (boss) on the full dataset and performs n.folds-fold cross-validation. The cross-validation process can be repeated n.rep times to evaluate the out-of-sample (OOS) performance of the candidate subsets given by both methods.
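The repeated K-fold procedure described above can be sketched in base R as follows. This is not the package's implementation: the candidate sequence is a hypothetical stand-in (the k-th candidate is ordinary least squares on the first k columns of x, with k = 0 meaning intercept only) rather than the BOSS or FS ordering, and the OOS deviance is assumed to be the squared prediction error of a Gaussian model. The function name cv_nested is hypothetical.

```r
# Hypothetical sketch of repeated K-fold CV over a nested candidate sequence.
# Stand-in candidates: candidate j uses the first j columns of x (j = 0 is
# intercept only). The BOSS/FS ordering is NOT reproduced here.
cv_nested <- function(x, y, maxstep, n.folds = 10, n.rep = 1) {
  n <- nrow(x)
  # err[r, j + 1]: total squared OOS error of candidate j in replication r
  err <- matrix(0, nrow = n.rep, ncol = maxstep + 1)
  for (r in seq_len(n.rep)) {
    # Random, roughly balanced fold labels, redrawn for every replication
    folds <- sample(rep_len(seq_len(n.folds), n))
    for (k in seq_len(n.folds)) {
      test <- folds == k
      for (j in 0:maxstep) {
        xtr <- cbind(1, x[!test, seq_len(j), drop = FALSE])
        xte <- cbind(1, x[test, seq_len(j), drop = FALSE])
        b <- qr.solve(xtr, y[!test])  # least squares on the training folds
        err[r, j + 1] <- err[r, j + 1] + sum((y[test] - xte %*% b)^2)
      }
    }
  }
  # Average over replications, scaled to a per-observation squared error
  cvm <- colMeans(err) / n
  list(cvm = cvm, i.min = which.min(cvm))
}

## Usage on data shaped like the Examples section
set.seed(11)
n <- 20; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- x %*% c(1, 1, 0, 0, 0) + rnorm(n, sd = 0.01)
res <- cv_nested(x, y, maxstep = p, n.folds = 10, n.rep = 3)
res$cvm
res$i.min
```

Each observation lands in exactly one test fold per replication, so averaging over replications only reduces the variability caused by the random fold split; which.min then picks the candidate with the smallest mean OOS error, in the spirit of i.min.fs and i.min.boss.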
Value
boss: An object of class boss fitted on the full dataset.
n.folds: The number of cross-validation folds.
cvm.fs: Mean OOS deviance for each candidate given by FS.
cvm.boss: Mean OOS deviance for each candidate given by BOSS.
i.min.fs: The index of the minimum of cvm.fs.
i.min.boss: The index of the minimum of cvm.boss.
Author(s)
Sen Tian
References
Tian, S., Hurvich, C. and Simonoff, J. (2021), On the Use of Information Criteria for Subset Selection in Least Squares Regression. https://arxiv.org/abs/1911.10191
BOSSreg Vignette https://github.com/sentian/BOSSreg/blob/master/r-package/vignettes/BOSSreg.pdf
See Also
predict and coef methods for the cv.boss object, and the boss function.
Examples
## Generate a trivial dataset: each column of x has mean 0 and unit norm, and y has mean 0
set.seed(11)
n = 20
p = 5
x = matrix(rnorm(n*p), nrow=n, ncol=p)
x = scale(x, center = colMeans(x))
x = scale(x, scale = sqrt(colSums(x^2)))
beta = c(1, 1, 0, 0, 0)
y = x%*%beta + scale(rnorm(n, sd=0.01), center = TRUE, scale = FALSE)
## Perform 10-fold CV without replication
boss_cv_result = cv.boss(x, y)
## Get the coefficient vector of BOSS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_boss_cv = coef(boss_cv_result)
# the above is equivalent to
boss_result = boss_cv_result$boss
beta_boss_cv = boss_result$beta_boss[, boss_cv_result$i.min.boss, drop=FALSE]
## Get the fitted values of BOSS-CV (S3 method for cv.boss)
mu_boss_cv = predict(boss_cv_result, newx=x)
# the above is equivalent to
mu_boss_cv = cbind(1,x) %*% beta_boss_cv
## Get the coefficient vector of FS that gives the minimum CV OOS score (S3 method for cv.boss)
beta_fs_cv = coef(boss_cv_result, method='fs')
## Get the fitted values of FS-CV (S3 method for cv.boss)
mu_fs_cv = predict(boss_cv_result, newx=x, method='fs')