R2oosse {oosse} | R Documentation |
Estimate out-of-sample R² and its standard error
Description
Estimate out-of-sample R² and its standard error
Usage
R2oosse(
  y,
  x,
  fitFun,
  predFun,
  methodMSE = c("CV", "bootstrap"),
  methodCor = c("nonparametric", "jackknife"),
  printTimeEstimate = TRUE,
  nFolds = 10L,
  nInnerFolds = nFolds - 1L,
  cvReps = 200L,
  nBootstraps = 200L,
  nBootstrapsCor = 50L,
  ...
)
Arguments
y |
The vector of outcome values |
x |
The matrix of predictors |
fitFun |
The function for fitting the prediction model |
predFun |
The function for evaluating the prediction model |
methodMSE |
The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for .632 bootstrap |
methodCor |
The method to estimate the correlation between MSE and MST estimators, either "nonparametric" or "jackknife" |
printTimeEstimate |
A boolean, should an estimate of the running time be printed? |
nFolds |
The number of outer folds for cross-validation |
nInnerFolds |
The number of inner cross-validation folds |
cvReps |
The number of repeats for the cross-validation |
nBootstraps |
The number of .632 bootstraps |
nBootstrapsCor |
The number of bootstraps to estimate the correlation |
... |
Additional arguments passed on to fitFun and predFun |
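As an illustration of the fitFun/predFun pair described above, the following sketch defines a closed-form ridge regression in base R. The calling convention fitFun(y, x, ...) and predFun(mod, x, ...) is inferred from the linear model example in this page; the names fitFunRidge, predFunRidge and the lambda argument are hypothetical and not part of oosse.

```r
# Hypothetical fitFun/predFun pair: ridge regression via the normal equations.
# Both functions accept ... because extra arguments are passed to both.
fitFunRidge <- function(y, x, lambda = 1, ...) {
  xc <- cbind(1, x)                         # add intercept column
  pen <- diag(c(0, rep(lambda, ncol(x))))   # leave the intercept unpenalised
  list(coefficients = drop(solve(crossprod(xc) + pen, crossprod(xc, y))))
}
predFunRidge <- function(mod, x, ...) {
  drop(cbind(1, x) %*% mod$coefficients)    # predictions as a plain vector
}

# Small simulated check that the pair works together
set.seed(42)
xSim <- matrix(rnorm(50 * 3), 50, 3)
ySim <- drop(xSim %*% c(1, -1, 0.5)) + rnorm(50)
modRidge <- fitFunRidge(ySim, xSim, lambda = 2)
predRidge <- predFunRidge(modRidge, xSim)

# Presumably usable as (not run here, requires the oosse package):
# R2oosse(y = ySim, x = xSim, fitFun = fitFunRidge,
#         predFun = predFunRidge, lambda = 2)
```

With lambda = 0 the fit reduces to ordinary least squares, which gives a quick sanity check of the pair.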
Details
Implements the calculation of the out-of-sample R² and its standard error as described by Hawinkel et al. (2023). Multithreading is used as provided by the BiocParallel or doParallel packages. A rough estimate of the expected computation time is printed when printTimeEstimate is TRUE, but this is purely indicative. The mean squared error (MSE) can be estimated by cross-validation (Bates et al. 2023) or by the .632 bootstrap (Efron and Tibshirani 1997).
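To make the quantities concrete, the sketch below computes a point estimate of the out-of-sample R² as 1 - MSE/MST with a single round of plain K-fold cross-validation on simulated data. It is not the oosse implementation: it omits the repeated and nested cross-validation, the bootstrap option and the standard error, and the (n + 1)/n scaling of the sample variance used for the MST is an assumption about how the out-of-sample error of the mean model may be estimated.

```r
# Illustrative sketch only, base R, simulated data
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(1, 0.5, 0)) + rnorm(n)

fitFunLM <- function(y, x) lm.fit(y = y, x = cbind(1, x))
predFunLM <- function(mod, x) drop(cbind(1, x) %*% mod$coefficients)

# Assign each observation to one of nFolds folds at random
nFolds <- 10
folds <- sample(rep(seq_len(nFolds), length.out = n))

# Squared prediction error of each observation when its fold is held out
sqErr <- numeric(n)
for (k in seq_len(nFolds)) {
  test <- folds == k
  mod <- fitFunLM(y[!test], x[!test, , drop = FALSE])
  sqErr[test] <- (y[test] - predFunLM(mod, x[test, , drop = FALSE]))^2
}

MSE <- mean(sqErr)            # cross-validated mean squared error
MST <- var(y) * (n + 1) / n   # assumed estimator of the mean model's error
R2 <- 1 - MSE / MST           # out-of-sample R2 point estimate
```

Unlike the in-sample R², this estimate can be negative when the model predicts worse than the outcome mean.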
Value
A list with components
R2 |
Estimate of the R² with standard error |
MSE |
Estimate of the MSE with standard error |
MST |
Estimate of the MST with standard error |
corMSEMST |
Estimated correlation between MSE and MST estimators |
params |
List of parameters used |
fullModel |
The model trained on the entire dataset using fitFun |
n |
The sample size of the training data |
References
Bates S, Hastie T, Tibshirani R (2023).
“Cross-validation: What does it estimate and how well does it do it?”
J. Am. Stat. Assoc., 118(ja), 1 - 22.
doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.
Efron B, Tibshirani R (1997).
“Improvements on cross-validation: The .632+ bootstrap method.”
J. Am. Stat. Assoc., 92(438), 548 - 560.
Hawinkel S, Waegeman W, Maere S (2023).
“Out-of-sample R²: Estimation and inference.”
Am. Stat., 1 - 16.
doi:10.1080/00031305.2023.2216252, https://doi.org/10.1080/00031305.2023.2216252.
Examples
data(Brassica)
# Linear model: fitFun takes (y, x), predFun takes (fitted model, x)
fitFunLM = function(y, x) {lm.fit(y = y, x = cbind(1, x))}
predFunLM = function(mod, x) {cbind(1, x) %*% mod$coefficients}
y = Brassica$Pheno$Leaf_8_width
# Use the first 10 expression columns as predictors
R2lm = R2oosse(y = y, x = Brassica$Expr[, 1:10],
    fitFun = fitFunLM, predFun = predFunLM, nFolds = 10)