R2oosse {oosse}    R Documentation

Estimate out-of-sample R² and its standard error

Description

Estimate the out-of-sample R² of a prediction model, together with its standard error

Usage

R2oosse(
  y,
  x,
  fitFun,
  predFun,
  methodMSE = c("CV", "bootstrap"),
  methodCor = c("nonparametric", "jackknife"),
  printTimeEstimate = TRUE,
  nFolds = 10L,
  nInnerFolds = nFolds - 1L,
  cvReps = 200L,
  nBootstraps = 200L,
  nBootstrapsCor = 50L,
  ...
)

Arguments

y

The vector of outcome values

x

The matrix of predictors

fitFun

The function for fitting the prediction model

predFun

The function for generating predictions from the fitted prediction model

methodMSE

The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for the .632 bootstrap

methodCor

The method to estimate the correlation between MSE and MST estimators, either "nonparametric" or "jackknife"

printTimeEstimate

A logical: should an estimate of the running time be printed?

nFolds

The number of outer folds for cross-validation

nInnerFolds

The number of inner cross-validation folds

cvReps

The number of repeats for the cross-validation

nBootstraps

The number of .632 bootstraps

nBootstrapsCor

The number of bootstraps to estimate the correlation

...

Further arguments passed on to fitFun and predFun
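The fitFun/predFun contract is easiest to see with a concrete pair. The sketch below is an illustration only. It assumes, based on the Examples section, that fitFun is called with y and x and that predFun is called with the fitted model and a predictor matrix. The names fitFunRidge and predFunRidge, the lambda argument and the simulated data are hypothetical. Accepting ... in both functions is an assumption, intended to let each function ignore extra arguments meant for the other.

```r
# Hypothetical ridge-regression pair in base R; not part of the oosse package.
# lambda is an illustrative extra argument that could be supplied through `...`.
fitFunRidge <- function(y, x, lambda = 1, ...) {
  X <- cbind(1, x)                          # add an intercept column
  pen <- diag(c(0, rep(lambda, ncol(x))))   # leave the intercept unpenalised
  beta <- solve(crossprod(X) + pen, crossprod(X, y))
  list(coefficients = drop(beta))           # the "model" is just a coefficient vector
}
predFunRidge <- function(mod, x, ...) {
  drop(cbind(1, x) %*% mod$coefficients)    # one prediction per row of x
}

# Simulated data, for illustration only
set.seed(42)
x <- matrix(rnorm(50 * 4), nrow = 50)
y <- drop(x %*% c(2, 0, -1, 0.5)) + rnorm(50)

mod <- fitFunRidge(y, x, lambda = 0.5)
pred <- predFunRidge(mod, x)
```

Any model can be plugged in this way, as long as predFun returns one prediction per row of the predictor matrix it receives.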

Details

Implements the calculation of the out-of-sample R² and its standard error as described by Hawinkel et al. (2023). Multithreading is used as provided by the BiocParallel or doParallel packages. A rough estimate of the expected computation time is printed when printTimeEstimate is TRUE, but this is purely indicative. The mean squared error (MSE) can be estimated by cross-validation (Bates et al. 2023) or by the .632 bootstrap (Efron and Tibshirani 1997).
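To make the quantities concrete, the following base R sketch computes a point estimate of the out-of-sample R² as 1 - MSE/MST from a single round of K-fold cross-validation. It is a simplified illustration, not the package's algorithm: it omits the repeated and inner cross-validation and therefore yields no standard error. Estimating the MST as (n + 1)/n times the sample variance, the expected squared error when a new observation is predicted by the sample mean, is an assumption made here. The simulated data and all variable names are hypothetical.

```r
# Simulated data, for illustration only
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 3), nrow = n)
y <- drop(x %*% c(1, -0.5, 0.25)) + rnorm(n)

# Same kind of fit/predict pair as in the Examples section
fitFunLM <- function(y, x) lm.fit(y = y, x = cbind(1, x))
predFunLM <- function(mod, x) drop(cbind(1, x) %*% mod$coefficients)

# One round of K-fold cross-validation: every observation is predicted
# by a model that was fitted without it
nFolds <- 10L
foldId <- sample(rep(seq_len(nFolds), length.out = n))
sqErr <- numeric(n)
for (k in seq_len(nFolds)) {
  test <- foldId == k
  mod <- fitFunLM(y[!test], x[!test, , drop = FALSE])
  sqErr[test] <- (y[test] - predFunLM(mod, x[test, , drop = FALSE]))^2
}

mse <- mean(sqErr)            # cross-validated mean squared error
mst <- (n + 1) / n * var(y)   # assumed estimator of the mean-only prediction error
r2 <- 1 - mse / mst           # out-of-sample R² point estimate
```

Unlike the in-sample R², this quantity can be negative when the model predicts worse than the sample mean.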

Value

A list with components

R2

Estimate of the R² with standard error

MSE

Estimate of the MSE with standard error

MST

Estimate of the MST with standard error

corMSEMST

Estimated correlation between MSE and MST estimators

params

List of parameters used

fullModel

The model trained on the entire dataset using fitFun

n

The sample size of the training data
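The returned correlation between the MSE and MST estimators matters because the R² is a function of their ratio. As a rough illustration, the sketch below applies the standard first-order delta-method approximation for the standard error of 1 - MSE/MST. Whether R2oosse uses exactly this formula is not stated on this page, so treat it as an assumption. The function name deltaSeR2 and all input values are hypothetical.

```r
# Hypothetical helper: delta-method standard error of R² = 1 - MSE/MST.
# Var(MSE/MST) is approximated by
#   (MSE/MST)^2 * [ (seMSE/MSE)^2 + (seMST/MST)^2 - 2*rho*seMSE*seMST/(MSE*MST) ]
deltaSeR2 <- function(mse, mst, seMSE, seMST, rho) {
  ratio <- mse / mst
  relVar <- (seMSE / mse)^2 + (seMST / mst)^2 -
    2 * rho * seMSE * seMST / (mse * mst)
  ratio * sqrt(max(relVar, 0))   # guard against tiny negative values from rounding
}

# Made-up inputs, for illustration only
seUncorrelated <- deltaSeR2(mse = 1, mst = 2, seMSE = 0.1, seMST = 0.2, rho = 0)
seCorrelated <- deltaSeR2(mse = 1, mst = 2, seMSE = 0.1, seMST = 0.2, rho = 0.8)
```

Under this approximation, a positive correlation between the two estimators reduces the standard error of the R², which is why the correlation is estimated separately.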

References

Bates S, Hastie T, Tibshirani R (2023). “Cross-validation: What does it estimate and how well does it do it?” J. Am. Stat. Assoc., 118(ja), 1 - 22. doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.

Hawinkel S, Waegeman W, Maere S (2023). “Out-of-sample R²: Estimation and inference.” Am. Stat., 1 - 16. doi:10.1080/00031305.2023.2216252, https://doi.org/10.1080/00031305.2023.2216252.

See Also

buildConfInt

Examples

data(Brassica)
# Linear model: fitFun returns the fitted model, predFun predicts from it
fitFunLM = function(y, x) {lm.fit(y = y, x = cbind(1, x))}
predFunLM = function(mod, x) {cbind(1, x) %*% mod$coefficients}
y = Brassica$Pheno$Leaf_8_width
# Use the first 10 columns of the expression matrix as predictors
R2lm = R2oosse(y = y, x = Brassica$Expr[, 1:10],
    fitFun = fitFunLM, predFun = predFunLM, nFolds = 10)

[Package oosse version 1.0.11 Index]