R: Estimate out-of-sample R² and its standard error

R2oosse {oosse}

R Documentation

Estimate out-of-sample R² and its standard error

Description

Estimate out-of-sample R² and its standard error

Usage

R2oosse(
  y,
  x,
  fitFun,
  predFun,
  methodMSE = c("CV", "bootstrap"),
  methodCor = c("nonparametric", "jackknife"),
  printTimeEstimate = TRUE,
  nFolds = 10L,
  nInnerFolds = nFolds - 1L,
  cvReps = 200L,
  nBootstraps = 200L,
  nBootstrapsCor = 50L,
  ...
)

Arguments

`y`	The vector of outcome values
`x`	The matrix of predictors
`fitFun`	The function for fitting the prediction model
`predFun`	The function for evaluating the prediction model
`methodMSE`	The method to estimate the MSE, either "CV" for cross-validation or "bootstrap" for .632 bootstrap
`methodCor`	The method to estimate the correlation between MSE and MST estimators, either "nonparametric" or "jackknife"
`printTimeEstimate`	A boolean: should an estimate of the running time be printed?
`nFolds`	The number of outer folds for cross-validation
`nInnerFolds`	The number of inner cross-validation folds
`cvReps`	The number of repeats for the cross-validation
`nBootstraps`	The number of .632 bootstraps
`nBootstrapsCor`	The number of bootstraps to estimate the correlation
`...`	Further arguments passed on to fitFun and predFun
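
As an illustration of the fitFun/predFun contract described above, here is a minimal sketch of a ridge-regression pair in base R. The names fitFunRidge and predFunRidge and the lambda argument are hypothetical and not part of the package. Both functions accept `...` because, per the argument description, extra arguments are forwarded to both of them.

```r
# Hypothetical ridge-regression pair (illustrative, not from the package).
# fitFun takes the outcome y and predictor matrix x and returns a model object.
fitFunRidge <- function(y, x, lambda = 1, ...) {
  xInt <- cbind(1, x)                         # add an intercept column
  pen <- diag(c(0, rep(lambda, ncol(x))))     # leave the intercept unpenalised
  coefs <- solve(crossprod(xInt) + pen, crossprod(xInt, y))
  list(coefficients = drop(coefs))
}

# predFun takes the fitted model and new predictors and returns predictions.
# The extra arguments in ... (here lambda) are accepted and ignored.
predFunRidge <- function(mod, x, ...) {
  drop(cbind(1, x) %*% mod$coefficients)
}

# Quick check on simulated data
set.seed(1)
xSim <- matrix(rnorm(50 * 3), 50, 3)
ySim <- drop(xSim %*% c(1, -1, 0.5)) + rnorm(50)
modSim <- fitFunRidge(ySim, xSim, lambda = 0.1)
predSim <- predFunRidge(modSim, xSim, lambda = 0.1)
```

Assuming the forwarding behaviour documented for `...`, such a pair could then be supplied as fitFun = fitFunRidge, predFun = predFunRidge, with lambda given as an additional argument.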

Details

Implements the estimation of the out-of-sample R² and its standard error as described by Hawinkel et al. (2023). Multithreading is provided by the BiocParallel or doParallel packages. A rough estimate of the expected computation time is printed when printTimeEstimate is TRUE, but this is purely indicative. The mean squared error (MSE) can be estimated by cross-validation (Bates et al. 2023) or by the .632 bootstrap (Efron and Tibshirani 1997).
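
To make the quantities concrete, the following base R sketch computes a point estimate of the out-of-sample R² as 1 - MSE/MST, using one round of plain K-fold cross-validation for the MSE. It is a simplified illustration and not the package's algorithm: it has no repeats, no nested folds and no standard error. The MST formula used here (the sample variance scaled by (n + 1)/n, the expected squared error of predicting a new observation by the sample mean) is an assumption about the definition.

```r
set.seed(42)
n <- 100
x <- matrix(rnorm(n * 2), n, 2)
y <- drop(x %*% c(2, -1)) + rnorm(n)

# Assign each observation to one of nFolds folds at random
nFolds <- 10
foldId <- sample(rep(seq_len(nFolds), length.out = n))

# Collect squared out-of-fold prediction errors of a linear model
sqErr <- numeric(n)
for (k in seq_len(nFolds)) {
  test <- foldId == k
  mod <- lm.fit(x = cbind(1, x[!test, , drop = FALSE]), y = y[!test])
  pred <- drop(cbind(1, x[test, , drop = FALSE]) %*% mod$coefficients)
  sqErr[test] <- (y[test] - pred)^2
}

mseHat <- mean(sqErr)               # cross-validated MSE of the model
mstHat <- var(y) * (n + 1) / n      # assumed MST: error of the mean-only predictor
r2oos <- 1 - mseHat / mstHat        # out-of-sample R² point estimate
```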

Value

A list with components

`R2`	Estimate of the R² with standard error
`MSE`	Estimate of the MSE with standard error
`MST`	Estimate of the MST with standard error
`corMSEMST`	Estimated correlation between MSE and MST estimators
`params`	List of parameters used
`fullModel`	The model trained on the entire dataset using fitFun
`n`	The sample size of the training data
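
The reason a correlation between the MSE and MST estimators is reported can be sketched with a first-order (delta-method) approximation to the standard error of a ratio. The sketch assumes R² = 1 - MSE/MST. The function name r2SeDelta and its arguments are hypothetical, and the package's internal formula may differ.

```r
# Hypothetical helper: delta-method standard error of 1 - MSE/MST.
# mse, mst: point estimates; seMSE, seMST: their standard errors;
# rho: correlation between the MSE and MST estimators.
r2SeDelta <- function(mse, mst, seMSE, seMST, rho) {
  ratio <- mse / mst
  relVar <- (seMSE / mse)^2 + (seMST / mst)^2 -
    2 * rho * seMSE * seMST / (mse * mst)
  # Guard against tiny negative values caused by rounding
  ratio * sqrt(max(relVar, 0))
}

# A positive correlation reduces the standard error of the ratio
seIndep <- r2SeDelta(mse = 2, mst = 4, seMSE = 0.2, seMST = 0.4, rho = 0)
sePos <- r2SeDelta(mse = 2, mst = 4, seMSE = 0.2, seMST = 0.4, rho = 0.5)
```

Under this approximation, ignoring a positive correlation between the two estimators would overstate the uncertainty of the R² estimate.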

References

Bates S, Hastie T, Tibshirani R (2023). “Cross-validation: What does it estimate and how well does it do it?” J. Am. Stat. Assoc., 118(ja), 1 - 22. doi:10.1080/01621459.2023.2197686, https://doi.org/10.1080/01621459.2023.2197686.

Efron B, Tibshirani R (1997). “Improvements on cross-validation: The .632+ bootstrap method.” J. Am. Stat. Assoc., 92(438), 548 - 560.

Hawinkel S, Waegeman W, Maere S (2023). “Out-of-sample R²: Estimation and inference.” Am. Stat., 1 - 16. doi:10.1080/00031305.2023.2216252, https://doi.org/10.1080/00031305.2023.2216252.

Examples

data(Brassica)
# Linear model: fit with an intercept, predict from the fitted coefficients
fitFunLM = function(y, x){lm.fit(y = y, x = cbind(1, x))}
predFunLM = function(mod, x) {cbind(1, x) %*% mod$coefficients}
y = Brassica$Pheno$Leaf_8_width
R2lm = R2oosse(y = y, x = Brassica$Expr[, 1:10],
    fitFun = fitFunLM, predFun = predFunLM, nFolds = 10)

[Package oosse version 1.0.11 Index]
