cvRepWtTuning {ClinicalUtilityRecal} | R Documentation |
Repeated Cross Validation for Weight Tuning Parameter Selection
Description
Calibration weights require specification of a tuning parameter, delta or lambda. Since a single round of cross-validation can be noisy, cross-validation can be repeated multiple times with independent random partitions and the results averaged. This function implements repeated K-fold cross-validation in which the tuning parameter lambda or delta is selected by maximizing standardized net benefit (sNB) (i.e. a repeated cvWtTuning procedure).
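The repeated random partitioning described above can be sketched as follows. This is a minimal illustration in base R, not the package's internal code: the helper name make_rep_folds and its argument seed are hypothetical, and the balanced assignment of fold labels is an assumption.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal): assign each of n
# observations to one of kFold folds, drawing a new random partition for
# each of cvRep repetitions from a single reproducible seed.
make_rep_folds <- function(n, kFold = 5, cvRep = 25, seed = 11111) {
  set.seed(seed)
  # Each call to sample() shuffles balanced fold labels 1..kFold;
  # replicate() binds the cvRep shuffles into an n x cvRep matrix.
  replicate(cvRep, sample(rep(seq_len(kFold), length.out = n)))
}

folds <- make_rep_folds(n = 103, kFold = 5, cvRep = 25)
dim(folds)         # rows are observations, columns are repetitions
table(folds[, 1])  # fold sizes in the first repetition
```

Each column of folds gives one independent K-fold partition, so averaging results across columns corresponds to averaging across repetitions.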
A "one-standard error" rule can be used for selecting tuning parameters. Under the "one-standard error" rule, the calibration weight tuning parameter (lambda or delta) is selected such that the corresponding cross-validated sNB is within one standard deviation of the maximum cross-validated sNB. This provides protection against overfitting the data and selecting a tuning parameter that is too extreme. If the "one-standard error" rule is not implemented, then the tuning parameter with the largest average cross-validated sNB (across folds and repetitions) will be selected.
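As a rough illustration of the selection rule, the sketch below picks a tuning parameter from a set of averaged sNB values. The helper select_one_se and the toy numbers are hypothetical. It also assumes that the candidate values are ordered from least to most extreme and that the least extreme candidate within one standard deviation of the maximum is chosen; the package may resolve this differently.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal).
# tuneSeq: candidate tuning values, ASSUMED ordered least to most extreme
# avg_sNB: average cross-validated sNB for each candidate
# sd_sNB : estimated standard deviation of each average
select_one_se <- function(tuneSeq, avg_sNB, sd_sNB, stdErrRule = TRUE) {
  best <- which.max(avg_sNB)
  # Without the rule, take the candidate with the largest average sNB.
  if (!stdErrRule) return(tuneSeq[best])
  # With the rule, take the first candidate whose average sNB lies within
  # one standard deviation of the maximum.
  cutoff <- avg_sNB[best] - sd_sNB[best]
  tuneSeq[which(avg_sNB >= cutoff)[1]]
}

# Toy values for illustration only
tuneSeq <- c(0.5, 1, 2, 4, 8)
avg_sNB <- c(0.20, 0.27, 0.29, 0.30, 0.26)
sd_sNB  <- rep(0.02, 5)

sel_rule   <- select_one_se(tuneSeq, avg_sNB, sd_sNB)                      # 2
sel_norule <- select_one_se(tuneSeq, avg_sNB, sd_sNB, stdErrRule = FALSE)  # 4
```

With these toy values the rule trades a small loss in average sNB (0.29 versus 0.30) for a less extreme tuning parameter.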
Usage
cvRepWtTuning(y,p,r,rl,ru,kFold=5,cvRep=25,cvParm,tuneSeq,stdErrRule=TRUE,int.seed=11111)
Arguments
y |
Vector of binary outcomes, with 1 indicating event (cases) and 0 indicating no event (controls) |
p |
Vector of risk score values |
r |
Clinically relevant risk threshold |
rl |
Lower bound of clinically relevant region |
ru |
Upper bound of clinically relevant region |
kFold |
Number of folds for cross-validation |
cvRep |
Number of cross-validation repetitions |
cvParm |
Parameter to be selected via cross-validation. Can be either "delta" or "lambda" |
tuneSeq |
Sequence of values of tuning parameters to perform cross-validation over |
stdErrRule |
Use the "one-standard error" rule when selecting the tuning parameter |
int.seed |
Initial seed set for random splitting of data into K folds |
Details
To estimate the standard deviation of the cross-validated sNB, the dependence between the different partitions of cross-validation needs to be accounted for. Gelman and Rubin (1992) give a variance estimator for a convergence diagnostic statistic used when Markov chain Monte Carlo is run with multiple chains. The variance estimator accounts for both the variability of the statistic "within" a single chain and the variance of the statistic across, or "between", chains. Analogously, we can use this framework to estimate the "within" repetition variance (i.e. variation in sNB from a single round of K-fold cross-validation) and the "between" repetition variance. We denote the "within" repetition variance as W and the "between" repetition variance as B. We modify the formula given in Gelman and Rubin (1992) slightly to account for the fact that as the number of cross-validation repetitions increases, the between-repetition variability should decrease. See Mishra et al (2020) for full expressions of B and W.
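The within/between decomposition can be illustrated with the unmodified Gelman and Rubin (1992) quantities, treating each repetition like a chain and each fold like a draw. This sketch is not the package's estimator: it omits the adjustment for the number of repetitions described above, and the simulated fold-level sNB matrix is purely hypothetical.

```r
# Hypothetical fold-level results: rows are folds, columns are repetitions.
set.seed(1)
kFold <- 5
cvRep <- 25
sNB <- matrix(rnorm(kFold * cvRep, mean = 0.3, sd = 0.05), nrow = kFold)

# Average sNB within each repetition (one value per K-fold round)
rep_means <- colMeans(sNB)

# "Within" repetition variance: mean of the per-repetition variances
W <- mean(apply(sNB, 2, var))

# "Between" repetition variance in the Gelman-Rubin scaling:
# number of draws per chain (here kFold) times the variance of the means
B <- kFold * var(rep_means)

# Unmodified Gelman-Rubin pooled variance estimate, shown for reference only;
# the package uses an adjusted form given in Mishra et al (2020).
V <- (kFold - 1) / kFold * W + B / kFold
```

Here W captures variation in sNB across folds of a single round, while B captures how much the round-level averages differ between random partitions.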
Value
cv.sNB |
Standardized net benefit (sNB) of the tuning parameter selected via cross-validation |
cv.RAW |
Corresponding RAW value for the tuning parameter selected via cross-validation |
cv.lambda |
|
cv.delta |
|
avgCV.res |
Cross-validated sNB, averaged across repetitions, for the sequence of tuning parameters |
W |
Estimate of "within" repetition variance. Only returned if stdErrRule==TRUE |
B |
Estimate of "between" repetition variance. Only returned if stdErrRule==TRUE |
fullList |
List of cross-validation results for all folds and repetitions |
Author(s)
Anu Mishra
References
Mishra, A. (2019). Methods for Risk Markers that Incorporate Clinical Utility (Doctoral dissertation). (Available Upon Request)
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.
See Also
calWt, RAWgrid, nb, cvWtTuning
Examples
### Load data ##
## Not run:
data(fakeData)
### Get grid of tuning parameters ###
grid <- RAWgrid(r = 0.3, rl = -Inf, ru = Inf, p = fakeData$p, y = fakeData$y,
                cvParm = "lambda", rl.raw = 0.25, ru.raw = 0.35)
### Implement repeated k-fold cross validation
repCV <- cvRepWtTuning(y = fakeData$y, p = fakeData$p, rl = -Inf, ru = Inf, r = 0.3,
                       kFold = 5, cvRep = 25, cvParm = "lambda", tuneSeq = grid,
                       stdErrRule = TRUE)
## cross-validation results
repCV$avgCV.res
## cross-validation selected lambda, RAW, and sNB
cv.lambda <- repCV$cv.lambda
cv.RAW <- repCV$cv.RAW
cv.sNB <- repCV$cv.sNB
## End(Not run)