cvRepWtTuning {ClinicalUtilityRecal} | R Documentation |
Calibration weights require specification of a tuning parameter, delta or lambda. Since a single round of cross-validation can be noisy, cross-validation can be repeated multiple times with independent random partitions and the results averaged. This function implements repeated K-fold cross-validation in which the tuning parameter lambda or delta is selected by maximizing standardized net benefit (sNB) (i.e. a repeated cvWtTuning procedure).
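The repeated partitioning described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name make_rep_folds and its seed handling are assumptions made for the example.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal): builds cvRep
# independent random K-fold partitions of n observations.
make_rep_folds <- function(n, kFold = 5, cvRep = 25, seed = 11111) {
  set.seed(seed)
  # Each column is one repetition; entries are fold labels 1..kFold,
  # randomly permuted so fold sizes differ by at most one.
  replicate(cvRep, sample(rep(seq_len(kFold), length.out = n)))
}

folds <- make_rep_folds(n = 103, kFold = 5, cvRep = 25)
dim(folds)         # rows are observations, columns are repetitions
table(folds[, 1])  # fold sizes within the first repetition
```

Within repetition j, observations with folds[, j] == k would form the held-out set for fold k, and the cross-validated sNB would then be averaged over folds and repetitions.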
A "one-standard error" rule can be used for selecting tuning parameters. Under the "one-standard error" rule, the calibration weight tuning parameter (lambda or delta) is selected such that the corresponding cross-validated sNB is within one standard deviation of the maximum cross-validated sNB. This provides protection against overfitting the data and selecting a tuning parameter that is too extreme. If the "one-standard error" rule is not implemented, then the tuning parameter with the largest average cross-validated sNB (across folds and repetitions) will be selected.
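As a rough sketch of the selection rule, the following base R code picks a tuning value from averaged cross-validated sNB values. The helper select_one_se is hypothetical, and it assumes that tuneSeq is ordered from least to most extreme and that a single standard deviation estimate is supplied; the package may resolve ties or ordering differently.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal).
# tuneSeq : candidate tuning values, ASSUMED ordered least to most extreme
# avg_sNB : average cross-validated sNB for each value in tuneSeq
# sd_sNB  : estimated standard deviation of the cross-validated sNB (scalar)
select_one_se <- function(tuneSeq, avg_sNB, sd_sNB, stdErrRule = TRUE) {
  best <- which.max(avg_sNB)
  if (!stdErrRule) {
    return(tuneSeq[best])  # plain maximizer of average sNB
  }
  threshold <- avg_sNB[best] - sd_sNB
  # First (least extreme) value whose sNB is within one SD of the maximum
  tuneSeq[which(avg_sNB >= threshold)[1]]
}

tuneSeq <- c(0.5, 1, 2, 4, 8)
avg_sNB <- c(0.30, 0.36, 0.39, 0.40, 0.38)
select_one_se(tuneSeq, avg_sNB, sd_sNB = 0.03)                      # returns 2
select_one_se(tuneSeq, avg_sNB, sd_sNB = 0.03, stdErrRule = FALSE)  # returns 4
```

In this toy input the maximizer is 4, but the value 2 already attains an sNB within one standard deviation of the maximum, so the rule prefers it.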
cvRepWtTuning(y,p,r,rl,ru,kFold=5,cvRep=25,cvParm,tuneSeq,stdErrRule=TRUE,int.seed=11111)
y |
Vector of binary outcomes, with 1 indicating event (cases) and 0 indicating no event (controls) |
p |
Vector of risk score values |
r |
Clinically relevant risk threshold |
rl |
Lower bound of clinically relevant region |
ru |
Upper bound of clinically relevant region |
kFold |
Number of folds for cross-validation |
cvRep |
Number of cross-validation repetitions |
cvParm |
Parameter to be selected via cross-validation. Can be either delta, the weight assigned to observations outside the clinically relevant region [R_l,R_u], or lambda, the tuning parameter controlling exponential decay within the clinically relevant region [R_l,R_u] |
tuneSeq |
Sequence of values of tuning parameters to perform cross-validation over |
stdErrRule |
Use the "one-standard error" rule when selecting the tuning parameter |
int.seed |
Initial seed set for random splitting of data into K folds |
To estimate the standard deviation of the cross-validated sNB, the dependence between the different partitions of cross-validation needs to be accounted for. Gelman and Rubin (1992) give a variance estimator for a convergence diagnostic statistic used when Markov chain Monte Carlo is run with multiple chains. The variance estimator accounts for both the variability of the statistic "within" a single chain and the variance of the statistic across, or "between", chains. Analogously, we can use this framework to estimate the "within" repetition variance (i.e. variation in sNB from a single round of K-fold cross-validation) and the "between" repetition variance. We denote the "within" repetition variance as W and the "between" repetition variance as B. We modify the formula given in Gelman and Rubin (1992) slightly to account for the fact that as the number of cross-validation repetitions increases, the between-repetition variability should decrease. See Mishra et al. (2020) for full expressions of B and W.
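To make the within/between idea concrete, here is a minimal sketch of the unmodified Gelman and Rubin (1992) decomposition applied to cross-validation output. It is not the augmented estimator used by this function, whose expressions are given in Mishra et al. (2020). The helper within_between and the input layout (a matrix of fold-level sNB with one row per repetition and one column per fold) are assumptions made for illustration.

```r
# Hypothetical helper (not part of ClinicalUtilityRecal).
# snb : matrix of fold-level sNB, one row per repetition, one column per fold
# Implements the standard Gelman-Rubin decomposition, NOT the package's
# augmented version.
within_between <- function(snb) {
  k <- ncol(snb)                     # folds per repetition
  W <- mean(apply(snb, 1, var))      # average within-repetition variance
  B <- k * var(rowMeans(snb))        # between-repetition variance (G&R scaling)
  pooled <- (k - 1) / k * W + B / k  # G&R pooled variance estimate
  list(W = W, B = B, pooled = pooled)
}

# Simulated sNB values for 25 repetitions of 5-fold cross-validation
set.seed(1)
snb <- matrix(rnorm(25 * 5, mean = 0.4, sd = 0.05), nrow = 25, ncol = 5)
res <- within_between(snb)
res$W
res$B
```

Here rows play the role of MCMC chains and folds play the role of draws within a chain; the square root of a pooled variance of this kind is the sort of quantity a one-standard-error rule would compare against.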
cv.sNB |
Standardized net benefit (sNB) of tuning parameter selected via cross-validation |
cv.RAW |
Corresponding RAW value given the tuning parameter selected via cross-validation |
cv.lambda |
lambda value selected via cross-validation if cvParm=lambda, otherwise user specified lambda value |
cv.delta |
delta value selected via cross-validation if cvParm=delta, otherwise user specified delta value |
avgCV.res |
Averaged (across-replications) cross-validated sNB for sequence of tuning parameters |
W |
Estimate of "within" repetition variance. Returned only if stdErrRule==TRUE |
B |
Estimate of "between" repetition variance. Returned only if stdErrRule==TRUE |
fullList |
List of cross-validation results for all folds and repetitions |
Anu Mishra
Mishra, A. (2019). Methods for Risk Markers that Incorporate Clinical Utility (Doctoral dissertation). (Available Upon Request)
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.
calWt, RAWgrid, nb, cvWtTuning
### Load data ###
## Not run: 
data(fakeData)

### Get grid of tuning parameters ###
grid <- RAWgrid(r = 0.3, rl = -Inf, ru = Inf, p = fakeData$p, y = fakeData$y,
                cvParm = "lambda", rl.raw = 0.25, ru.raw = 0.35)

### Implement repeated k-fold cross-validation ###
repCV <- cvRepWtTuning(y = fakeData$y, p = fakeData$p, rl = -Inf, ru = Inf, r = 0.3,
                       kFold = 5, cvRep = 25, cvParm = "lambda", tuneSeq = grid,
                       stdErrRule = TRUE)

## cross-validation results
repCV$avgCV.res

## cross-validation selected lambda, RAW, and sNB
cv.lambda <- repCV$cv.lambda
cv.RAW <- repCV$cv.RAW
cv.sNB <- repCV$cv.sNB
## End(Not run)