cvRepWtTuning {ClinicalUtilityRecal}    R Documentation

Repeated Cross Validation for Weight Tuning Parameter Selection

Description

Calibration weights require specification of a tuning parameter, delta or lambda. Since a single round of cross-validation can be noisy, cross-validation can be repeated multiple times with independent random partitions and the results averaged. This function implements repeated K-fold cross-validation in which the tuning parameter lambda or delta is selected by maximizing standardized net benefit (sNB), i.e. a repeated cvWtTuning procedure.
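The repetition scheme described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: repeatedKFold and scoreFold are hypothetical names that are not part of ClinicalUtilityRecal, the seed is assumed to be set once before all partitions, and the toy score stands in for the cross-validated sNB.

```r
# Hypothetical helper for illustration; not a ClinicalUtilityRecal function.
# scoreFold(testIdx) is a user-supplied function returning one number
# for the held-out observations in testIdx.
repeatedKFold <- function(n, scoreFold, kFold = 5, cvRep = 25, seed = 11111) {
  set.seed(seed)  # assumption: a single seed set once before all partitions
  res <- matrix(NA_real_, nrow = cvRep, ncol = kFold)
  for (i in seq_len(cvRep)) {
    # Fresh random partition: balanced fold labels, then shuffled
    foldId <- sample(rep(seq_len(kFold), length.out = n))
    for (k in seq_len(kFold)) {
      res[i, k] <- scoreFold(which(foldId == k))
    }
  }
  # Fold-level results, per-repetition means, and the overall average
  list(perFold = res, perRep = rowMeans(res), average = mean(res))
}

# Toy usage: the "score" is simply the event rate in each held-out fold
y <- rep(c(0, 1), times = c(70, 30))
out <- repeatedKFold(length(y), function(idx) mean(y[idx]), kFold = 5, cvRep = 10)
dim(out$perFold)
out$average
```

Each row of perFold corresponds to one independent partition, so averaging over rows smooths out the noise of any single split.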

A "one-standard error" rule can be used for selecting tuning parameters. Under the "one-standard error" rule, the calibration weight tuning parameter (lambda or delta) is selected such that the corresponding cross-validated sNB is within one standard deviation of the maximum cross-validated sNB. This protects against overfitting the data and selecting a tuning parameter that is too extreme. If the "one-standard error" rule is not implemented, the tuning parameter with the largest average cross-validated sNB (across folds and repetitions) is selected.
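A minimal sketch of the selection step may help make the rule concrete. It is not the package's implementation: selectOneSE is a hypothetical helper, the numbers are made up for illustration, and the sketch assumes that the candidate sequence is ordered from least to most extreme so that the first eligible candidate is the most conservative one.

```r
# Hypothetical helper for illustration; not a ClinicalUtilityRecal function.
# tuneSeq: candidate tuning values, ASSUMED ordered from least to most extreme
# avgSNB:  averaged cross-validated sNB for each candidate
# sdSNB:   a single estimate of the standard deviation of the cross-validated sNB
selectOneSE <- function(tuneSeq, avgSNB, sdSNB, stdErrRule = TRUE) {
  stopifnot(length(tuneSeq) == length(avgSNB), length(sdSNB) == 1)
  best <- which.max(avgSNB)
  if (!stdErrRule) {
    return(tuneSeq[best])  # plain maximizer of the average sNB
  }
  # Candidates whose sNB lies within one standard deviation of the maximum
  eligible <- which(avgSNB >= avgSNB[best] - sdSNB)
  tuneSeq[min(eligible)]   # least extreme eligible candidate (by assumption)
}

# Made-up values, for illustration only
tuneSeq <- c(0.5, 1, 2, 4, 8)
avgSNB  <- c(0.30, 0.36, 0.39, 0.40, 0.38)
selectOneSE(tuneSeq, avgSNB, sdSNB = 0.025)
selectOneSE(tuneSeq, avgSNB, sdSNB = 0.025, stdErrRule = FALSE)
```

With the rule switched on, the sketch gives up a small amount of average sNB in exchange for a less extreme tuning value; with it switched off, it returns the plain maximizer.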

Usage

cvRepWtTuning(y,p,r,rl,ru,kFold=5,cvRep=25,cvParm,tuneSeq,stdErrRule=TRUE,int.seed=11111)

Arguments

y

Vector of binary outcomes, with 1 indicating event (cases) and 0 indicating no event (controls)

p

Vector of risk score values

r

Clinically relevant risk threshold

rl

Lower bound of clinically relevant region

ru

Upper bound of clinically relevant region

kFold

Number of folds for cross-validation

cvRep

Number of cross-validation repetitions

cvParm

Parameter to be selected via cross-validation. Can be either delta, the weight assigned to observations outside the clinically relevant region [R_l,R_u], or lambda, the tuning parameter controlling exponential decay within the clinically relevant region [R_l,R_u]

tuneSeq

Sequence of values of tuning parameters to perform cross-validation over

stdErrRule

Logical; if TRUE, use the "one-standard error" rule when selecting the tuning parameter

int.seed

Initial seed set for random splitting of data into K folds

Details

To estimate the standard deviation of the cross-validated sNB, the dependence between the different partitions of cross-validation needs to be accounted for. Gelman and Rubin (1992) give a variance estimator for a convergence diagnostic statistic used when Markov chain Monte Carlo is run with multiple chains. The variance estimator accounts for both the variability of the statistic "within" a single chain and the variance of the statistic across, or "between", chains. Analogously, we can use this framework to estimate the "within" repetition variance (i.e. variation in sNB from a single round of K-fold cross-validation) and the "between" repetition variance. We denote the "within" repetition variance as W and the "between" repetition variance as B. We adjust the formula slightly from that given in Gelman and Rubin (1992) to account for the fact that as the number of cross-validation repetitions increases, the between-repetition variability should decrease. See Mishra et al. (2020) for full expressions of B and W.
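For orientation, the unmodified Gelman and Rubin (1992) quantities can be computed as in the sketch below, treating each cross-validation repetition as a "chain" and each fold as a draw within it. This is an assumption-laden illustration: the fold-level sNB values are simulated, the variable names are hypothetical, and the adjusted expressions that the package actually uses are not reproduced here.

```r
# Illustration only; not the package's internal code.
set.seed(1)
cvRep <- 25
kFold <- 5

# Simulated fold-level sNB values: one row per repetition, one column per fold
snb <- matrix(rnorm(cvRep * kFold, mean = 0.4, sd = 0.05),
              nrow = cvRep, ncol = kFold)

repMeans <- rowMeans(snb)              # sNB averaged over folds, per repetition

# "Within" repetition variance: average of the per-repetition sample variances
W <- mean(apply(snb, 1, var))

# "Between" repetition variance with the Gelman-Rubin scaling
# (draws per chain times the variance of the chain means)
B <- kFold * var(repMeans)

# Unmodified Gelman-Rubin pooled variance estimate
varPlus <- (kFold - 1) / kFold * W + B / kFold

c(W = W, B = B, varPlus = varPlus)
```

The pooled estimate combines both sources of variation; the adjustment described above changes how the between-repetition term enters as cvRep grows.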

Value

cv.sNB

Standardized net benefit (sNB) of tuning parameter selected via cross-validation

cv.RAW

Corresponding RAW value given the tuning parameter selected via cross-validation

cv.lambda

lambda value selected via cross-validation if cvParm="lambda", otherwise the user-specified lambda value

cv.delta

delta value selected via cross-validation if cvParm="delta", otherwise the user-specified delta value

avgCV.res

Averaged (across repetitions) cross-validated sNB for the sequence of tuning parameters

W

Estimate of "within" repetition variance. Only returned if stdErrRule==TRUE

B

Estimate of "between" repetition variance. Only returned if stdErrRule==TRUE

fullList

List of cross-validation results for all folds and repetitions

Author(s)

Anu Mishra

References

Mishra, A. (2019). Methods for Risk Markers that Incorporate Clinical Utility (Doctoral dissertation). (Available Upon Request)

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.

See Also

calWt, RAWgrid, nb, cvWtTuning

Examples

### Load data ##
## Not run: 
data(fakeData)

### Get grid of tuning parameters  ###
grid <- RAWgrid(r = 0.3,rl = -Inf,ru = Inf,p = fakeData$p,y = fakeData$y,
                cvParm = "lambda",rl.raw = 0.25,ru.raw = 0.35)

### Implement repeated k-fold cross validation
repCV <- cvRepWtTuning(y = fakeData$y,p = fakeData$p,rl = -Inf,ru = Inf,r = 0.3,
                       kFold = 5,cvRep = 25,cvParm = "lambda",tuneSeq = grid,stdErrRule = TRUE)

## cross-validation results
repCV$avgCV.res

## cross-validation selected lambda, RAW, and sNB
cv.lambda <- repCV$cv.lambda
cv.RAW <- repCV$cv.RAW
cv.sNB <- repCV$cv.sNB

## End(Not run)

[Package ClinicalUtilityRecal version 0.1.0 Index]