R: Repeated Cross Validation for Weight Tuning Parameter...

cvRepWtTuning {ClinicalUtilityRecal}

R Documentation

Repeated Cross Validation for Weight Tuning Parameter Selection

Description

Calibration weights require specification of the tuning parameter delta or lambda. Since a single round of cross-validation can be noisy, cross-validation can be repeated multiple times with independent random partitions and the results averaged. This function implements repeated K-fold cross-validation in which the tuning parameter lambda or delta is selected by maximizing standardized net benefit (sNB), i.e. a repeated cvWtTuning procedure.
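
The repeat-and-average idea can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: `repeatedKFold` and its `scoreFun` argument are hypothetical names, and the toy score stands in for the cross-validated sNB.

```r
# Hypothetical helper: average a cross-validated score over repeated
# K-fold partitions. scoreFun(trainIdx, testIdx) is a placeholder that
# returns a single number for one train/test split.
repeatedKFold <- function(n, kFold = 5, cvRep = 25, scoreFun, seed = 11111) {
  set.seed(seed)
  scores <- matrix(NA_real_, nrow = cvRep, ncol = kFold)
  for (b in seq_len(cvRep)) {
    # Draw a fresh random partition for every repetition
    foldId <- sample(rep(seq_len(kFold), length.out = n))
    for (k in seq_len(kFold)) {
      testIdx  <- which(foldId == k)
      trainIdx <- which(foldId != k)
      scores[b, k] <- scoreFun(trainIdx, testIdx)
    }
  }
  # Fold-level scores, per-repetition means, and the overall average
  list(scores = scores, repMeans = rowMeans(scores), avg = mean(scores))
}

# Toy usage: the "score" is simply the mean of the held-out values
x <- seq_len(20)
res <- repeatedKFold(n = length(x), kFold = 5, cvRep = 3,
                     scoreFun = function(trainIdx, testIdx) mean(x[testIdx]))
res$avg
```

In practice `scoreFun` would refit the weighted recalibration on the training folds and evaluate sNB on the held-out fold for each candidate tuning value.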

A "one-standard error" rule can be used for selecting tuning parameters. Under the "one-standard error" rule, the calibration weight tuning parameter (lambda or delta) is selected such that the corresponding cross-validated sNB is within one standard deviation of the maximum cross-validated sNB. This provides protection against overfitting the data and selecting a tuning parameter that is too extreme. If the "one-standard error" rule is not implemented, then the tuning parameter with the largest average cross-validated sNB (across folds and repetitions) will be selected.
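
A minimal sketch of the selection logic described above, under stated assumptions: `selectTuning` is a hypothetical helper, a single standard deviation estimate `sdSNB` is taken as given, and the tie-break among eligible values (smallest value wins) is an assumption, since the text does not say which eligible value the package prefers.

```r
# Hypothetical helper, not the package's internal code.
# tuneSeq: candidate tuning values
# avgSNB : average cross-validated sNB for each candidate
# sdSNB  : standard deviation estimate for the cross-validated sNB
selectTuning <- function(tuneSeq, avgSNB, sdSNB, stdErrRule = TRUE) {
  best <- which.max(avgSNB)
  if (!stdErrRule) {
    # Without the rule, take the value with the largest average sNB
    return(tuneSeq[best])
  }
  # Candidates whose average sNB lies within one SD of the maximum
  eligible <- which(avgSNB >= avgSNB[best] - sdSNB)
  # ASSUMPTION: among eligible candidates, prefer the smallest value
  min(tuneSeq[eligible])
}

# Made-up numbers for illustration only
tuneSeq <- c(1, 2, 5, 10, 20)
avgSNB  <- c(0.30, 0.34, 0.36, 0.37, 0.365)
withRule    <- selectTuning(tuneSeq, avgSNB, sdSNB = 0.02)
withoutRule <- selectTuning(tuneSeq, avgSNB, sdSNB = 0.02, stdErrRule = FALSE)
```

With these illustrative inputs the rule picks a different value than plain maximization, which is the behaviour the paragraph above describes.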

Usage

cvRepWtTuning(y,p,r,rl,ru,kFold=5,cvRep=25,cvParm,tuneSeq,stdErrRule=TRUE,int.seed=11111)

Arguments

`y`	Vector of binary outcomes, with 1 indicating event (cases) and 0 indicating no event (controls)
`p`	Vector of risk score values
`r`	Clinically relevant risk threshold
`rl`	Lower bound of clinically relevant region
`ru`	Upper bound of clinically relevant region
`kFold`	Number of folds for cross-validation
`cvRep`	Number of cross-validation repetitions
`cvParm`	Parameter to be selected via cross-validation. Can be either `delta`, the weight assigned to observations outside the clinically relevant region [R_l,R_u], or `lambda`, the tuning parameter controlling exponential decay within the clinically relevant region [R_l,R_u]
`tuneSeq`	Sequence of values of tuning parameters to perform cross-validation over
`stdErrRule`	Use the "one-standard error" rule when selecting the tuning parameter
`int.seed`	Initial seed set for random splitting of data into K folds
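
For orientation, the arguments `y`, `p` and `r` are sufficient to compute a standardized net benefit at the risk threshold. The sketch below uses the usual definition, sNB = TPR - (r/(1-r)) * ((1-rho)/rho) * FPR with rho the event prevalence. `sNBsketch` is a hypothetical name, and classifying `p > r` as high risk is an assumption; the package's own calculation may differ in such details.

```r
# Hypothetical helper: standardized net benefit at threshold r
sNBsketch <- function(y, p, r) {
  rho <- mean(y == 1)          # event prevalence
  tpr <- mean(p[y == 1] > r)   # proportion of cases flagged high risk
  fpr <- mean(p[y == 0] > r)   # proportion of controls flagged high risk
  tpr - (r / (1 - r)) * ((1 - rho) / rho) * fpr
}

# Made-up data for illustration only
y <- c(1, 1, 0, 0, 0, 1, 0, 0)
p <- c(0.8, 0.4, 0.2, 0.35, 0.1, 0.25, 0.05, 0.6)
snbValue <- sNBsketch(y, p, r = 0.3)
```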

Details

To estimate the standard deviation of the cross-validated sNB, the dependence between the different partitions of cross-validation needs to be accounted for. Gelman and Rubin (1992) give a variance estimator for a convergence diagnostic statistic used when Markov chain Monte Carlo is run with multiple chains. The variance estimator accounts for both the variability of the statistic "within" a single chain and the variance of the statistic across, or "between", chains. Analogously, we can use this framework to estimate the "within" repetition variance (i.e. the variation in sNB from a single round of K-fold cross-validation) and the "between" repetition variance. We denote the "within" repetition variance as W and the "between" repetition variance as B. We modify the formula given in Gelman and Rubin (1992) slightly to account for the fact that, as the number of cross-validation repetitions increases, the between-repetition variability should decrease. See Mishra et al. (2020) for full expressions of B and W.
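
The within/between idea can be illustrated with a plain variance decomposition over a matrix of fold-level sNB values. This is only a sketch: `withinBetween` is a hypothetical helper, it uses the unscaled variance of the repetition means, and it does not reproduce the scaling in Gelman and Rubin (1992) or the adjustment mentioned above, whose exact expressions are in Mishra et al. (2020).

```r
# Hypothetical helper, not the package's estimator.
# snb: cvRep x kFold matrix of fold-level sNB values for one tuning value
withinBetween <- function(snb) {
  repMeans <- rowMeans(snb)
  W <- mean(apply(snb, 1, var))  # average variance within a repetition
  B <- var(repMeans)             # variance of the repetition means
  list(W = W, B = B)
}

# Made-up values: 3 repetitions of 3-fold cross-validation
snb <- rbind(c(0.30, 0.34, 0.32),
             c(0.36, 0.40, 0.38),
             c(0.33, 0.37, 0.35))
vb <- withinBetween(snb)
```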

Value

`cv.sNB`	Standardized net benefit (sNB) of tuning parameter selected via cross-validation
`cv.RAW`	Corresponding RAW value given cross-validated selected tuning parameter
`cv.lambda`	`lambda` value selected via cross-validation if `cvParm=lambda`, otherwise user specified `lambda` value
`cv.delta`	`delta` value selected via cross-validation if `cvParm=delta`, otherwise user specified `delta` value
`avgCV.res`	Averaged (across-replications) cross-validated sNB for sequence of tuning parameters
`W`	Estimate of "within" repetition variance. Only returned if stdErrRule==TRUE
`B`	Estimate of "between" repetition variance. Only returned if stdErrRule==TRUE
`fullList`	List of cross-validation results for all folds and repetitions

Author(s)

Anu Mishra

References

Mishra, A. (2019). Methods for Risk Markers that Incorporate Clinical Utility (Doctoral dissertation). (Available Upon Request)

Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer series in statistics.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.

Examples

### Load data ##
## Not run: 
data(fakeData)

### Get grid of tuning parameters  ###
grid <- RAWgrid(r = 0.3,rl = -Inf,ru = Inf,p = fakeData$p,y = fakeData$y,
                cvParm = "lambda",rl.raw = 0.25,ru.raw = 0.35)

### Implement repeated k-fold cross validation
repCV <- cvRepWtTuning(y = fakeData$y,p = fakeData$p,rl = -Inf,ru = Inf,r = 0.3,
                       kFold = 5,cvRep = 25,cvParm = "lambda",tuneSeq = grid,stdErrRule = TRUE)

## cross-validation results
repCV$avgCV.res

## cross-validation selected lambda, RAW, and sNB
cv.lambda <- repCV$cv.lambda
cv.RAW <- repCV$cv.RAW
cv.sNB <- repCV$cv.sNB

## End(Not run)

[Package ClinicalUtilityRecal version 0.1.0 Index]