
BBC_dichotom {Qindex}

R Documentation

Bootstrap-based Optimism Correction for Dichotomization

Description

The functions documented here are:

BBC_dichotom(): to obtain a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.
optimism_dichotom(): a helper function to compute the bootstrap-based optimism of the dichotomized predictors.
coef_dichotom(): a helper function to obtain the estimated multivariable regression coefficients of the dichotomized predictors.

Usage

BBC_dichotom(formula, dichotom, data, ...)

optimism_dichotom(formula, X, data, R = 100L, ...)

coef_dichotom(formula, dX, data)

Arguments

`formula`	formula, with the response `y` on the left-hand side and, on the right-hand side, the additional predictors other than those to be `dichotom`ized. If there are no additional predictors, use `y ~ 1`
`dichotom`	one-sided formula of the set of predictors to be dichotomized. These predictors can be stored in `data` as one or more numeric columns and/or one matrix column
`data`	data.frame, containing the response `y` and predictors in `formula`, as well as the predictors to be `dichotom`ized
`...`	additional parameters, currently not in use
`X`	(for helper function `optimism_dichotom()`) numeric matrix of `k` columns, a set of `k` numeric predictors
`R`	positive integer scalar, number of bootstrap replicates `R`, default `100L`
`dX`	(for helper function `coef_dichotom()`) logical matrix of `k` columns, a set of `k` dichotomized predictors

Details

Function BBC_dichotom() obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors. Specifically,

1. Dichotomize the k predictors in the entire data (using function m_rpartD()). Fit a regression model to the entire data with the k dichotomized predictors as well as the additional predictors, if any (using helper function coef_dichotom()). The estimated regression model is referred to as the apparent performance.
2. Obtain the bootstrap-based optimism from R bootstrap samples, using helper function optimism_dichotom(). Calculate the median of the bootstrap-based optimism separately for each of the k dichotomized predictors. In the future, the options may be expanded to include, e.g., the trimmed mean, mean.default(, trim). For now, the median optimism is referred to as the optimism-correction of the k dichotomized predictors.
3. Subtract the optimism-correction (Step 2) from the apparent performance estimates (Step 1), for the k dichotomized predictors only. The apparent performance estimates for the additional predictors, if any, are not modified. The variance-covariance (vcov) estimates of the apparent performance are not modified, for now; nor are any of the other regression model diagnostics, such as residuals, log-likelihood, etc. The resulting coefficient-only, partially modified regression model is referred to as the optimism-corrected performance.
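As a numeric illustration of Steps 2 and 3, the sketch below uses made-up apparent coefficients and a made-up R × k optimism matrix; the names kappaTRUE, lambdaTRUE and age are hypothetical labels, not necessarily the coefficient names produced by the package. It computes the median optimism per dichotomized predictor and subtracts it from those predictors only.

```r
# made-up apparent estimates: k = 2 dichotomized predictors (hypothetical names)
# plus one additional predictor 'age'
apparent <- c(kappaTRUE = 0.80, lambdaTRUE = 0.50, age = 0.10)

# made-up R x k matrix of bootstrap-based optimism (R = 5 replicates here)
optimism <- cbind(
  kappaTRUE  = c(0.10, 0.20, 0.15, 0.05, 0.30),
  lambdaTRUE = c(0.00, 0.10, 0.05, 0.20, 0.10)
)

# Step 2: median optimism, specific to each dichotomized predictor
correction <- apply(optimism, MARGIN = 2L, FUN = median)

# Step 3: subtract the correction from the dichotomized predictors only;
# the estimate of the additional predictor 'age' is left unchanged
corrected <- apparent
corrected[colnames(optimism)] <- apparent[colnames(optimism)] - correction
corrected # 0.65, 0.40 and 0.10
```

Replacing FUN = median with FUN = mean and an extra argument trim = 0.1 in apply() would give the trimmed-mean variant mentioned above as a possible future option.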

Value

Function BBC_dichotom returns a coxph, glm or lm regression model, with attributes:

attr(,'optimism'): the returned object from optimism_dichotom
attr(,'apparent_cutoff'): a double vector, cutoff thresholds for the k predictors in the apparent model

Details of Helper Function `optimism_dichotom()`

Function optimism_dichotom computes the bootstrap-based optimism of the dichotomized predictors. First, R bootstrap samples are generated; the end user may set a random seed beforehand (e.g., with set.seed()) if reproducibility is needed. Then,

1. From each of the R bootstrap samples, obtain the dichotomizing branches for the k predictors to be dichotomized, using function m_rpartD().
2. Dichotomize the k predictors in each bootstrap sample using the respective dichotomizing branches from Step 1. The estimated regression coefficients of the k dichotomized predictors (using helper function coef_dichotom()) are referred to as the bootstrap performance estimates.
3. Dichotomize the k predictors in the entire data using each of the bootstrap dichotomizing branches from Step 1. The estimated regression coefficients of the k dichotomized predictors (using helper function coef_dichotom()) are referred to as the test performance estimates.

The differences between the bootstrap and test performance estimates, one for each of the R bootstrap samples, are referred to as the bootstrap-based optimism, or optimistic bias.
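The three steps above can be sketched in base R as follows. This is a minimal, self-contained illustration under stated assumptions, not the package's implementation: the data are simulated, the helpers find_cutoff(), dichotomize() and fit_coef() are hypothetical, a median split stands in for the rpart-based m_rpartD(), and lm() with a gaussian response stands in for coef_dichotom().

```r
set.seed(1L) # optional: fix the random seed so the bootstrap samples are reproducible

# simulated toy data: gaussian response y, k = 2 numeric predictors in X
n <- 200L
X <- matrix(rnorm(n * 2L), ncol = 2L, dimnames = list(NULL, c('x1', 'x2')))
y <- 1 + 0.5 * (X[, 1L] > 0) + rnorm(n)

# hypothetical stand-in for m_rpartD(): cutoff at the median of each predictor
# (the package uses rpart-based dichotomizing branches instead)
find_cutoff <- function(x) median(x)

# dichotomize each column of X at its own cutoff, returning a logical matrix
dichotomize <- function(X, cutoff) t(t(X) > cutoff)

# hypothetical stand-in for coef_dichotom(): lm() slopes of the dichotomized predictors
fit_coef <- function(y, dX) {
  coef(lm(y ~ ., data = data.frame(y = y, dX * 1)))[-1L] # drop the intercept
}

R <- 100L # number of bootstrap replicates
optimism <- t(vapply(seq_len(R), FUN = function(r) {
  id <- sample.int(n, size = n, replace = TRUE)       # one bootstrap sample
  Xb <- X[id, , drop = FALSE]
  cutoff <- apply(Xb, MARGIN = 2L, FUN = find_cutoff) # Step 1: bootstrap cutoffs
  boot <- fit_coef(y[id], dichotomize(Xb, cutoff))    # Step 2: bootstrap performance
  test <- fit_coef(y, dichotomize(X, cutoff))         # Step 3: test performance
  boot - test                                         # optimism of this replicate
}, FUN.VALUE = numeric(ncol(X))))

dim(optimism) # R x k
apply(optimism, MARGIN = 2L, FUN = median) # median optimism per predictor
```

vapply() returns one column per replicate, so the result is transposed to the R × k layout described under Returns of Helper Functions.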

Details of Helper Function `coef_dichotom()`

Function coef_dichotom obtains the estimated multivariable regression coefficients of the dichotomized predictors. A Cox proportional hazards (coxph) regression for a Surv response, a logistic (glm) regression for a logical response, or a linear (lm) regression for a gaussian response is fitted with

the dichotomous logical predictors, given as the columns of dX, and
the additional predictors specified in formula
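The choice of regression model by response type can be illustrated with a small dispatcher. The function fit_by_response() is hypothetical and is not the package's internal code; the sketch assumes the survival package (for Surv() and coxph()) and uses its lung data only for demonstration.

```r
library(survival) # provides Surv(), coxph() and the 'lung' data

# hypothetical dispatcher: choose the regression model from the class of the response
fit_by_response <- function(formula, data) {
  y <- eval(formula[[2L]], envir = data) # evaluate the left-hand side within 'data'
  if (inherits(y, what = 'Surv')) {
    coxph(formula, data = data)                  # Cox proportional hazards
  } else if (is.logical(y)) {
    glm(formula, family = binomial, data = data) # logistic regression
  } else {
    lm(formula, data = data)                     # linear regression
  }
}

class(fit_by_response(Surv(time, status) ~ age, data = lung))
class(fit_by_response(I(status == 2) ~ age, data = lung))
class(fit_by_response(age ~ sex, data = lung))
```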

When dX has duplicated columns, the regression model is fitted using the unique columns of dX and the additional predictors in formula. The returned coefficient estimates repeat the corresponding estimates of the unique columns of dX.
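A minimal sketch of the duplicated-column handling, assuming a gaussian response fitted with lm(); the helper coef_unique() and the simulated data are hypothetical. If the duplicates were kept, lm() would report NA for the aliased copy, so the model is fitted on the unique columns and the estimates are then repeated for every copy.

```r
# hypothetical helper: fit lm() on the unique columns of the logical matrix dX,
# then repeat each unique estimate for all of its duplicated copies
coef_unique <- function(y, dX) {
  # one string key per column, so identical columns share the same key
  key <- apply(dX, MARGIN = 2L, FUN = function(x) paste(as.integer(x), collapse = ''))
  keep <- !duplicated(key) # first occurrence of each distinct column
  cf <- coef(lm(y ~ ., data = data.frame(y = y, dX[, keep, drop = FALSE] * 1)))[-1L]
  ans <- cf[match(key, key[keep])] # map every column of dX to its unique representative
  names(ans) <- colnames(dX)
  ans
}

set.seed(2L)
x <- rnorm(50L)
dX <- cbind(a = x > 0, b = x > 1, c = x > 0) # column 'c' duplicates column 'a'
y <- 1 + 2 * dX[, 'a'] + rnorm(50L)
cf <- coef_unique(y, dX)
cf # the estimates for 'a' and 'c' are identical
```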

Returns of Helper Functions

Helper function optimism_dichotom() returns an R × k double matrix of bootstrap-based optimism, with attributes

attr(,'cutoff'): an R × k double matrix, the R copies of bootstrap cutoff thresholds for the k predictors. See attribute 'cutoff' of function m_rpartD()

Helper function coef_dichotom() returns a double vector of the coefficients of the dichotomized predictors, with attributes

attr(,'model'): the coxph, glm or lm regression model

References on Helper Function `optimism_dichotom()`

Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8

Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

Examples

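library(Qindex)
library(survival)
data(flchain, package = 'survival') # see ?survival::flchain for details
head(flchain2 <- within.data.frame(flchain, expr = {
  mgus = as.logical(mgus) # logical predictor
}))
dim(flchain3 <- subset(flchain2, futime > 0)) # positive follow-up time, required by ?rpart::rpart
dim(flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory'))

# dichotomize 'kappa' and 'lambda'; adjust for 'age', 'sex' and 'mgus'
m1 = BBC_dichotom(Surv(futime, death) ~ age + sex + mgus, 
 data = flchain_Circulatory, dichotom = ~ kappa + lambda)
summary(m1) # coefficients of the dichotomized predictors are optimism-corrected
attr(attr(m1, 'optimism'), 'cutoff') # R x k bootstrap cutoff thresholds
attr(m1, 'apparent_cutoff') # cutoff thresholds in the apparent model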

[Package Qindex version 0.1.5 Index]