BBC_dichotom {Qindex} | R Documentation |
Bootstrap-based Optimism Correction for Dichotomization
Description
The functions explained in this documentation are:
BBC_dichotom() - obtains a multivariable regression model with bootstrap-based optimism correction on the dichotomized predictors.
optimism_dichotom() - a helper function that computes the bootstrap-based optimism of the dichotomized predictors.
coef_dichotom() - a helper function that obtains the estimated multivariable regression coefficients of the dichotomized predictors.
Usage
BBC_dichotom(formula, dichotom, data, ...)
optimism_dichotom(formula, X, data, R = 100L, ...)
coef_dichotom(formula, dX, data)
Arguments
formula | formula, left-hand-side being the response |
dichotom | one-sided formula of the set of predictors to be dichotomized. These predictors can be stored in |
data | data.frame, containing the response |
... | additional parameters, currently not in use |
X | (for helper function |
R | positive integer scalar, number of bootstrap replicates |
dX | (for helper function |
Details
Function BBC_dichotom()
obtains a multivariable regression model with
bootstrap-based optimism correction on the dichotomized predictors.
Specifically,
1. Dichotomize the k predictors in the entire data (using function m_rpartD()). Fit a regression model to the entire data with the k dichotomized predictors as well as the additional predictors, if any (using helper function coef_dichotom()). The estimated regression model is referred to as the apparent performance.
2. Obtain the bootstrap-based optimism based on R copies of bootstrap samples, using optimism_dichotom(). Calculate the median of the bootstrap-based optimism, specific to each of the dichotomized predictors. In the future, the options may be expanded to include, e.g., a trimmed mean via mean.default(, trim). For now, the median optimism is referred to as the optimism-correction of the k dichotomized predictors.
3. Subtract the optimism-correction (in Step 2) from the apparent performance estimates (in Step 1), only for the k dichotomized predictors. The apparent performance estimates for the additional predictors, if any, are not modified. The variance-covariance (vcov) estimates of the apparent performance are not modified, for now. None of the other regression model diagnostics, such as residuals, log-likelihood, etc., are modified either, for now.
The coefficient-only, partially-modified regression model is referred to as
the optimism-corrected performance.
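The arithmetic of Steps 2 and 3 can be sketched in base R. This is only an illustration under stated assumptions, not the code inside BBC_dichotom(): the coefficient names (age, sexM, kappa, lambda) and every number below are made up, and the optimism matrix stands in for what optimism_dichotom() would return.

```r
# Hypothetical apparent coefficients: two additional predictors (age, sexM)
# and k = 2 dichotomized predictors (kappa, lambda). All values are made up.
apparent <- c(age = 0.10, sexM = 0.27, kappa = 0.41, lambda = 0.52)

# Hypothetical R x k optimism matrix (R = 5 bootstrap replicates, k = 2)
optimism <- matrix(
  c(0.05, 0.03, 0.08, 0.02, 0.04,   # column 'kappa'
    0.10, 0.06, 0.07, 0.12, 0.09),  # column 'lambda'
  ncol = 2L, dimnames = list(NULL, c("kappa", "lambda"))
)

# Step 2: column-wise median optimism, one value per dichotomized predictor
correction <- apply(optimism, MARGIN = 2L, FUN = median)

# Step 3: subtract the correction from the dichotomized predictors only;
# the coefficients of the additional predictors are left untouched
corrected <- apparent
corrected[names(correction)] <- apparent[names(correction)] - correction
print(corrected)
```

The median is used here because it is what the text describes; swapping `median` for `function(x) mean(x, trim = 0.1)` would give the trimmed-mean variant mentioned as a possible future option.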
Value
Function BBC_dichotom() returns a coxph, glm or lm regression model, with attributes
attr(,'optimism') - the returned object from optimism_dichotom()
attr(,'apparent_cutoff') - a double vector, cutoff thresholds for the k predictors in the apparent model
Details of Helper Function optimism_dichotom()
Function optimism_dichotom() computes the bootstrap-based optimism
of the dichotomized predictors.
First, R bootstrap samples are generated,
for which the end-user may specify a random seed, if needed.
Then,
1. From each of the R bootstrap samples, obtain the dichotomizing branches for the k predictors to be dichotomized, using function m_rpartD().
2. Dichotomize the k predictors in each bootstrap sample using the respective dichotomizing branches from Step 1. The regression coefficient estimates for the k dichotomized predictors (using helper function coef_dichotom()) are referred to as the bootstrap performance estimate.
3. Dichotomize the k predictors in the entire data using each of the bootstrap dichotomizing branches from Step 1. The regression coefficient estimates for the k dichotomized predictors (using helper function coef_dichotom()) are referred to as the test performance estimate.
The difference between the bootstrap and test performance estimates,
based on each of the R bootstrap samples,
is referred to as the bootstrap-based optimism or optimistic bias.
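The three steps can be mimicked with a small base-R simulation for a single predictor (k = 1) and a linear model. This is a sketch under assumptions, not the package's implementation: find_cutoff() and coef_at() are hypothetical helpers, and find_cutoff() uses a simple residual-sum-of-squares scan as a stand-in for the rpart-based m_rpartD().

```r
set.seed(1)  # the bootstrap is random; fix the seed for reproducibility
n <- 200L
d <- data.frame(x = rnorm(n), z = rnorm(n))
d$y <- 1 + 0.8 * (d$x > 0.3) + 0.5 * d$z + rnorm(n)  # simulated response

# Hypothetical stand-in for m_rpartD(): scan candidate cutoffs of x and keep
# the one with the smallest residual sum of squares
find_cutoff <- function(data) {
  cand <- quantile(data$x, probs = seq(0.1, 0.9, by = 0.05), names = FALSE)
  rss <- vapply(cand, function(cc) {
    deviance(lm(y ~ I(x > cc) + z, data = data))
  }, numeric(1))
  cand[which.min(rss)]
}

# Hypothetical stand-in for coef_dichotom(): coefficient of the dichotomized
# predictor, adjusting for the additional predictor z
coef_at <- function(data, cutoff) {
  data$dx <- data$x > cutoff
  unname(coef(lm(y ~ dx + z, data = data))["dxTRUE"])
}

R <- 50L
optimism <- vapply(seq_len(R), function(r) {
  boot <- d[sample.int(n, replace = TRUE), ]  # one bootstrap sample
  cutoff <- find_cutoff(boot)                 # Step 1: bootstrap cutoff
  boot_est <- coef_at(boot, cutoff)           # Step 2: bootstrap performance
  test_est <- coef_at(d, cutoff)              # Step 3: test performance
  boot_est - test_est                         # optimism of this replicate
}, numeric(1))

median(optimism)  # the summary later used as the optimism-correction
```

With k predictors the same loop would return one row of length k per replicate, giving the R × k matrix described for optimism_dichotom().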
Details of Helper Function coef_dichotom()
Function coef_dichotom() obtains the estimated multivariable regression coefficients of the dichotomized predictors. A Cox proportional hazards (coxph) regression for a Surv response, a logistic (glm) regression for a logical response, or a linear (lm) regression for a Gaussian response is performed with
- the dichotomous logical predictors, given as the columns of dX, and
- the additional predictors specified in formula.
When dX has duplicated columns, the regression model is fitted using the unique columns of dX and the additional predictors in formula.
The returned coefficient estimates repeat the corresponding estimates of the unique columns of dX.
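The handling of duplicated columns can be illustrated with a base-R sketch using a linear model. The data, the column names k1, k2, k3 and the matching logic are assumptions made for illustration; the package's own code may differ.

```r
set.seed(2)
n <- 100L
z <- rnorm(n)        # an additional predictor
a <- rnorm(n) > 0    # first dichotomized predictor
b <- rnorm(n) > 0.5  # second dichotomized predictor
dX <- cbind(k1 = a, k2 = b, k3 = a)  # k3 duplicates k1
y <- 1 + 0.5 * a - 0.3 * b + 0.2 * z + rnorm(n)

# Fit the model on the unique columns only; duplicated columns would be aliased
dup <- duplicated(dX, MARGIN = 2L)
uX <- dX[, !dup, drop = FALSE]
dat <- data.frame(y = y, z = z, uX)  # logical columns k1 and k2
fit <- lm(y ~ ., data = dat)
cf_unique <- coef(fit)[paste0(colnames(uX), "TRUE")]

# Map every column of dX back to its unique counterpart and repeat the estimate
key <- apply(dX, MARGIN = 2L, FUN = paste, collapse = "")
id <- match(key, key[!dup])
cf <- setNames(cf_unique[id], colnames(dX))
print(cf)  # k1 and k3 carry the same estimate
```

Fitting on the unique columns avoids the NA coefficients that a perfectly collinear duplicate would otherwise produce, while the repeated estimates keep the output aligned with the k columns of dX.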
Returns of Helper Functions
Helper function optimism_dichotom() returns an R × k double matrix of bootstrap-based optimism, with attributes
attr(,'cutoff') - an R × k double matrix, the R copies of bootstrap cutoff thresholds for the k predictors. See attribute 'cutoff' of function m_rpartD()
Helper function coef_dichotom()
returns a double vector of the
coefficients of the dichotomized predictors, with attributes
References on Helper Function optimism_dichotom()
Ewout W. Steyerberg (2009) Clinical Prediction Models. doi:10.1007/978-0-387-77244-8
Frank E. Harrell Jr., Kerry L. Lee, Daniel B. Mark. (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
Examples
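library(survival)
library(Qindex) # provides BBC_dichotom()
data(flchain, package = 'survival') # see more details from ?survival::flchain
# convert the 0/1 indicator mgus to logical
flchain2 <- within(flchain, expr = {
  mgus <- as.logical(mgus)
})
head(flchain2)
# keep positive follow-up times only; required by ?rpart::rpart
flchain3 <- subset(flchain2, futime > 0)
dim(flchain3)
# restrict to subjects whose cause of death falls under chapter 'Circulatory'
flchain_Circulatory <- subset(flchain3, chapter == 'Circulatory')
dim(flchain_Circulatory)
# Cox model: kappa and lambda are dichotomized; age, sex and mgus enter as-is
m1 <- BBC_dichotom(Surv(futime, death) ~ age + sex + mgus,
  data = flchain_Circulatory, dichotom = ~ kappa + lambda)
summary(m1)
attr(attr(m1, 'optimism'), 'cutoff') # bootstrap cutoff thresholds
attr(m1, 'apparent_cutoff') # cutoff thresholds in the apparent model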