nonlinComb {dtComb} | R Documentation |
Combine two diagnostic tests with several non-linear combination methods.
Description
The nonlinComb function calculates combination scores for two diagnostic
tests, using a non-linear combination method and a standardization option
selected from those listed below.
Usage
nonlinComb(
markers = NULL,
status = NULL,
event = NULL,
method = c("polyreg", "ridgereg", "lassoreg", "elasticreg", "splines", "sgam", "nsgam"),
degree1 = 3,
degree2 = 3,
df1 = 4,
df2 = 4,
resample = c("none", "cv", "repeatedcv", "boot"),
nfolds = 5,
nrepeats = 3,
niters = 10,
standardize = c("none", "range", "zScore", "tScore", "mean", "deviance"),
include.interact = FALSE,
alpha = 0.5,
show.plot = TRUE,
direction = c("auto", "<", ">"),
conf.level = 0.95,
cutoff.method = c("CB", "MCT", "MinValueSp", "MinValueSe", "ValueSp", "ValueSe",
"MinValueSpSe", "MaxSp", "MaxSe", "MaxSpSe", "MaxProdSpSe", "ROC01", "SpEqualSe",
"Youden", "MaxEfficiency", "Minimax", "MaxDOR", "MaxKappa", "MinValueNPV",
"MinValuePPV", "ValueNPV", "ValuePPV", "MinValueNPVPPV", "PROC01", "NPVEqualPPV",
"MaxNPVPPV", "MaxSumNPVPPV", "MaxProdNPVPPV", "ValueDLR.Negative",
"ValueDLR.Positive", "MinPvalue", "ObservedPrev", "MeanPrev", "PrevalenceMatching"),
...
)
Arguments
markers |
a numeric data frame that contains the results of two
diagnostic tests
|
status |
a factor vector that includes the actual disease
status of the patients
|
event |
a character string that indicates which level of status
is treated as the positive event
|
method |
a character string specifying the method used for
combining the markers. The available methods are:
-
Logistic Regression with Polynomial Feature Space (polyreg) : The method
builds a logistic regression model with the polynomial feature space and returns the probability
of a positive event for each observation.
-
Ridge Regression with Polynomial Feature Space (ridgereg) : Ridge regression is a
shrinkage method used to estimate the coefficients of highly correlated variables, in this case
the polynomial feature space created from the two markers. The implementation uses two
functions from the glmnet package, which dtComb calls internally: cv.glmnet() to run
cross-validation and determine the tuning parameter \lambda, and glmnet() to fit the
model with the selected tuning parameter.
-
Lasso Regression with Polynomial Feature Space (lassoreg) : Lasso regression,
like Ridge regression, is a type of shrinkage method. However, a notable difference is that
Lasso tends to set some feature coefficients to zero, making it useful for feature elimination.
It also employs cross-validation for parameter selection and fits the model with the glmnet package.
-
Elastic Net Regression with Polynomial Feature Space (elasticreg) : Elastic Net
regression is a hybrid model that merges the penalties from Ridge and Lasso regression, aiming
to leverage the strengths of both approaches. This model involves two parameters: \lambda,
as in Ridge and Lasso, and \alpha, a user-defined mixing parameter ranging between 0 (representing Ridge)
and 1 (representing Lasso). The \alpha parameter determines the relative weight of the
Ridge and Lasso penalties.
-
Splines (splines) : Another non-linear approach to combining markers
fits regression models on functions built from piecewise polynomials.
This implementation uses splines with user-defined degrees of freedom and
user-defined degrees for the fitted polynomials. The splines package
is used to build logistic regression models on B-spline (basis spline) terms.
-
Generalized Additive Models with Smoothing Splines and Generalized Additive Models
with Natural Cubic Splines (sgam & nsgam) : In addition to the basic spline structure,
Generalized Additive Models are applied with natural cubic splines and smoothing splines
using the gam package in R.
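
The polyreg description above can be illustrated with a minimal base R sketch. It is not the dtComb implementation: the simulated markers, the helper powers() and the flag include_interact are hypothetical names introduced here, and only stats::glm() is used.

```r
# Sketch only: simulated data and base R, not the dtComb internals.
set.seed(1)
n <- 200
status <- factor(rep(c("not_needed", "needed"), each = n / 2),
                 levels = c("not_needed", "needed"))
# Two hypothetical markers that are higher, on average, in the "needed" group
m1 <- rnorm(n, mean = ifelse(status == "needed", 1, 0))
m2 <- rnorm(n, mean = ifelse(status == "needed", 0.8, 0))

degree1 <- 3
degree2 <- 3
include_interact <- FALSE  # hypothetical flag mirroring include.interact

# Polynomial feature space: raw powers of each marker up to its degree
powers <- function(x, degree) sapply(seq_len(degree), function(d) x^d)
features <- as.data.frame(cbind(powers(m1, degree1), powers(m2, degree2)))
names(features) <- c(paste0("m1_", seq_len(degree1)),
                     paste0("m2_", seq_len(degree2)))
if (include_interact) features$m1_m2 <- m1 * m2

# Logistic regression on the expanded features; the second factor level
# ("needed") is modelled as the positive event
fit <- glm(status ~ ., data = data.frame(status = status, features),
           family = binomial())

# Combination score: estimated probability of the positive event
comb_score <- predict(fit, type = "response")
```

The penalized variants (ridgereg, lassoreg, elasticreg) would replace glm() with a glmnet fit on the same feature matrix.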
|
degree1 |
a numeric value. For polynomial-based methods, it sets
the degree of the feature space created for marker 1; for spline-based
methods, it sets the degree of the polynomial fitted between knots for marker 1
(3, default)
|
degree2 |
a numeric value. For polynomial-based methods, it sets
the degree of the feature space created for marker 2; for spline-based
methods, it sets the degree of the polynomial fitted between knots for marker 2
(3, default)
|
df1 |
a numeric value that sets the degrees of freedom, which determine
the number of knots, in spline-based methods for marker 1 (4, default)
|
df2 |
a numeric value that sets the degrees of freedom, which determine
the number of knots, in spline-based methods for marker 2 (4, default)
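
As a rough illustration of how degrees of freedom and polynomial degree relate in a B-spline basis, the sketch below uses splines::bs() with the documented defaults. It assumes the spline-based methods build their basis this way, which this page does not state explicitly; marker1 is simulated.

```r
library(splines)  # ships with the standard R distribution

set.seed(1)
marker1 <- rnorm(100)  # hypothetical marker values

# Cubic B-spline basis with 4 degrees of freedom (df1 = 4, degree1 = 3)
basis <- bs(marker1, df = 4, degree = 3)

ncol(basis)                   # number of basis columns, equal to df
length(attr(basis, "knots"))  # interior knots placed by bs(): df - degree
```

In bs(), df fixes the number of basis columns, and the interior knots are then placed at quantiles of the marker.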
|
resample |
a character string indicating the name of the
resampling option. Bootstrapping, cross-validation and repeated cross-validation
are available, controlled by the number of bootstrap iterations (niters),
folds (nfolds) and repeats (nrepeats). The option "none" applies no resampling.
-
boot : Bootstrap resampling. The dataset
is divided into folds with replacement, and models are trained and tested
in these folds to determine the best parameters for the given method and
dataset.
-
cv : Cross-validation resampling. The dataset is divided, without
replacement, into the given number of folds; in each iteration, one fold is
selected as the test set, and the model is built using the remaining folds
and tested on the test set. The corresponding AUC values and the parameters
used for the combination are kept in a list. The best-performing model is
selected, and the combination score is returned for the whole dataset.
-
repeatedcv : Repeated cross-validation. The cross-validation process is
repeated, and the best-performing models selected at each repeat are stored in
another list; the best-performing model among these is selected and applied to
the entire dataset.
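
The cv procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the dtComb code: the data are simulated, the helper auc() is a hypothetical name, and the model is a plain cubic polynomial logistic regression.

```r
# Sketch only: simulated data and base R, not the dtComb internals.
set.seed(42)
n <- 200
y <- rep(c(0, 1), each = n / 2)   # 1 = positive event
dat <- data.frame(
  y  = y,
  m1 = rnorm(n, mean = y),        # hypothetical marker 1
  m2 = rnorm(n, mean = 0.8 * y)   # hypothetical marker 2
)

# AUC via the Mann-Whitney statistic (ties receive average ranks)
auc <- function(score, y) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

nfolds <- 5
fold <- sample(rep(seq_len(nfolds), length.out = n))  # split without replacement

fits <- vector("list", nfolds)
aucs <- numeric(nfolds)
for (k in seq_len(nfolds)) {
  train <- dat[fold != k, ]
  test  <- dat[fold == k, ]
  fits[[k]] <- glm(y ~ m1 + I(m1^2) + I(m1^3) + m2 + I(m2^2) + I(m2^3),
                   data = train, family = binomial())
  aucs[k] <- auc(predict(fits[[k]], newdata = test, type = "response"), test$y)
}

# Keep the model with the highest test-fold AUC and score every observation
best <- fits[[which.max(aucs)]]
comb_score <- predict(best, newdata = dat, type = "response")
```

Repeated cross-validation would wrap this loop in an outer loop over repeats and compare the best model from each repeat in the same way.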
|
nfolds |
a numeric value that indicates the number of folds for
cross validation based resampling methods (5, default)
|
nrepeats |
a numeric value that indicates the number of repeats
for "repeatedcv" option of resampling methods (3, default)
|
niters |
a numeric value that indicates the number of
bootstrapped resampling iterations (10, default)
|
standardize |
a character string indicating the name of the
standardization method. The default option is no standardization applied.
Available options are:
-
Z-score (zScore) : This method scales the data to have a mean
of 0 and a standard deviation of 1. It subtracts the mean and divides by the standard
deviation for each feature. Mathematically,
Z-score = \frac{x - \overline{x}}{sd(x)}
where x is the value of a marker, \overline{x} is the mean of the marker and sd(x) is the standard deviation of the marker.
-
T-score (tScore) : T-score is commonly used
in data analysis to transform raw scores into a standardized form.
The standard formula for converting a raw score x into a T-score is:
T-score = \Biggl(\frac{x - \overline{x}}{sd(x)} \times 10\Biggr) + 50
where x is the value of a marker, \overline{x} is the mean of the marker
and sd(x) is the standard deviation of the marker.
-
Range (a.k.a. min-max scaling) (range) : This method transforms data to
a specific range, between 0 and 1. The formula for this method is:
Range = \frac{x - min(x)}{max(x) - min(x)}
-
Mean (mean) : This method, which helps
to show the size of a single observation relative to
the mean of the dataset, calculates the ratio of each data point to the mean value
of the dataset.
Mean = \frac{x}{\overline{x}}
where x is the value of a marker and \overline{x} is the mean of the marker.
-
Deviance (deviance) : This method, which allows for
comparison of individual data points in relation to the overall spread of
the data, calculates the ratio of each data point to the standard deviation
of the dataset.
Deviance = \frac{x}{sd(x)}
where x is the value of a marker and sd(x) is the standard deviation of the marker.
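
The five formulas above translate directly into base R. The helper standardize_marker() below is a hypothetical name used for illustration only; it applies each formula to a single marker vector and is not the function dtComb uses internally.

```r
x <- c(2.1, 3.4, 1.8, 5.0, 4.2)  # hypothetical marker values

standardize_marker <- function(x, method = c("none", "range", "zScore",
                                             "tScore", "mean", "deviance")) {
  method <- match.arg(method)
  switch(method,
    none     = x,
    range    = (x - min(x)) / (max(x) - min(x)),   # min-max scaling to [0, 1]
    zScore   = (x - mean(x)) / sd(x),              # mean 0, sd 1
    tScore   = (x - mean(x)) / sd(x) * 10 + 50,    # mean 50, sd 10
    mean     = x / mean(x),                        # ratio to the mean
    deviance = x / sd(x)                           # ratio to the sd
  )
}

z <- standardize_marker(x, "zScore")
t_score <- standardize_marker(x, "tScore")
```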
|
include.interact |
a logical indicator that specifies whether to
include the interaction between the markers to the feature space created for
polynomial based methods (FALSE, default)
|
alpha |
a numeric value as the mixing parameter in Elastic Net
Regression method (0.5, default)
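
For reference, the glmnet package defines the elastic net objective for a binomial model as shown below. Assuming dtComb passes alpha to glmnet unchanged (this page does not say so explicitly), alpha = 0 reduces the penalty to Ridge and alpha = 1 to Lasso.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N} \ell\bigl(y_i,\ \beta_0 + x_i^{\top}\beta\bigr)
  + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\right]
```

Here \ell is the log-likelihood contribution of observation i, N is the number of observations, x_i is the row of polynomial features, and \lambda is the tuning parameter chosen by cross-validation.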
|
show.plot |
a logical. If TRUE, a ROC curve is
plotted (TRUE, default)
|
direction |
a character string that determines the direction in which the
comparison is made. ">": the predictor values for the control group
are higher than the values of the case group (controls > cases).
"<": the predictor values for the control group are lower than or equal to
the values of the case group (controls < cases).
|
conf.level |
a numeric value that determines the confidence level of the
interval for the ROC curve (0.95, default).
|
cutoff.method |
a character string that determines the cutoff method
for the ROC curve. The available options are listed in Usage.
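
As an illustration of one cutoff method, the sketch below picks the threshold that maximizes the Youden index (sensitivity + specificity - 1), treating direction "<" as cases scoring higher than controls. The function youden_cutoff() and the simulated scores are hypothetical; dtComb's own cutoff computation may differ in detail.

```r
# Sketch only: base R, not the dtComb internals.
youden_cutoff <- function(score, status) {
  # status: 1 = case (positive event), 0 = control
  cuts <- sort(unique(score))
  j <- sapply(cuts, function(cut_val) {
    pred_pos <- score >= cut_val          # direction "<": cases score higher
    se <- mean(pred_pos[status == 1])     # sensitivity
    sp <- mean(!pred_pos[status == 0])    # specificity
    se + sp - 1
  })
  # which.max() keeps the first threshold if several reach the maximum
  list(cutoff = cuts[which.max(j)], youden = max(j))
}

set.seed(7)
status <- rep(c(0, 1), each = 50)
score  <- plogis(rnorm(100, mean = 2 * status - 1))  # hypothetical scores
res <- youden_cutoff(score, status)
```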
|
... |
further arguments. Currently has no effect on the results.
|
Value
A list of numeric non-linear combination scores calculated
according to the given method and standardization option.
Author(s)
Serra Ilayda Yerlitas, Serra Bersan Gengec, Necla Kochan,
Gozde Erturk Zararsiz, Selcuk Korkmaz, Gokmen Zararsiz
Examples
library(dtComb)

# Load the example data shipped with the package
data("exampleData1")
data <- exampleData1

# Drop the first column; the remaining columns are the markers
markers <- data[, -1]
status <- factor(data$group, levels = c("not_needed", "needed"))
event <- "needed"
cutoff.method <- "Youden"

# Lasso on a degree-4 polynomial feature space, 5 bootstrap iterations
score1 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "lassoreg", include.interact = FALSE, resample = "boot", niters = 5,
  degree1 = 4, degree2 = 4, cutoff.method = cutoff.method,
  direction = "<"
)

# Splines with T-score standardization and no resampling
score2 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "splines", resample = "none", cutoff.method = cutoff.method,
  standardize = "tScore", direction = "<"
)

# Lasso with the interaction term, repeated cross-validation and Z-score standardization
score3 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "lassoreg", resample = "repeatedcv", include.interact = TRUE,
  cutoff.method = "ROC01", standardize = "zScore", direction = "auto"
)
[Package dtComb version 1.0.2]