nonlinComb {dtComb}    R Documentation

Combine two diagnostic tests with several non-linear combination methods.

Description

The nonlinComb function calculates the combination scores of two diagnostic tests using one of several non-linear combination methods and standardization options.

Usage

nonlinComb(
  markers = NULL,
  status = NULL,
  event = NULL,
  method = c("polyreg", "ridgereg", "lassoreg", "elasticreg", "splines", "sgam", "nsgam"),
  degree1 = 3,
  degree2 = 3,
  df1 = 4,
  df2 = 4,
  resample = c("none", "cv", "repeatedcv", "boot"),
  nfolds = 5,
  nrepeats = 3,
  niters = 10,
  standardize = c("none", "range", "zScore", "tScore", "mean", "deviance"),
  include.interact = FALSE,
  alpha = 0.5,
  show.plot = TRUE,
  direction = c("auto", "<", ">"),
  conf.level = 0.95,
  cutoff.method = c("CB", "MCT", "MinValueSp", "MinValueSe", "ValueSp", "ValueSe",
    "MinValueSpSe", "MaxSp", "MaxSe", "MaxSpSe", "MaxProdSpSe", "ROC01", "SpEqualSe",
    "Youden", "MaxEfficiency", "Minimax", "MaxDOR", "MaxKappa", "MinValueNPV",
    "MinValuePPV", "ValueNPV", "ValuePPV", "MinValueNPVPPV", "PROC01", "NPVEqualPPV",
    "MaxNPVPPV", "MaxSumNPVPPV", "MaxProdNPVPPV", "ValueDLR.Negative",
    "ValueDLR.Positive", "MinPvalue", "ObservedPrev", "MeanPrev", "PrevalenceMatching"),
  ...
)

Arguments

markers

a numeric data frame that includes the results of two diagnostic tests

status

a factor vector that includes the actual disease status of the patients

event

a character string that indicates the event in status that is to be considered the positive event

method

a character string specifying the method used for combining the markers. The available methods are:

  • Logistic Regression with Polynomial Feature Space (polyreg): The method builds a logistic regression model with the polynomial feature space and returns the probability of a positive event for each observation.

  • Ridge Regression with Polynomial Feature Space (ridgereg): Ridge regression is a shrinkage method used to estimate the coefficients of highly correlated variables, in this case the polynomial feature space created from the two markers. The method is implemented with two functions from the glmnet package, which is integrated into the dtComb package for this purpose: cv.glmnet() to run a cross-validation model that selects the tuning parameter \lambda, and glmnet() to fit the model with the selected tuning parameter.

  • Lasso Regression with Polynomial Feature Space (lassoreg): Lasso regression, like Ridge regression, is a type of shrinkage method. However, a notable difference is that Lasso tends to set some feature coefficients to zero, making it useful for feature elimination. It also employs cross-validation for parameter selection and model fitting using the glmnet library.

  • Elastic Net Regression with Polynomial Feature Space (elasticreg): Elastic Net regression is a hybrid model that merges the penalties from Ridge and Lasso regression, aiming to leverage the strengths of both approaches. This model involves two parameters: \lambda, similar to Ridge and Lasso, and \alpha, a user-defined mixing parameter ranging between 0 (representing Ridge) and 1 (representing Lasso). The \alpha parameter determines the balance or weights between the loss functions of Ridge and Lasso regressions.

  • Splines (splines): Another non-linear approach to combining markers fits regression models using functions derived from piecewise polynomials. The implementation uses splines with user-defined degrees of freedom and degrees for the fitted polynomials; the splines library is employed to construct piecewise logistic regression models using base splines.

  • Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines (sgam & nsgam): In addition to the basic spline structure, Generalized Additive Models are applied with natural cubic splines and smoothing splines using the gam library in R.
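The polynomial feature space that the polyreg, ridgereg, lassoreg and elasticreg methods operate on can be sketched with base R alone. The data below are simulated and the variable names are illustrative; this is not the package's internal code, only a minimal restatement of the idea.

```r
# Sketch of the polyreg idea: expand two markers into a polynomial
# feature space and fit a logistic regression on it.
# Simulated data; names are illustrative only.
set.seed(1)
m1 <- rnorm(100)
m2 <- rnorm(100)
status <- factor(rbinom(100, 1, plogis(m1 + m2^2)))

# poly(..., degree = 3) creates the polynomial feature space
# (the degree1/degree2 idea); the m1:m2 term mimics
# include.interact = TRUE.
fit <- glm(status ~ poly(m1, 3) + poly(m2, 3) + m1:m2,
           family = binomial)

# The combination score is the predicted probability of the
# positive event for each observation.
score <- predict(fit, type = "response")
```

The shrinkage variants differ only in how the coefficients of this feature space are estimated (penalized via glmnet rather than plain maximum likelihood).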

degree1

a numeric value; for polynomial-based methods, the degree of the feature space created for marker 1; for spline-based methods, the degree of the polynomial fitted between each pair of knots for marker 1 (3, default)

degree2

a numeric value; for polynomial-based methods, the degree of the feature space created for marker 2; for spline-based methods, the degree of the polynomial fitted between each pair of knots for marker 2 (3, default)

df1

a numeric value that indicates the number of knots as the degrees of freedom in spline based methods for marker 1 (4, default)

df2

a numeric value that indicates the number of knots as the degrees of freedom in spline based methods for marker 2 (4, default)

resample

a character string indicating the resampling option. Bootstrapping, cross-validation and repeated cross-validation are the available options, along with the number of folds and number of repeats:

  • boot: Bootstrap resampling; the dataset is divided into folds with replacement, and models are trained and tested on these folds to determine the best parameters for the given method and dataset.

  • cv: Cross-validation resampling; the dataset is divided into the given number of folds without replacement. In each iteration, one fold is selected as the test set, and the model is built using the remaining folds and tested on the test set. The corresponding AUC values and the parameters used for the combination are kept in a list; the best-performing model is selected, and the combination score is returned for the whole dataset.

  • repeatedcv: Repeated cross-validation; the cross-validation process is repeated, the best-performing model selected at each repeat is stored in another list, and the best-performing among these models is applied to the entire dataset.
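The cv option described above can be sketched as follows. The data are simulated, the AUC helper is a simple rank-based (Mann-Whitney) version, and the single-marker model stands in for the combination model; none of this is the package's internal code.

```r
# Sketch of cv-style resampling: split without replacement,
# fit on the remaining folds, evaluate on the held-out fold.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))

# Rank-based AUC (Mann-Whitney form), for illustration.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

nfolds <- 5
fold <- sample(rep(1:nfolds, length.out = n))  # folds without replacement
aucs <- numeric(nfolds)
for (k in 1:nfolds) {
  train <- fold != k
  fit   <- glm(y[train] ~ x[train], family = binomial)
  # Score the held-out fold with the fitted coefficients.
  pred  <- plogis(coef(fit)[1] + coef(fit)[2] * x[!train])
  aucs[k] <- auc(pred, y[!train])
}
best <- which.max(aucs)  # the best-performing model is then applied to all data
```

repeatedcv wraps this loop in an outer repetition and keeps the best model across repeats; boot draws the splits with replacement instead.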

nfolds

a numeric value that indicates the number of folds for cross validation based resampling methods (5, default)

nrepeats

a numeric value that indicates the number of repeats for "repeatedcv" option of resampling methods (3, default)

niters

a numeric value that indicates the number of bootstrap resampling iterations (10, default)

standardize

a character string indicating the name of the standardization method. The default option is no standardization applied. Available options are:

  • Z-score (zScore): This method scales the data to have a mean of 0 and a standard deviation of 1. It subtracts the mean and divides by the standard deviation for each feature. Mathematically,

    Z-score = \frac{x - (\overline x)}{sd(x)}

    where x is the value of a marker, \overline{x} is the mean of the marker and sd(x) is the standard deviation of the marker.

  • T-score (tScore): T-score is commonly used in data analysis to transform raw scores into a standardized form. The standard formula for converting a raw score x into a T-score is:

    T-score = \Biggl(\frac{x - (\overline x)}{sd(x)}\times 10 \Biggr) + 50

    where x is the value of a marker, \overline{x} is the mean of the marker and sd(x) is the standard deviation of the marker.

  • Range (a.k.a. min-max scaling) (range): This method transforms data to a specific range, between 0 and 1. The formula for this method is:

    Range = \frac{x - min(x)}{max(x) - min(x)}

  • Mean (mean): This method, which helps to understand the relative size of a single observation with respect to the mean of the dataset, calculates the ratio of each data point to the mean value of the dataset.

    Mean = \frac{x}{\overline{x}}

    where x is the value of a marker and \overline{x} is the mean of the marker.

  • Deviance (deviance): This method, which allows for comparison of individual data points in relation to the overall spread of the data, calculates the ratio of each data point to the standard deviation of the dataset.

    Deviance = \frac{x}{sd(x)}

    where x is the value of a marker and sd(x) is the standard deviation of the marker.
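The standardization options above are one-line transformations of a marker vector. The snippet below simply restates the formulas in R for a small made-up vector; it is not the package's internal code.

```r
# The standardization formulas, written out for a single marker x.
x <- c(2, 4, 6, 8, 10)

zScore   <- (x - mean(x)) / sd(x)             # mean 0, sd 1
tScore   <- zScore * 10 + 50                  # mean 50, sd 10
range01  <- (x - min(x)) / (max(x) - min(x))  # scaled to [0, 1]
meanStd  <- x / mean(x)                       # ratio to the mean
devStd   <- x / sd(x)                         # ratio to the standard deviation
```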

include.interact

a logical indicator that specifies whether to include the interaction between the markers in the feature space created for polynomial-based methods (FALSE, default)

alpha

a numeric value as the mixing parameter in Elastic Net Regression method (0.5, default)

show.plot

a logical. If TRUE, a ROC curve is plotted (TRUE, default)

direction

a character string that determines the direction of the comparison. "auto": the direction is selected automatically. ">": if the predictor values for the control group are higher than the values of the case group (controls > cases). "<": if the predictor values for the control group are lower than or equal to the values of the case group (controls < cases).

conf.level

a numeric value that determines the confidence level for the ROC curve (0.95, default).

cutoff.method

a character string that determines the cutoff method for the ROC curve.

...

further arguments. Currently has no effect on the results.

Value

A list of numeric non-linear combination scores calculated according to the given method and standardization option.

Author(s)

Serra Ilayda Yerlitas, Serra Bersan Gengec, Necla Kochan, Gozde Erturk Zararsiz, Selcuk Korkmaz, Gokmen Zararsiz

Examples

data("exampleData1")
data <- exampleData1

markers <- data[, -1]
status <- factor(data$group, levels = c("not_needed", "needed"))
event <- "needed"
cutoff.method <- "Youden"

score1 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "lassoreg", include.interact = FALSE, resample = "boot", niters = 5,
  degree1 = 4, degree2 = 4, cutoff.method = cutoff.method,
  direction = "<"
)

score2 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "splines", resample = "none", cutoff.method = cutoff.method,
  standardize = "tScore", direction = "<"
)

score3 <- nonlinComb(
  markers = markers, status = status, event = event,
  method = "lassoreg", resample = "repeatedcv", include.interact = TRUE,
  cutoff.method = "ROC01", standardize = "zScore", direction = "auto"
)


[Package dtComb version 1.0.2 Index]