sMSROC {sMSROC} R Documentation

sMS ROC curve estimator computation

Description

Core function for computing the sMS ROC curve estimator. It estimates the ROC curve both when the outcome of interest is time-dependent (prognosis scenarios) and when it is not (diagnosis scenarios).

Usage

sMSROC(marker, status, observed.time, left, right, time,
       meth, grid, probs, sd.probs,
       conf.int, ci.cl, ci.meth, ci.nboots, parallel, ncpus, all)

Arguments

marker

vector with the biomarker values.

status

numeric response vector. The highest value is assumed to identify the subjects having the event under study, and the lowest value those who do not. Any other value is ignored. It is a mandatory parameter in diagnosis scenarios.

observed.time

vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times.

left

vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations.

right

vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as Inf).

time

point of time at which the sMS ROC curve estimator will be computed. The default value is 1.

meth

method for approximating the predictive model P(D|X=x). There are several options available:

  • “E”, allocates to each individual their own condition as positive or negative. Those whose condition is unknown at time time are dismissed.

  • “L”, for Linear logistic regression and proportional hazards regression models (see Details).

  • “S”, for Smooth models (see Details).

probs

vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values within [0,1] are admissible.

sd.probs

vector with the standard deviations of the probabilities entered in probs. It is an optional parameter.

grid

grid size for computing the AUC. Default value 1000.

conf.int

indicates whether a confidence interval for the AUC will be computed (“T”) or not (“F”). The default value is “F”.

ci.cl

confidence level at which the confidence interval for the AUC will be provided, entered as a proportion. The default value is 95% (0.95). This parameter is ignored when conf.int is set to “F”.

ci.meth

method for computing the confidence interval for the AUC. There are three options:

  • “E”, for the Empirical variance estimation.

  • “V”, for the theoretical Variance estimation.

  • “B”, for the Bootstrap percentile approximation.

    The empirical method “E” is the default value. This parameter is also ignored when conf.int is set to “F”.

ci.nboots

number of bootstrap samples to be run when ci.meth is set to “B”. The default value is 500. It is not taken into account when no confidence interval is computed.

parallel

indicates whether parallel computing will be done (“T”) or not (“F”) when computing the variance of the AUC through the methods “V” and “B”.

ncpus

number of CPUs that will be used when parallel computing is chosen. The default value is 1 and the maximum is 2.

all

parameter indicating whether all probabilities given by the predictive model should be considered (value “T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is (“T”).

Details

The two-stage mixed-subjects (sMS) ROC curve estimator links diagnosis and prognosis scenarios through a general predictive model (first stage) and the weighted empirical estimator of the cumulative distribution function of the biomarker (second stage).

The predictive model P(D|X=x) depicts the relationship between the biomarker and the binary response variable. It is approximated through the most suitable probabilistic model.

For diagnosis scenarios:

Notice that the predictive model makes it possible to compute the probability of being positive/negative even when the group a subject actually belongs to is unknown.
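
The two stages can be illustrated with a minimal base-R sketch for a diagnosis scenario. It is not the package's internal code: the simulated data and every name in it (x, d, p, se, sp, auc) are assumptions made for illustration, the first stage is a plain logistic regression, and the model probabilities are used as weights for all subjects (roughly the situation described for all = “T”).

```r
# Illustrative sketch only; names and simulated data are hypothetical.
set.seed(1)
n <- 200
x <- rnorm(n)                               # biomarker values
d <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))   # condition (1 = positive)
d[sample(n, 40)] <- NA                      # condition unknown for 40 subjects

# Stage 1: logistic model for P(D | X = x), fitted on subjects with known
# condition (glm drops the NA rows), then evaluated for every subject.
fit <- glm(d ~ x, family = binomial)
p <- predict(fit, newdata = data.frame(x = x), type = "response")

# Stage 2: weighted empirical sensitivity and specificity at each threshold.
thres <- sort(unique(x))
se <- sapply(thres, function(t) sum(p * (x > t)) / sum(p))
sp <- sapply(thres, function(t) sum((1 - p) * (x <= t)) / sum(1 - p))

# Trapezoidal area under the points (1 - SP, SE), adding the corner (1, 1).
fpr <- rev(c(1, 1 - sp))
tpr <- rev(c(1, se))
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

Subjects with unknown condition still contribute to both estimators through their predicted probabilities, which is the point made in the paragraph above.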

For prognosis scenarios and right censorship:

Finally, for prognosis scenarios and interval censorship:

The confidence intervals for the AUC can be computed in three different ways according to the parameter ci.meth. When it is set to "E", the variance of the AUC is estimated by the empirical procedure, and when the chosen option is "V", the theoretical approximation is used (see doi:10.1515/ijb-2019-0097). The third option, "B", uses the Bootstrap percentile method.

Value

The output is an object of class sMSROC with the following components:

thres

vector containing the biomarker values for which sensitivity and specificity were computed.

SE

vector with the estimates of the sensitivity.

SP

vector with the estimates of the specificity.

probs

vector with the probabilities corresponding to the predictive model.

u

vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the grid parameter.

ROC

ROC curve approximated at each point of the vector u.

auc

area under sMSROC curve estimator.

auc.ci.l

lower edge of the confidence interval for the AUC.

auc.ci.u

upper edge of the confidence interval for the AUC.

ci.cl

confidence level at which the confidence interval for the AUC was computed.

ci.meth

method chosen for computing the confidence interval for the AUC.

time

point of time at which the sMS ROC curve estimator was computed in prognosis scenarios.

data

list containing several parameters used in the internal functions, when applicable:

  • data_type - type of scenario handled (diagnosis/prognosis, under right or interval censorship).

  • grid - grid size.

  • marker - vector with the biomarker values.

  • outcome - vector with the condition of the individuals at time time as positive, negative or unknown.

  • ncpus - CPUs used if parallel computing was performed.

  • ci.nboots - number of bootstrap samples generated for computing the confidence intervals for the AUC.

  • parallel - was parallel computing performed?

  • meth - method used to compute the predictive model.

  • status - response vector.

  • observed.time - vector with the observed times for each subject.

  • left - vector with the lower edges of the observed intervals.

  • right - vector with the upper edges of the observed intervals.

message

table containing the warning messages generated during the execution of the function.

References

S. Díaz-Coto, P. Martínez-Camblor, and N. O. Corral-Blanco. Cumulative/dynamic ROC curve estimation under interval censorship. Journal of Statistical Computation and Simulation, 90(9):1570–1590, 2020. doi:10.1080/00949655.2020.1736071.

S. Díaz-Coto, N. O. Corral-Blanco, and P. Martínez-Camblor. Two-stage receiver operating-characteristic curve estimator for cohort studies. The International Journal of Biostatistics, 17:117–137, 2021. doi:10.1515/ijb-2019-0097.

Finkelstein, Dianne M. A Proportional Hazards Model for Interval-Censored Failure Time Data. Biometrics 42, no. 4 (1986): 845–54. doi:10.2307/2530698.

Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine 1989; 8(5): 551–561. doi:10.1002/sim.4780080504

Hurvich C, Simonoff J, Tsai CL. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B 1998; 60(2): 271–293. doi:10.1111/1467-9868.00125

B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. CRC press, 1994.

Examples

data(ktfs)
DT <- ktfs
sROC <- sMSROC(marker = DT$score, status = DT$failure,
               observed.time = DT$time, time = 5, meth = "L", conf.int = "T",
               ci.cl = 0.90, ci.meth = "E")
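
# A hedged follow-up sketch, continuing from the call above (it reuses DT and
# sROC). The component names are those listed under Value; the numbers they
# return depend on the data and are not reproduced here.
sROC$auc                          # AUC estimate
c(sROC$auc.ci.l, sROC$auc.ci.u)   # edges of the 90% confidence interval

# ROC curve on the grid u, assuming ROC is a vector aligned with u as described.
plot(sROC$u, sROC$ROC, type = "l",
     xlab = "1 - Specificity", ylab = "Sensitivity")

# Same model with a Bootstrap percentile interval instead of the empirical one.
# ci.nboots = 100 is an arbitrary choice to keep the run short.
sROC.b <- sMSROC(marker = DT$score, status = DT$failure,
                 observed.time = DT$time, time = 5, meth = "L", conf.int = "T",
                 ci.cl = 0.90, ci.meth = "B", ci.nboots = 100)
c(sROC.b$auc.ci.l, sROC.b$auc.ci.u)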

[Package sMSROC version 0.1.2 Index]