R: sMS ROC curve estimator computation

sMSROC {sMSROC}

R Documentation

sMS ROC curve estimator computation

Description

Core function for computing the sMS ROC curve estimator, which estimates the ROC curve both when the outcome of interest is time-dependent (prognosis scenarios) and when it is not (diagnosis scenarios).

Usage

sMSROC(marker, status, observed.time, left, right, time,
       meth, grid, probs, sd.probs,
       conf.int, ci.cl, ci.meth, ci.nboots, parallel, ncpus, all)

Arguments

`marker`	vector with the biomarker values.
`status`	numeric response vector. The highest value is taken to identify the subjects having the event under study and the lowest value those who do not. Any other value is ignored. It is a mandatory parameter in diagnosis scenarios.
`observed.time`	vector with the observed times for each subject, for prognosis scenarios under right censorship. Notice that these values may be the event times or the censoring times.
`left`	vector containing the lower edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations.
`right`	vector with the upper edges of the observed intervals. It is mandatory in prognosis scenarios under interval censorship and ignored in other situations. Infinity is an admissible value (indicated as inf).
`time`	point of time at which the sMS ROC curve estimator will be computed. The default value is 1.
`meth`	method for approximating the predictive model `P(D|X=x)`. The available options are: “E”, which assigns each individual their own observed condition as positive or negative and discards those whose condition is unknown at time `time`; “L”, for linear logistic regression and proportional hazards regression models (see Details); “S”, for smooth models (see Details).
`probs`	vector containing the probabilities corresponding to the predictive model when it has been externally computed. Only values within [0,1] are admissible.
`sd.probs`	vector with the standard deviations of the probabilities entered in `probs`. It is an optional parameter.
`grid`	grid size for computing the AUC. The default value is 1000.
`conf.int`	indicates whether a confidence interval for the AUC will be computed (“T”) or not (“F”). The default value is “F”.
`ci.cl`	confidence level at which the confidence interval for the AUC will be provided. The default value is 95%. This parameter is ignored when `conf.int` is set to “F”.
`ci.meth`	method for computing the confidence interval for the AUC. There are three options: “E”, for the empirical variance estimation; “V”, for the theoretical variance estimation; “B”, for the bootstrap percentile approximation. The empirical method “E” is the default value. This parameter is also ignored when `conf.int` is set to “F”.
`ci.nboots`	number of bootstrap samples to be run when `ci.meth` is set to “B” (bootstrap). The default value is 500. It is not taken into account when no confidence interval is computed.
`parallel`	indicates whether parallel computing will be done (“T”) or not (“F”) when computing the variance of the AUC through the methods “V” and “B”.
`ncpus`	number of CPUs that will be used when parallel computing is chosen. The default value is 1 and the maximum is 2.
`all`	parameter indicating whether all probabilities given by the predictive model should be considered (“T”) or just those corresponding to individuals whose condition as positive or negative is unknown (“F”). The default value is “T”.

Details

The two-stage mixed-subjects ROC curve estimator (sMSROC) links diagnosis and prognosis scenarios through a general predictive model (first stage) and the weighted empirical estimator of the cumulative distribution function of the biomarker (second stage).

The predictive model P(D|X=x) depicts the relationship between the biomarker and the binary response variable. It is approximated through the most suitable probabilistic model.
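
The second stage can be illustrated with a minimal base R sketch. It assumes that the first-stage probabilities are already available and that larger marker values point to the positive condition. The helper `weighted_roc` is hypothetical and is not part of the package; it only shows how probabilities can act as weights in the empirical estimators of sensitivity and specificity.

```r
# Hypothetical helper (not part of the sMSROC package): weighted empirical
# sensitivity and specificity, using P(D | X = x_i) as weights.
weighted_roc <- function(marker, probs) {
  thres <- sort(unique(marker))
  # Sensitivity: weighted share of the "positive mass" above each threshold
  se <- sapply(thres, function(thr) sum(probs * (marker > thr)) / sum(probs))
  # Specificity: weighted share of the "negative mass" at or below each threshold
  sp <- sapply(thres, function(thr) {
    sum((1 - probs) * (marker <= thr)) / sum(1 - probs)
  })
  data.frame(thres = thres, SE = se, SP = sp)
}

set.seed(1)
x <- rnorm(200)                 # simulated biomarker values
p <- plogis(-0.5 + 1.5 * x)     # assumed first-stage probabilities
roc <- weighted_roc(x, p)
head(roc)
```

With probabilities equal to 0 or 1 for every subject, these expressions reduce to the usual empirical sensitivity and specificity.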

For diagnosis scenarios:

If meth = “L”, the logit transformation of the predictive model is approximated by a linear logistic regression model:

P(D | X=x) = 1 / (1 + \exp\{-(\beta_0 + \beta_1 x)\}),

with \beta_0, \beta_1 \in {\cal R}.
If meth = “S”, the logit transformation of the predictive model is estimated by a smooth logistic regression model:

P(D | X=x) = 1 / (1 + \exp\{-s(x)\}),

where s(\cdot) is a smooth function (splines, doi:10.1002/sim.4780080504).

Notice that the predictive model makes it possible to compute the probability of being positive or negative even when the group a subject actually belongs to is unknown.
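
As a rough illustration of the linear logistic case, the following base R sketch fits the model on simulated data in which the condition of some subjects is unknown, and then predicts P(D | X=x) for every subject. The data, the variable names and the use of `glm` are assumptions made for this example; the package's internal fitting procedure may differ.

```r
set.seed(42)
n <- 300
x <- rnorm(n)                                # simulated biomarker
d <- rbinom(n, 1, plogis(-0.3 + 1.2 * x))    # simulated true condition
d_obs <- d
d_obs[sample(n, 60)] <- NA                   # condition unknown for 60 subjects

# Fit the linear logistic model using only subjects with a known condition
# (glm drops the rows with NA responses by default)
fit <- glm(d_obs ~ x, family = binomial)

# Predicted P(D | X = x) for everyone, including the unknown group
p_hat <- predict(fit, newdata = data.frame(x = x), type = "response")
summary(p_hat[is.na(d_obs)])
```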

For prognosis scenarios and right censorship:

If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model:

P (T \leq t \;|\; X=x) = 1 - \exp \{ - \Delta_0(t) \cdot \exp \{ \beta_0 + \beta_1 \cdot \log(x)\}\},

where \Delta_0(\cdot) is the cumulative baseline hazard function and \beta_0, \beta_1 \in {\cal R}.
If meth = “S”, the approximation is done by

P (T\leq t \;|\; X=x) = 1 - \exp \{ - \Delta_0(t) \cdot \exp \{ s(x)\}\},

where s(\cdot) is a smooth function (penalized splines, doi:10.1111/1467-9868.00125).
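
The right-censoring case with meth = “L” can be sketched with the `survival` package. This is an illustration on simulated data, not the package's own code: it assumes that `survival` is installed, and since `coxph` has no separate intercept, the term \beta_0 of the formula above is absorbed into the estimated baseline here.

```r
library(survival)

set.seed(7)
n <- 200
x <- rexp(n) + 0.5                        # simulated positive biomarker
event_time <- rexp(n, rate = 0.1 * x)     # hazard increases with the marker
cens_time  <- rexp(n, rate = 0.05)        # independent censoring times
obs_time   <- pmin(event_time, cens_time)
status     <- as.numeric(event_time <= cens_time)

# Cox proportional hazards model with log(x) as covariate
fit  <- coxph(Surv(obs_time, status) ~ log(x))
beta <- unname(coef(fit))

# Cumulative baseline hazard (reference value log(x) = 0), evaluated at t0
bh <- basehaz(fit, centered = FALSE)
t0 <- 5
H0 <- max(c(0, bh$hazard[bh$time <= t0]))

# P(T <= t0 | X = x) under the fitted model
p_hat <- 1 - exp(-H0 * exp(beta * log(x)))
summary(p_hat)
```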

Finally, for prognosis scenarios and interval censorship:

If meth = “L”, the event times are assumed to come from a Cox proportional hazards regression model and the predictive model is estimated as indicated in doi:10.1080/00949655.2020.1736071:

P (T \leq t \;|\; X=x) = \frac{S(U|x) - S(t|x)}{S(U|x) - S(V|x)},

where U = \min\{t, L\} and V = \max\{t, R\}, with L and R the random variables that stand for the edges of the observable interval containing the event time.
If meth = “S”, the approximation is done by

P (T\leq t \;|\; X=x) = 1 - S(t|x),

where S(t|x) is the survival function at time t given the marker value, estimated through a proportional hazards model for interval-censored data according to doi:10.2307/2530698.
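
The ratio used for meth = “L” under interval censorship can be checked numerically with a short sketch. The helper `prob_event_by_t` and the exponential survival function are hypothetical choices made for illustration; in practice S(\cdot|x) would come from the fitted model.

```r
# Hypothetical helper: P(T <= t | X = x) for a subject whose event time is
# known to lie between L and R, given a conditional survival function.
prob_event_by_t <- function(t, L, R, surv) {
  U <- min(t, L)
  V <- max(t, R)
  (surv(U) - surv(t)) / (surv(U) - surv(V))
}

# Assumed survival function for illustration: exponential with rate 0.2
S <- function(s) exp(-0.2 * s)

p_before <- prob_event_by_t(t = 1, L = 2, R = 6, surv = S)  # t before the interval: 0
p_after  <- prob_event_by_t(t = 8, L = 2, R = 6, surv = S)  # t after the interval: 1
p_inside <- prob_event_by_t(t = 4, L = 2, R = 6, surv = S)  # t inside: between 0 and 1
c(p_before, p_after, p_inside)
```

When t falls before the interval the event has certainly not occurred yet, and when it falls after the interval the event has certainly occurred, so only subjects with L <= t <= R receive a probability strictly between 0 and 1.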

The confidence intervals for the AUC can be computed in three different ways according to the parameter ci.meth. When it is set to "E", the variance of the AUC is estimated by the empirical procedure; when it is set to "V", the theoretical approximation is used (see doi:10.1515/ijb-2019-0097). The third option, "B", uses the bootstrap percentile method.
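
The bootstrap percentile idea can be sketched in base R as follows. For simplicity the example uses the plain empirical (Mann-Whitney) AUC on fully observed simulated data instead of the sMS ROC estimator, and the helper `auc_mw` is hypothetical; the resampling scheme used inside the package may differ.

```r
# Hypothetical helper: empirical AUC via the Mann-Whitney statistic
auc_mw <- function(marker, status) {
  pos <- marker[status == 1]
  neg <- marker[status == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

set.seed(123)
n <- 150
status <- rbinom(n, 1, 0.4)
marker <- rnorm(n, mean = status)        # positives shifted upwards

B <- 500                                 # plays the role of ci.nboots
boot_auc <- replicate(B, {
  idx <- sample(n, replace = TRUE)       # resample subjects with replacement
  auc_mw(marker[idx], status[idx])
})

alpha <- 1 - 0.95                        # plays the role of ci.cl
ci <- quantile(boot_auc, c(alpha / 2, 1 - alpha / 2))
ci
```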

Value

The output is an object of class sMSROC with the following components:

`thres`	vector containing the biomarker values for which sensitivity and specificity were computed.
`SE`	vector with the estimates of the sensitivity.
`SP`	vector with the estimates of the specificity.
`probs`	vector with the probabilities corresponding to the predictive model.
`u`	vector containing the points between 0 and 1 at which the ROC curve estimator will be computed. Its size is determined by the `grid` parameter.
`ROC`	ROC curve approximated at each point of the vector `u`.
`auc`	area under the sMS ROC curve estimator.
`auc.ci.l`	lower edge of the confidence interval for the AUC.
`auc.ci.u`	upper edge of the confidence interval for the AUC.
`ci.cl`	confidence level at which the confidence interval for the AUC was computed.
`ci.meth`	method chosen for computing the confidence interval for the AUC.
`time`	point of time at which the sMS ROC curve estimator was computed in prognosis scenarios.
`data`	list containing several parameters used in the internal functions, when applicable: data_type - type of scenario handled (diagnosis/prognosis, under right or interval censorship); grid - grid size; marker - vector with the biomarker values; outcome - vector with the condition of the individuals at time `time` as positive, negative or unknown; ncpus - CPUs used if parallel computing was performed; ci.nboots - number of bootstrap samples generated for computing the confidence intervals for the AUC; parallel - whether parallel computing was performed; meth - method used to compute the predictive model; status - response vector; observed.time - vector with the observed times for each subject; left - vector with the lower edges of the observed intervals; right - vector with the upper edges of the observed intervals.
`message`	table containing the warning messages generated during the execution of the function.

References

S. Díaz-Coto, P. Martínez-Camblor, and N. O. Corral-Blanco. Cumulative/dynamic ROC curve estimation under interval censorship. Journal of Statistical Computation and Simulation, 90(9):1570–1590, 2020. doi:10.1080/00949655.2020.1736071.

S. Díaz-Coto, N. O. Corral-Blanco, and P. Martínez-Camblor. Two-stage receiver operating-characteristic curve estimator for cohort studies. The International Journal of Biostatistics, 17:117–137, 2021. doi:10.1515/ijb-2019-0097.

D. M. Finkelstein. A proportional hazards model for interval-censored failure time data. Biometrics, 42(4):845–854, 1986. doi:10.2307/2530698.

S. Durrleman and R. Simon. Flexible regression models with cubic splines. Statistics in Medicine, 8(5):551–561, 1989. doi:10.1002/sim.4780080504.

C. Hurvich, J. Simonoff, and C. L. Tsai. Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B, 60(2):271–293, 1998. doi:10.1111/1467-9868.00125.

B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. CRC press, 1994.

Examples

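# Load the package and its ktfs example data set
library(sMSROC)
data(ktfs)
DT <- ktfs

# Prognosis scenario under right censorship: sMS ROC curve at time 5,
# with a 90% confidence interval for the AUC (empirical variance estimation)
sROC <- sMSROC(marker = DT$score, status = DT$failure,
               observed.time = DT$time, time = 5, meth = "L", conf.int = "T",
               ci.cl = 0.90, ci.meth = "E")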

[Package sMSROC version 0.1.2 Index]