Unified ROC Curve Estimator for Diagnosis and Prognosis Studies: The sMSROC Package

Binary classification is a widely studied problem in Statistics. Its close relationship with the diagnosis and prognosis of diseases makes it crucial in biomedical research. In this context, it is important to identify biomarkers that may help to classify individuals into different classes, for example, diseased vs. not diseased. The Receiver Operating-Characteristic (ROC) curve is a graphical tool commonly used to assess the accuracy of such classification. Because diagnosis and prognosis problems differ in nature, ROC curve estimation has been tackled from a separate perspective in each setting. The Two-stage Mixed-Subjects (sMS) ROC curve estimator fits both scenarios and can also handle data with missing or incomplete outcome values. This paper introduces the R package sMSROC, which implements the sMS ROC estimator and includes tools that may support researchers in their decision-making. Its practical application is illustrated on three real-world datasets.

Susana Díaz-Coto (Department of Orthopaedics, Dartmouth Health, Lebanon, NH, USA), Pablo Martínez-Camblor (Faculty of Health Sciences, Universidad Autonoma de Chile, Chile), Norberto Corral-Blanco (Department of Statistics, Operational Research and Mathematics Didactics, University of Oviedo, Oviedo (Asturias), Spain)
2024-04-11

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2023-087.zip.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Díaz-Coto, et al., "Unified ROC Curve Estimator for Diagnosis and Prognosis Studies: The sMSROC Package", The R Journal, 2024

BibTeX citation

@article{RJ-2023-087,
  author = {Díaz-Coto, Susana and Martínez-Camblor, Pablo and Corral-Blanco, Norberto},
  title = {Unified ROC Curve Estimator for Diagnosis and Prognosis Studies: The sMSROC Package},
  journal = {The R Journal},
  year = {2024},
  note = {https://doi.org/10.32614/RJ-2023-087},
  doi = {10.32614/RJ-2023-087},
  volume = {15},
  issue = {4},
  issn = {2073-4859},
  pages = {129-149}
}