dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests

The combination of diagnostic tests has become a crucial area of research, aiming to improve the accuracy and robustness of medical diagnostics. While existing tools focus primarily on linear combination methods, there is a lack of comprehensive tools that integrate diverse methodologies. In this study, we present dtComb, a comprehensive R package and web tool designed to address the limitations of existing diagnostic test combination platforms. One unique contribution of dtComb is that it offers 142 methods for combining two diagnostic tests, spanning linear and non-linear combination methods, machine-learning algorithms, and mathematical operators. Another significant contribution is the inclusion of advanced tools for ROC analysis, diagnostic performance metrics, and visual outputs such as sensitivity-specificity curves. Furthermore, dtComb offers classification functions for new observations, making it an easy-to-use tool for clinicians and researchers. A web-based version is also available at https://biotools.erciyes.edu.tr/dtComb/ for non-R users, providing an intuitive interface for test combination and model training.

S. Ilayda Yerlitaş Taştan (Department of Biostatistics) , Serra Bersan Gengeç (Department of Biostatistics) , Necla Koçhan (Department of Mathematics) , Ertürk Zararsız (Department of Biostatistics) , Selçuk Korkmaz (Department of Biostatistics) , Zararsız (Department of Biostatistics)
2026-02-03

1 Introduction

A typical scenario often encountered in practice is one in which the gold standard is binary (e.g., diseased vs. healthy) and two continuous diagnostic tests are available for the same condition. In such cases, clinicians usually compare the two diagnostic tests and try to improve on their individual performance by reducing the results to a simple ratio of the two measurements (Nyblom et al. 2006; Faria et al. 2016; Müller et al. 2019). However, this technique is simplistic and may not fully capture the potential interactions and relationships between the diagnostic tests. Linear combination methods have been developed to overcome such problems (Ertürk Zararsız 2023).
Linear methods combine two diagnostic tests into a single score/index by assigning weights to each test, optimizing their performance in diagnosing the condition of interest (Neumann et al. 2023). Such methods improve accuracy by leveraging the strengths of both tests (Bansal and Sullivan Pepe 2013; Aznar-Gimeno et al. 2022). For instance, Su and Liu (Su and Liu 1993) showed that Fisher's linear discriminant function yields a linear combination of markers, with either proportional or disproportional covariance matrices, that maximizes sensitivity uniformly across the entire specificity range under a multivariate normal distribution model. In contrast, the approach introduced by Pepe and Thompson (Pepe and Thompson 2000) relies on rank-based scores, eliminating the need for distributional assumptions when combining diagnostic tests. Despite these theoretical advances, existing tools implement only a limited number of methods. For instance, Kramar et al. developed a computer program called mROC that includes only the Su and Liu method (Kramar et al. 2001). Pérez-Fernández et al. presented the movieROC R package, which includes methods such as Su and Liu, min-max, and logistic regression (Pérez-Fernández et al. 2021). An R package called maxmzpAUC that includes similar methods was developed by Yu and Park (Yu and Park 2015).

On the other hand, non-linear approaches that incorporate the non-linearity between the diagnostic tests have also been developed and employed to integrate diagnostic tests (Ghosh and Chinnaiyan 2005; Du et al. 2024). These approaches incorporate the non-linear structure of the tests into the model, which may improve the accuracy and reliability of the diagnosis. Although some existing packages permit the use of non-linear approaches such as splines1, lasso2 and ridge regression, there is currently no package that employs these methods directly for combining diagnostic tests and reports diagnostic performance. Machine-learning (ML) algorithms have also recently been adopted to combine diagnostic tests (Agarwal et al. 2023; Prinzi et al. 2023; Ahsan et al. 2024; Sewak et al. 2024), and many studies focus on implementing ML algorithms for diagnostic testing (Zararsiz et al. 2016; Salvetat et al. 2022, 2024; Ganapathy et al. 2023; Alzyoud et al. 2024). For instance, DeGroat et al. applied four classification algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for the diagnosis of cardiovascular disease (DeGroat et al. 2024); their results showed that patients with cardiovascular disease can be diagnosed with up to 96% accuracy using these ML techniques. Numerous general-purpose libraries implement ML methods (scikit-learn (Pedregosa et al. 2011), TensorFlow (Abadi et al. 2015), caret (Kuhn 2008)), and the caret library is one of the most comprehensive tools developed in the R language (Kuhn 2008). However, these are general ML tools and do not directly combine two diagnostic tests or provide diagnostic performance measures.

Apart from the aforementioned methods, several basic mathematical operations, such as addition, multiplication, subtraction, and division, can also be used to combine markers (Luo et al. 2024; Serban et al. 2024; Svart et al. 2024). For instance, addition can enhance diagnostic sensitivity by accumulating the effects of both markers, whereas subtraction can differentiate disease states more distinctly by highlighting the difference between markers. On the other hand, there are several commercial (e.g., IBM SPSS, MedCalc, Stata) and open-source (R) software packages (ROCR (Sing et al. 2005), pROC (Robin et al. 2011), PRROC (Grau et al. 2015), plotROC (Sachs 2017)) that researchers can use for receiver operating characteristic (ROC) curve analysis. However, these tools are designed to perform ROC analysis for a single marker. As a result, there is currently no software tool that covers almost all combination methods.

In this study, we developed dtComb, an R package encompassing nearly all existing combination approaches in the literature. dtComb has two key advantages that make it easy to apply and set it apart from other packages: (1) it provides a comprehensive set of 142 methods, including linear and non-linear approaches, ML algorithms and mathematical operators; (2) it offers a turnkey workflow, from uploading the data through analysis, performance evaluation and reporting. Furthermore, it is the only package that implements linear approaches such as Minimax and Todor & Saplacan (Todor et al. 2014; Sameera et al. 2016). In addition, it allows the classification of new, previously unseen observations using trained models. To our knowledge, no other tool combines two diagnostic tests on a single platform with 142 different methods. In other words, dtComb makes more effective and robust combination methods readily applicable, replacing traditional approaches such as simple ratio-based methods. First, we review the theoretical basis of the combination methods; then, we present an example implementation to demonstrate the applicability of the package. Finally, we present a user-friendly, up-to-date, and comprehensive web tool developed to make dtComb accessible to physicians and healthcare professionals who do not use the R programming language. The dtComb package is freely available on CRAN, the web application is freely available at https://biotools.erciyes.edu.tr/dtComb/, and all source code is available on GitHub3.

2 Material and methods

This section will provide an overview of the combination methods implemented in the literature. Before applying these methods, we will also discuss the standardization techniques available for the markers, the resampling methods during model training, and, ultimately, the metrics used to evaluate the model’s performance.

Combination approaches

Linear combination methods

The dtComb package comprises eight distinct linear combination methods, which are elaborated in this section. Before investigating these methods, we briefly introduce some notation that will be used throughout this section.
Notations:
Let \(D_{i}, i = 1, 2, \ldots, n_1\) be the marker values of the \(i\)th individual in the diseased group, where \(D_i=(D_{i1},D_{i2})\), and \(H_j, j=1,2,\ldots,n_2\) be the marker values of the \(j\)th individual in the healthy group, where \(H_j=(H_{j1},H_{j2})\). Let \(x_{i1}\) denote the values of the first marker and \(x_{i2}\) the values of the second marker over the combined sample of \(n = n_1 + n_2\) individuals, \(i=1,2,\ldots,n\), obtained by stacking the diseased and healthy groups. Let \(D_{i,min}=\min(D_{i1},D_{i2})\), \(D_{i,max}=\max(D_{i1},D_{i2})\), \(H_{j,min}=\min(H_{j1},H_{j2})\), \(H_{j,max}=\max(H_{j1},H_{j2})\), and let \(c_i\) be the resulting combination score of the \(i\)th individual.
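To fix ideas before the individual methods are presented, the short sketch below illustrates one familiar linear combination strategy, logistic regression, on simulated data; the estimated coefficients serve as the weights of the combination score \(c_i = \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2}\). This is purely illustrative code with our own simulated data and object names, not the internal implementation of linComb().

# Illustrative only: a linear combination score built from logistic-regression
# weights on two simulated markers (not dtComb's internal implementation)
set.seed(1)
n <- 200
status  <- rbinom(n, 1, 0.5)                      # 1 = diseased, 0 = healthy
marker1 <- rnorm(n, mean = ifelse(status == 1, 1.0, 0), sd = 1)
marker2 <- rnorm(n, mean = ifelse(status == 1, 0.8, 0), sd = 1)

fit <- glm(status ~ marker1 + marker2, family = binomial)

# Combination score c_i = beta1 * x_i1 + beta2 * x_i2 (the intercept is omitted
# because it only shifts the score and does not change the ROC curve)
w <- coef(fit)[c("marker1", "marker2")]
comb.score <- as.numeric(cbind(marker1, marker2) %*% w)
head(comb.score)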

Non-linear combination methods

In addition to linear combination methods, the dtComb package includes seven non-linear approaches, which are discussed in this subsection. Here we use the following notation: \(x_{ij}\) is the value of the \(j\)th marker for the \(i\)th individual, \(i=1,2,\ldots,n\) and \(j=1,2\); \(d\) is the degree of the polynomial regressions and splines, \(d = 1,2,\ldots,p\).
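As a concrete illustration of a spline-type non-linear combination, the sketch below fits a logistic model on natural cubic spline bases of two simulated markers and uses the fitted probabilities as the combination score. The data, object names, and the particular spline basis are our own choices for illustration and do not reproduce the exact formulation used inside nonlinComb().

# Illustrative sketch: natural cubic spline bases (df = 3) for each marker
# feed a logistic model; the fitted probabilities serve as the combination score
library(splines)
set.seed(1)
n <- 200
status  <- rbinom(n, 1, 0.5)
marker1 <- rnorm(n, mean = ifelse(status == 1, 1.0, 0))
marker2 <- rnorm(n, mean = ifelse(status == 1, 0.8, 0))

fit.spl <- glm(status ~ ns(marker1, df = 3) + ns(marker2, df = 3),
               family = binomial)
comb.score <- predict(fit.spl, type = "response")
head(comb.score)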

Mathematical Operators

This section covers four arithmetic operators, eight distance measures, and an exponential approach. In addition, unlike the other approaches, users can apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers before combining them. Let \(x_{ij}\) represent the value of the \(j\)th marker for the \(i\)th observation, with \(i=1,2,\ldots,n\) and \(j=1,2\), and let \(c_i\) be the resulting combination score for the \(i\)th individual.
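The toy sketch below illustrates three such operators on two hypothetical marker vectors: plain addition, a Euclidean-distance-type score (the distance of the marker pair from the origin), and an exponential transformation applied before adding. The vectors and variable names are invented for illustration; this is not the mathComb() source.

# Toy illustration of mathematical-operator scores for two markers x1 and x2
x1 <- c(1.2, 0.4, 2.1, 0.9)
x2 <- c(0.8, 1.6, 1.9, 0.3)

c.add  <- x1 + x2                    # addition operator
c.eucl <- sqrt(x1^2 + x2^2)          # Euclidean-distance-type score
c.exp  <- exp(x1) + exp(x2)          # exponential transformation, then addition

data.frame(c.add, c.eucl, c.exp)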

Machine-Learning algorithms

Machine-learning algorithms have been increasingly applied in many fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate nearly all ML algorithms into the package we developed, building on the caret package in R (Kuhn 2008). caret includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that take numerical inputs and produce binary responses, which left 113 models that we implemented in our study. We then grouped these 113 models into five classes, following the scheme in (Zararsiz et al. 2016): (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others. As in the caret package, mlComb() sets up a grid of tuning parameters for the chosen classification routine, fits each candidate model, and calculates a performance measure based on resampling. After model fitting, it uses the predict() function to calculate the probability of the "event" class for each observation and, finally, performs ROC analysis based on these predicted probabilities.
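The generic caret workflow that mlComb() automates can be sketched as follows on simulated data; the object names, the simulated markers, and the choice of the svmLinear model (which additionally requires the kernlab package) are our own, and users of mlComb() do not need to write this code themselves.

# Sketch of the caret workflow wrapped by mlComb(): tune by resampling,
# predict class probabilities, then run ROC analysis on those probabilities
library(caret)
library(pROC)

set.seed(1)
n <- 200
status  <- factor(ifelse(rbinom(n, 1, 0.5) == 1, "needed", "not_needed"),
                  levels = c("not_needed", "needed"))
markers <- data.frame(m1 = rnorm(n, ifelse(status == "needed", 1, 0)),
                      m2 = rnorm(n, ifelse(status == "needed", 0.8, 0)))

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
fit  <- train(x = markers, y = status, method = "svmLinear",
              trControl = ctrl, metric = "ROC")

probs <- predict(fit, markers, type = "prob")[, "needed"]  # P(event)
roc(status, probs, direction = "<")                        # ROC on probabilities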

Standardization

Standardization is the process of transforming data onto a common scale to facilitate meaningful comparisons and statistical inference, and it is frequently used to improve the interpretability and comparability of data. We implemented five different standardization methods that can be applied to each marker.
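As a representative illustration only, two common choices, z-score and min-max (range) standardization, can be written as the simple helper functions below; these are our own functions and cover only two of the five methods available in the package.

# Two common standardization approaches, shown for illustration only
z_standardize <- function(x) (x - mean(x)) / sd(x)                  # z-score
range_standardize <- function(x) (x - min(x)) / (max(x) - min(x))   # min-max

marker <- c(5.1, 2.3, 7.8, 4.4, 6.0)
z_standardize(marker)
range_standardize(marker)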

Model building

After selecting a combination method from the dtComb package, users build the model and optimize its parameters with the functions mlComb(), linComb(), nonlinComb(), or mathComb(), depending on the chosen approach. For the linear and non-linear approaches (i.e., linComb() and nonlinComb()), parameter optimization is performed with k-fold cross-validation, repeated k-fold cross-validation, or bootstrapping. For the machine-learning approaches (i.e., mlComb()), all of the resampling methods from the caret package are available for tuning the model parameters. The number of parameters being optimized varies across models, and these parameters are tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
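For example, under the assumption that resample = "cv" selects plain k-fold cross-validation (the repeated cross-validation variant is shown with real data in the case study below), a linear combination model could be tuned as in the following sketch, where markers and status stand for the user's own marker data frame and status factor.

# Hedged sketch of the general calling pattern; we assume resample = "cv"
# selects plain k-fold cross-validation, and markers/status are placeholders
fit <- linComb(markers = markers, status = status, event = "needed",
               method = "PCL", resample = "cv", nfolds = 10,
               direction = "<", cutoff.method = "Youden")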

Evaluation of model performances

A confusion matrix, as shown in Table 1, is a table used to evaluate the performance of a classification model; it shows the numbers of correct and incorrect predictions by comparing predicted and actual class labels.

Table 1: Confusion Matrix
Predicted labels Actual class labels Total
Positive Negative
Positive TP FP TP+FP
Negative FN TN FN+TN
Total TP+FN FP+TN n

TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size

The diagonal elements of the matrix represent correct predictions, and the off-diagonal elements represent incorrect predictions. The dtComb package uses the OptimalCutpoints package (Yin and Tian 2014) to generate the confusion matrix and then the epiR package (Stevenson et al. 2017) to compute a range of performance metrics. The available metrics include the accuracy rate (ACC), Kappa statistic (\(\kappa\)), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive (False T+ proportion for true D-), the proportion of true outcome-positive subjects that test negative (False T- proportion for true D+), the proportion of test-positive subjects that are outcome negative (False T+ proportion for T+), and the proportion of test-negative subjects that are outcome positive (False T- proportion for T-). These metrics are summarized in Table 2; a small helper that computes several of them directly from confusion-matrix counts is sketched after the table.

Table 2: Performance Metrics and Formulas
Performance Metric Formula
Accuracy \(\text{ACC} = \frac{\text{TP} + \text{TN}}{n}\)
Kappa \(\kappa = \frac{\text{ACC} - P_e}{1 - P_e}\)
\(P_e = \frac{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}{n^2}\)
Sensitivity (Recall) \(\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}\)
Specificity \(\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}\)
Apparent Prevalence \(\text{AP} = \frac{{\text{TP}}}{{n}} + \frac{{\text{FP}}}{{n}}\)
True Prevalence \(\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}\)
Positive Predictive Value (Precision) \(\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}\)
Negative Predictive Value \(\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}\)
Positive Likelihood Ratio \(\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}\)
Negative Likelihood Ratio \(\text{NLR} = \frac{{1 - \text{SE}}}{{\text{SP}}}\)
The Proportion of True Outcome Negative Subjects That Test Positive \(\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}\)
The Proportion of True Outcome Positive Subjects That Test Negative \(\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}\)
The Proportion of Test Positive Subjects That Are Outcome Negative \(\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}\)
The Proportion of Test Negative Subjects That Are Outcome Positive \(\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}\)
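As a quick cross-check of the formulas in Table 2, the illustrative helper below (not a dtComb function) computes several of these metrics directly from confusion-matrix counts, using for concreteness the counts reported for the splines model in the case study of the Results section.

# Selected Table 2 formulas computed from raw confusion-matrix counts
# (illustrative helper, not a dtComb function)
diag_metrics <- function(TP, FP, FN, TN) {
  n <- TP + FP + FN + TN
  c(ACC = (TP + TN) / n,
    SE  = TP / (TP + FN),
    SP  = TN / (TN + FP),
    PPV = TP / (TP + FP),
    NPV = TN / (TN + FN),
    PLR = (TP / (TP + FN)) / (1 - TN / (TN + FP)),
    NLR = (1 - TP / (TP + FN)) / (TN / (TN + FP)))
}

# Counts of the splines model reported in the case study
round(diag_metrics(TP = 66, FP = 13, FN = 11, TN = 68), 3)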

Prediction of the test cases

The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical that the same preprocessing steps employed during the training phase, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then assigned based on the cut-off value established during the training phase, using the model parameters estimated from the training set.
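Conceptually, this corresponds to the following sketch, in which trainMarkers and testMarkers are placeholders for the user's training and test marker data; the predict() method of dtComb performs the equivalent step internally, so this code is for illustration only.

# Conceptual illustration: test markers are standardized with TRAINING-set
# statistics; dtComb's predict() method performs the equivalent step internally
train.mean <- colMeans(trainMarkers)          # trainMarkers: numeric data frame
train.sd   <- apply(trainMarkers, 2, sd)

testMarkers.std <- scale(testMarkers, center = train.mean, scale = train.sd)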

Technical details and the structure of dtComb

The dtComb package is implemented in the R programming language (https://www.r-project.org/), version 4.2.0. Package development was facilitated with devtools (Wickham et al. 2022), and the package was documented with roxygen2 (Wickham et al. 2024). Testing was performed with 271 unit tests written with testthat (Wickham 2024). Double programming was performed in Python (https://www.python.org/) to validate the implemented functions (Shiralkar 2010).
To combine diagnostic tests, the dtComb package integrates eight linear combination methods, seven non-linear combination methods, fourteen mathematical operators (arithmetic operators, distance metrics, and an exponential approach), and 113 machine-learning algorithms from the caret package (Kuhn 2008). These features are summarized in Table 3.

Table 3: Features of dtComb
Modules (Tab Panels) Features
Combination Methods
  • Linear Combination Approach (8 different methods)
  • Non-linear Combination Approach (7 different methods)
  • Mathematical Operators (14 different methods)
  • Machine-Learning Algorithms (113 different methods) (Kuhn 2008)
Standardization and Preprocessing
  • Five standardization methods applicable to linear, non-linear, and mathematical methods
  • 16 preprocessing methods applicable to ML (Kuhn 2008)
Resampling
  • Three methods for linear and non-linear combination methods: bootstrapping, cross-validation, and repeated cross-validation
  • 12 different resampling methods for ML (Kuhn 2008)

3 Results

Table 4 summarizes the existing packages and programs, including dtComb, along with the number of combination methods included in each package. While mROC offers only one linear combination method, maxmzpAUC and movieROC provide five linear combination techniques each, and SLModels includes four. However, these existing packages primarily focus on linear combination approaches. In contrast, dtComb goes beyond these limitations by integrating not only linear methods but also non-linear approaches, machine learning algorithms, and mathematical operators.

Table 4: Comparison of dtComb vs. existing packages and programs
Packages&Programs Linear Comb. Non-linear Comb. Math. Operators ML algorithms
mROC (Kramar et al. 2001) 1 - - -
maxmzpAUC (Yu and Park 2015) 5 - - -
movieROC (Pérez-Fernández et al. 2021) 5 - - -
SLModels (Aznar-Gimeno et al. 2023) 4 - - -
dtComb 8 7 14 113

Dataset

To demonstrate the functionality of the dtComb package, we conduct a case study using four different combination methods. The data were obtained from patients who presented to the Department of General Surgery, Erciyes University Faculty of Medicine, with complaints of abdominal pain (Akyildiz et al. 2010; Zararsiz et al. 2016). The dataset comprises D-dimer levels (D_dimer) and log-transformed leukocyte counts (log_leukocyte) of 225 patients divided into two groups (group): 110 patients who required an immediate laparotomy (needed) and 115 patients who did not (not_needed). Patients who, after evaluation of conventional treatment, underwent surgery and whose postoperative pathology confirmed the need for it were placed in the first group, whereas those with a negative laparotomy result were assigned to the second group. All the analyses follow the workflow given in Fig. 1. First of all, the dtComb package must be loaded in order to use the related functions.

Figure 1: Combination steps of two diagnostic tests. The figure presents a schematic representation of the sequential steps involved in combining two diagnostic tests using a combination method.
# load dtComb package
library(dtComb)

Similarly, the laparotomy data shipped with the package can be loaded with the following R code:


# load laparotomy data
data(laparotomy)

Implementation of the dtComb package

To demonstrate the applicability of the dtComb package, we apply one arbitrarily chosen method from each of the linear, non-linear, mathematical-operator and machine-learning approaches and compare their performance. These methods are Pepe, Cai & Longton (PCL) for the linear combination, splines for the non-linear combination, addition for the mathematical operator, and a linear support vector machine (SVM) for machine learning. Before applying the methods, we split the data into two parts: a training set comprising 70% of the data and a test set comprising the remaining 30%.

# Splitting the data set into train and test (70%-30%)
set.seed(2128)
inTrain <- caret::createDataPartition(laparotomy$group, p = 0.7, list = FALSE)
trainData <- laparotomy[inTrain, ]
colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte")
testData <- laparotomy[-inTrain, -1]

# define marker and status for combination function
markers <- trainData[, -1]
status <- factor(trainData$Group, levels = c("not_needed", "needed"))

The models are trained on trainData, and the resampling scheme used in the training phase is 5-fold cross-validation repeated ten times. direction = "<" is chosen because higher marker values indicate higher risk, and the Youden index is chosen as the cut-off method. Note that the markers are not standardized and the results are reported with 95% confidence intervals (CI). The four main combination functions are run with the selected methods as follows.


# PCL method
fit.lin.PCL <- linComb(markers = markers,  status = status, event = "needed", 
                       method = "PCL", resample = "repeatedcv", nfolds = 5,
                       nrepeats = 10, direction = "<", cutoff.method = "Youden")

# splines method (df1 = 3 and df2 = 3)
fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed", 
                                 method = "splines", resample = "repeatedcv", nfolds = 5, 
                                 nrepeats = 10, cutoff.method = "Youden", direction = "<", 
                                 df1 = 3, df2 = 3)
# addition operator
fit.add <- mathComb(markers = markers, status = status, event = "needed",
                    method = "add", direction = "<", cutoff.method = "Youden")

# SVM with a linear kernel
fit.svm <- mlComb(markers = markers, status = status, event = "needed", method = "svmLinear",
                  resample = "repeatedcv", nfolds = 5, nrepeats = 10, direction = "<",
                  cutoff.method = "Youden")

Various measures were considered to compare model performances, including AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics with 95% CIs were calculated for each marker and each method: 0.816 (0.751-0.880), 0.802 (0.728-0.877), 0.888 (0.825-0.930), 0.911 (0.868-0.954), 0.877 (0.824-0.929), and 0.875 (0.821-0.930) for D-dimer, Log(leukocyte), Pepe, Cai & Longton (PCL), splines, addition, and the support vector machine (SVM), respectively. The results show that the predictive performances of the individual markers and of the marker combinations are significantly higher than random chance in determining the need for laparotomy (\(p<0.05\)). The highest sensitivity and NPV were observed with the addition method, while the highest specificity and PPV were observed with the splines method. According to the overall AUC and accuracy, the combination fitted with the splines method performed better than the other methods (Fig. 2). Therefore, the splines method is used in the subsequent analyses.

Figure 2: Radar plots of trained models and performance measures of two markers. Radar plots summarize the diagnostic performances of two markers and various combination methods in the training dataset. These plots illustrate the performance metrics such as AUC, ACC, SEN, SPE, PPV, and NPV measurements. In these plots, the width of the polygon formed by connecting each point indicates the model’s performance in terms of AUC, ACC, SEN, SPE, PPV, and NPV metrics. It can be observed that the polygon associated with the Splines method occupies the most extensive area, which means that the Splines method performed better than the other methods.

For the AUC of markers and the spline model:

fit.nonlin.splines$AUC_table
                    AUC     SE.AUC LowerLimit UpperLimit         z      p.value
D_dimer       0.8156966 0.03303310  0.7509530  0.8804403  9.556979 1.212446e-21
log_leukocyte 0.8022286 0.03791768  0.7279113  0.8765459  7.970652 1.578391e-15
Combination   0.9111752 0.02189588  0.8682601  0.9540904 18.778659 1.128958e-78

Here, SE.AUC denotes the standard error of the AUC estimate.
The areas under the ROC curves for D-dimer levels, leukocyte counts on the logarithmic scale, and the combination score were 0.816, 0.802, and 0.911, respectively. The ROC curves generated from the splines combination score, D-dimer levels, and leukocyte counts are given in Fig. 3, showing that the combination score has the highest AUC. The splines method improved the AUC by 9.5 and 10.9 percentage points compared to the D-dimer level and the leukocyte count, respectively.

Figure 3: ROC curves. ROC curves for combined diagnostic tests, with sensitivity displayed on the y-axis and 1-specificity displayed on the x-axis. As can be observed, the combination score produced the highest AUC value, indicating that the combined strategy performs the best overall.

To see the results of the binary comparison between the combination score and markers:

fit.nonlin.splines$MultComp_table

Marker1 (A)   Marker2 (B)   AUC (A)   AUC (B)      |A-B|  SE(|A-B|)         z      p-value
1 Combination       D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05
2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03
3     D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01

After controlling the Type I error rate with a Bonferroni correction, the comparisons of the combination score with the individual markers remained significant (\(p<0.05\)).
The diagnostic test results and performance measures for the non-linear combination approach can be displayed with the following code:

fit.nonlin.splines$DiagStatCombined
          Outcome +    Outcome -      Total
Test +           66           13         79
Test -           11           68         79
Total            77           81        158

Point estimates and 95% CIs:
--------------------------------------------------------------
Apparent prevalence *                  0.50 (0.42, 0.58)
True prevalence *                      0.49 (0.41, 0.57)
Sensitivity *                          0.86 (0.76, 0.93)
Specificity *                          0.84 (0.74, 0.91)
Positive predictive value *            0.84 (0.74, 0.91)
Negative predictive value *            0.86 (0.76, 0.93)
Positive likelihood ratio              5.34 (3.22, 8.86)
Negative likelihood ratio              0.17 (0.10, 0.30)
False T+ proportion for true D- *      0.16 (0.09, 0.26)
False T- proportion for true D+ *      0.14 (0.07, 0.24)
False T+ proportion for T+ *           0.16 (0.09, 0.26)
False T- proportion for T- *           0.14 (0.07, 0.24)
Correctly classified proportion *      0.85 (0.78, 0.90)
--------------------------------------------------------------
* Exact CIs

Furthermore, when the diagnostic test results and performance measures of the combination score are compared with those of the single markers, the TN count of the combination score is higher than that of the single markers, and the combination has higher specificity and higher positive and negative predictive values than the log-transformed leukocyte count and the D-dimer level (Table 5). Conversely, D-dimer has a higher sensitivity than the others. Optimal cut-off values for both markers and the combined approach are also given in this table.

Table 5: Statistical diagnostic measures with 95% confidence intervals for each marker and the combination score
Diagnostic Measures (95% CI) D-dimer level (\(>1.6\)) Log(leukocyte count) (\(>4.16\)) Combination score (\(>0.448\))
TP 66 61 65
TN 53 60 69
FP 28 21 12
FN 11 16 12
Apparent prevalence 0.59 (0.51-0.67) 0.52 (0.44-0.60) 0.49 (0.41-0.57)
True prevalence 0.49 (0.41-0.57) 0.49 (0.41-0.57) 0.49 (0.41-0.57)
Sensitivity 0.86 (0.76-0.93) 0.79 (0.68-0.88) 0.84 (0.74-0.92)
Specificity 0.65 (0.54-0.76) 0.74 (0.63-0.83) 0.85 (0.76-0.92)
Positive predictive value 0.70 (0.60-0.79) 0.74 (0.64-0.83) 0.84 (0.74-0.92)
Negative predictive value 0.83 (0.71-0.91) 0.79 (0.68-0.87) 0.85 (0.76-0.92)
Positive likelihood ratio 2.48 (1.81-3.39) 3.06 (2.08-4.49) 5.70 (3.35-9.69)
Negative likelihood ratio 0.22 (0.12-0.39) 0.28 (0.18-0.44) 0.18 (0.11-0.31)
False T+ proportion for true D- 0.35 (0.24-0.46) 0.26 (0.17-0.37) 0.15 (0.08-0.24)
False T- proportion for true D+ 0.14 (0.07-0.24) 0.21 (0.12-0.32) 0.16 (0.08-0.26)
False T+ proportion for T+ 0.30 (0.21-0.40) 0.26 (0.17-0.36) 0.16 (0.08-0.26)
False T- proportion for T- 0.17 (0.09-0.29) 0.21 (0.13-0.32) 0.15 (0.08-0.24)
Accuracy 0.75 (0.68-0.82) 0.77 (0.69-0.83) 0.85 (0.78-0.90)

For a more comprehensive analysis, the plotComb function in dtComb can be used to generate kernel density and individual-value plots of the combination scores for each group, as well as the specificity and sensitivity corresponding to different cut-off values (Fig. 4). This function requires the result of the nonlinComb function, which is an object of class "dtComb", and the status vector, which must be a factor.

# draw distribution, dispersion, and specificity and sensitivity plots
plotComb(fit.nonlin.splines, status)
Figure 4: Kernel density, individual-value, and sensitivity-specificity plots of the combination score obtained with the trained model. Kernel density of the combination score for the two groups, needed and not needed (a). Individual-value plot with the classes on the x-axis and the combination score on the y-axis (b). Sensitivity and specificity plot of the combination score (c). The colors show each class in panels (a) and (b); in panel (c), the colors represent the sensitivity and specificity of the combination score.

To apply the model trained with splines to new data, the generic predict function is used. This function requires the test set and the result of the nonlinComb function, which is an object of class "dtComb". For each observation, the prediction output consists of the combination score and the predicted label determined by the cut-off value derived from the model.

# To predict the test set 
pred <- predict(fit.nonlin.splines, testData)
head(pred)

   comb.score labels
1   0.6133884 needed
7   0.9946474 needed
10  0.9972347 needed
11  0.9925040 needed
13  0.9257699 needed
14  0.9847090 needed

Above, it can be seen that the first six observations in the test set are labelled as needed because their estimated combination scores exceed the cut-off value of 0.448.
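As a quick check that is not part of the package output, the predicted labels can be cross-tabulated against the true group labels of the held-out test observations:

# Cross-tabulate predicted labels against the held-out true labels
table(Predicted = pred$labels,
      Actual    = laparotomy$group[-inTrain])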

Web interface for the dtComb package

The primary goal of developing the dtComb package is to bring numerous distinct combination methods together and make them easily accessible to researchers. The package also includes diagnostic statistics and visualization tools for the individual diagnostic tests and for the combination score generated by the chosen method. Nevertheless, using R code may pose challenges for physicians and others unfamiliar with R programming. To address this, we have also developed a user-friendly web application for dtComb using Shiny (Chang et al. 2024). This web-based tool is publicly accessible and provides an interactive interface with all the functionalities of the dtComb package.
To initiate the analysis, users must upload their data by following the instructions outlined in the "Data upload" tab of the web tool. For convenience, we have provided three example datasets on this page to assist researchers in practicing the tool’s functionality and to guide them in formatting their own data (as illustrated in Fig. 5a). We also note that ROC analysis for a single marker can be performed within the ‘ROC Analysis for Single Marker(s)’ tab in the data upload section of the web interface.

In the "Analysis" tab, one can find two crucial subpanels: Combination Analysis and Predict (Fig. 5b-d).

Figure 5: Web interface of the dtComb package. The figure illustrates the web interface of the dtComb package, which demonstrates the steps involved in combining two diagnostic tests. a) Data Upload: the user uploads the dataset and selects the relevant markers, the gold-standard test, and the event factor for the analysis. b) Combination Analysis: this panel allows the selection of the combination method, method-specific parameters, and resampling options to refine the analysis. c) Combination Analysis Output: displays the results generated by the selected combination method, providing the user with key metrics and visualizations for interpretation. d) Predict: displays the prediction results of the trained model when applied to the test set.

4 Summary and further research

In clinical practice, multiple diagnostic tests are often available for disease diagnosis (Yu and Park 2015). Combining these tests to enhance diagnostic accuracy is a widely accepted approach (Su and Liu 1993; Pepe and Thompson 2000; Pepe et al. 2006; Liu et al. 2011; Todor et al. 2014; Sameera et al. 2016). As far as we know, the tools listed in Table 4 were designed to combine diagnostic tests but contain at most five different combination methods each. As a result, despite the existence of numerous advanced combination methods, there has been no extensive tool available for integrating diagnostic tests.
In this study, we presented dtComb, a comprehensive R package designed to combine diagnostic tests using various methods, including linear, non-linear, mathematical operators, and machine learning algorithms. The package integrates 142 different methods for combining two diagnostic markers to improve the accuracy of diagnosis. The package also provides ROC curve analysis, various graphical approaches, diagnostic performance scores, and binary comparison results. In the given example, one can determine whether patients with abdominal pain require laparotomy by combining the D-dimer levels and white blood cell counts of those patients. Various methods, such as linear and non-linear combinations, were tested, and the results showed that the Splines method performed better than the others, particularly in terms of AUC and accuracy compared to single tests. This shows that diagnostic accuracy can be improved with combination methods.
Future work can focus on extending the capabilities of the dtComb package, for example to combinations of more than two markers. While some studies focus on combining multiple markers (Kang et al. 2016), our study aimed to combine two markers using nearly all existing methods and to develop a tool and package suitable for clinical practice.

R Software

The R package dtComb is now available on the CRAN website https://cran.r-project.org/web/packages/dtComb/index.html.

Acknowledgment

We would like to thank the Proofreading & Editing Office of the Dean for Research at Erciyes University for the copyediting and proofreading service for this manuscript.

5 Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

References

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
S. Agarwal, A. S. Yadav, V. Dinesh, K. S. S. Vatsav, K. S. S. Prakash and S. Jaiswal. By artificial intelligence algorithms and machine learning models to diagnosis cancer. Materials Today: Proceedings, 80: 2969–2975, 2023.
M. Ahsan, A. Khan, K. R. Khan, B. B. Sinha and A. Sharma. Advancements in medical diagnosis and treatment through machine learning: A review. Expert Systems, 41(3): e13499, 2024.
H. Y. Akyildiz, E. Sozuer, A. Akcan, C. Kuçuk, T. Artis, İ. Biri and N. Yılmaz. The value of D-dimer test in the diagnosis of patients with nontraumatic acute abdomen. Turkish Journal of Trauma and Emergency Surgery, 16(1): 22–26, 2010.
M. Alzyoud, R. Alazaidah, M. Aljaidi, G. Samara, M. Qasem, M. Khalid and N. Al-Shanableh. Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science, 8(1): 179–188, 2024.
T. W. Anderson and R. R. Bahadur. Classification into two multivariate normal distributions with different covariance matrices. The Annals of Mathematical Statistics, 420–431, 1962.
R. Aznar-Gimeno, L. M. Esteban, R. del-Hoyo-Alonso, Á. Borque-Fernando and G. Sanz. A stepwise algorithm for linearly combining biomarkers under Youden index maximization. Mathematics, 10(8): 1221, 2022.
R. Aznar-Gimeno, L. M. Esteban, G. Sanz and R. del-Hoyo-Alonso. Comparing the min–max–median/IQR approach with the min–max approach, logistic regression and XGBoost, maximising the Youden index. Symmetry (Basel), 15: 2023. URL https://doi.org/10.3390/sym15030756.
A. Bansal and M. Sullivan Pepe. When does combining markers improve classification performance and what are implications for practice? Statistics in medicine, 32(11): 1877–1892, 2013.
S.-H. Cha. Comprehensive survey on distance/similarity measures between probability density functions. City, 1(2): 1, 2007.
W. Chang, J. Cheng, J. Allaire, C. Sievert, B. Schloerke, Y. Xie, J. Allen, J. McPherson, A. Dipert and B. Borges. Shiny: Web application framework for R. 2024. URL https://shiny.posit.co/. R package version 1.9.1.9000, https://github.com/rstudio/shiny.
D. R. Cox and E. J. Snell. Analysis of binary data. 2nd ed London: Chapman; Hall/CRC, 1989.
W. DeGroat, H. Abdelhalim, K. Patel, D. Mendhe, S. Zeeshan and Z. Ahmed. Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine. Scientific reports, 14(1): 1, 2024.
Z. Du, P. Du and A. Liu. Likelihood ratio combination of multiple biomarkers via smoothing spline estimated densities. Statistics in Medicine, 43(7): 1372–1383, 2024.
B. Efron. The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association, 70(352): 892–898, 1975.
A. Elhakeem, R. A. Hughes, K. Tilling, D. L. Cousminer, S. A. Jackowski, T. J. Cole, A. S. Kwong, Z. Li, S. F. Grant, A. D. Baxter-Jones, et al. Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Medical Research Methodology, 22(1): 68, 2022.
G. Ertürk Zararsız. Linear combination of leukocyte count and D-Dimer levels in the diagnosis of patients with non-traumatic acute abdomen. Med. Rec., 5: 84–90, 2023. URL https://doi.org/10.37990/medr.1166531.
S. S. Faria, P. C. Fernandes Jr, M. J. B. Silva, V. C. Lima, W. Fontes, R. Freitas-Junior, A. K. Eterovic and P. Forget. The neutrophil-to-lymphocyte ratio: A narrative review. ecancermedicalscience, 10: 2016.
J. Friedman, T. Hastie and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1): 1, 2010.
S. Ganapathy, H. KT, B. Jindal, P. S. Naik and S. Nair N. Comparison of diagnostic accuracy of models combining the renal biomarkers in predicting renal scarring in pediatric population with vesicoureteral reflux (VUR). Irish Journal of Medical Science (1971-), 192(5): 2521–2526, 2023.
D. Ghosh and A. M. Chinnaiyan. Classification and selection of biomarkers in genomic data using LASSO. BioMed Research International, 2005(2): 147–154, 2005.
J. Grau, I. Grosse and J. Keilwagen. PRROC: Computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics, 31(15): 2595–2597, 2015.
T. Hastie. Gam: Generalized additive models. 2015. URL https://CRAN.R-project.org/package=gam. R package version 1.22-5.
G. James, D. Witten, T. Hastie, R. Tibshirani, et al. An introduction to statistical learning. Springer, 2013.
L. Kang, A. Liu and L. Tian. Linear combination methods to improve diagnostic/prognostic accuracy on future observations. Statistical methods in medical research, 25(4): 1359–1380, 2016.
A. Kramar, D. Faraggi, A. Fortuné and B. Reiser. mROC: A computer program for combining tumour markers in predicting disease states. Computer methods and programs in biomedicine, 66(2-3): 199–207, 2001.
M. Kuhn. Building predictive models in R using the caret package. Journal of statistical software, 28: 1–26, 2008.
C. León, S. Ruiz-Santana, P. Saavedra, B. Almirante, J. Nolla-Salas, F. Álvarez-Lerma, J. Garnacho-Montero, M. Á. León, E. S. Group, et al. A bedside scoring system (“Candida score”) for early antifungal treatment in nonneutropenic critically ill patients with Candida colonization. Critical care medicine, 34(3): 730–737, 2006.
C. Liu, A. Liu and S. Halabi. A min–max combination of biomarkers to improve diagnostic accuracy. Statistics in medicine, 30(16): 2005–2014, 2011.
J. Luo, F. Yu, H. Zhou, X. Wu, Q. Zhou, Q. Liu and S. Gan. AST/ALT ratio is an independent risk factor for diabetic retinopathy: A cross-sectional study. Medicine, 103(26): e38583, 2024.
G. Minaev, R. Piché and A. Visa. Distance measures for classification of numerical features. 2018. URL https://trepo.tuni.fi/handle/10024/124353.
E. G. Müller, T. H. Edwin, C. Stokke, S. S. Navelsaker, A. Babovic, N. Bogdanovic, A. B. Knapskog and M. E. Revheim. Amyloid-\(\beta\) PET—correlation with cerebrospinal fluid biomarkers and prediction of Alzheimers disease diagnosis in a memory clinic. PloS one, 14(8): e0221365, 2019.
M. Neumann, H. Kothare and V. Ramanarayanan. Combining multiple multimodal speech features into an interpretable index score for capturing disease progression in Amyotrophic Lateral Sclerosis. Interspeech, 2353: 2023.
H. Nyblom, E. Björnsson, M. Simrén, F. Aldenborg, S. Almer and R. Olsson. The AST/ALT ratio as an indicator of cirrhosis in patients with PBC. Liver International, 26(7): 840–845, 2006.
S. Pandit, S. Gupta, et al. A comparative study on distance measuring approaches for clustering. International journal of research in computer science, 2(1): 29–31, 2011.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12: 2825–2830, 2011.
M. S. Pepe. The statistical evaluation of medical tests for classification and prediction. Oxford university press, 2003.
M. S. Pepe, T. Cai and G. Longton. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics, 62(1): 221–229, 2006.
M. S. Pepe and M. L. Thompson. Combining diagnostic test results to increase accuracy. Biostatistics, 1(2): 123–140, 2000.
S. Pérez-Fernández, P. Martı́nez-Camblor, P. Filzmoser and N. Corral. Visualizing the decision rules behind the ROC curves: Understanding the classification process. AStA Advances in Statistical Analysis, 105(1): 135–161, 2021.
F. Prinzi, C. Militello, N. Scichilone, S. Gaglio and S. Vitabile. Explainable machine-learning models for COVID-19 prognosis prediction using clinical, laboratory and radiomic features. IEEE Access, 11: 121492–121510, 2023.
X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez and M. Müller. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics, 12: 1–8, 2011.
S. Ruiz-Velasco. Asymptotic efficiency of logistic regression relative to linear discriminant analysis. Biometrika, 78(2): 235–243, 1991.
M. C. Sachs. plotROC: A tool for plotting ROC curves. Journal of statistical software, 79: 2017.
N. Salvetat, F. J. Checa-Robles, A. Delacrétaz, C. Cayzac, B. Dubuc, D. Vetter, J. Dainat, J.-P. Lang, F. Gamma and D. Weissmann. AI algorithm combined with RNA editing-based blood biomarkers to discriminate bipolar from major depressive disorders in an external validation multicentric cohort. Journal of Affective Disorders, 356: 385–393, 2024.
N. Salvetat, F. J. Checa-Robles, V. Patel, C. Cayzac, B. Dubuc, F. Chimienti, J.-D. Abraham, P. Dupré, D. Vetter, S. Méreuze, et al. A game changer for bipolar disorder diagnosis using RNA editing-based biomarkers. Translational Psychiatry, 12(1): 182, 2022.
G. Sameera, R. V. Vardhan and K. Sarma. Binary classification using multivariate receiver operating characteristic curve for continuous data. Journal of biopharmaceutical statistics, 26(3): 421–431, 2016.
D. Serban, N. Papanas, A. M. Dascalu, P. Kempler, I. Raz, A. A. Rizvi, M. Rizzo, C. Tudor, M. Silviu Tudosie, D. Tanasescu, et al. Significance of neutrophil to lymphocyte ratio (NLR) and platelet lymphocyte ratio (PLR) in diabetic foot ulcer and potential new therapeutic targets. The International Journal of Lower Extremity Wounds, 23(2): 205–216, 2024.
A. Sewak, S. Siegfried and T. Hothorn. Construction and evaluation of optimal diagnostic tests with application to hepatocellular carcinoma diagnosis. arXiv preprint arXiv:2402.03004, 2024.
P. Shiralkar. Programming validation: Perspectives and strategies. PharmaSUG 2010—paper IB09, 2010.
T. Sing, O. Sander, N. Beerenwinkel and T. Lengauer. ROCR: Visualizing classifier performance in R. Bioinformatics, 21(20): 3940–3941, 2005.
M. Stevenson, T. Nunes, C. Heuer, J. Marshall, J. Sanchez, R. Thornton, J. Reiczigel, J. Robison-Cox, P. Sebastiani, P. Solymos, et al. epiR: Tools for the analysis of epidemiological data. 2017. URL https://cran.r-project.org/web/packages/epiR/index.html. R package version 2.0.76.
J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424): 1350–1355, 1993.
K. Svart, J. J. Korsbæk, R. H. Jensen, T. Parkner, C. S. Knudsen, S. G. Hasselbalch, S. M. Hagen, E. A. Wibroe, L. D. Molander and D. Beier. Neurofilament light chain is elevated in patients with newly diagnosed idiopathic intracranial hypertension: A prospective study. Cephalalgia, 44(5): 03331024241248203, 2024.
N. Todor, I. Todor and G. Săplăcan. Tools to identify linear combination of prognostic factors which maximizes area under receiver operator curve. Journal of clinical bioinformatics, 4: 1–7, 2014.
H. Wickham. Testthat: Get started with testing. 2024. URL https://cran.r-project.org/web/packages/testthat/index.html. R package version 3.2.1.1.
H. Wickham, P. Danenberg and M. Eugster. roxygen2: In-source documentation for R. 2024. URL https://cran.r-project.org/web/packages/roxygen2/index.html. R package version 7.3.2.
H. Wickham, J. Hester, W. Chang and J. Bryan. Devtools: Tools to make developing R packages easier. 2022. URL https://cran.r-project.org/web/packages/devtools/index.html. R package version 2.4.5.
J. Yin and L. Tian. Optimal linear combinations of multiple diagnostic biomarkers based on Youden index. Statistics in medicine, 33(8): 1426–1440, 2014.
W. Yu and T. Park. Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve. Computational Statistics & Data Analysis, 88: 15–27, 2015.
G. Zararsiz, H. Y. Akyildiz, D. Göksülük, S. Korkmaz and A. Öztürk. Statistical learning approaches in diagnosing patients with nontraumatic acute abdomen. Turkish Journal of Electrical Engineering and Computer Sciences, 24(5): 3685–3697, 2016.

  1. https://cran.r-project.org/web/packages/splines/index.html↩︎

  2. https://cran.r-project.org/web/packages/glmnet/index.html↩︎

  3. https://github.com/gokmenzararsiz/dtComb, https://github.com/gokmenzararsiz/dtComb_Shiny↩︎


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Taştan, et al., "dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests", The R Journal, 2026

BibTeX citation

@article{RJ-2025-036,
  author = {Taştan, S. Ilayda Yerlitaş and Gengeç, Serra Bersan and Koçhan, Necla and Zararsız, Ertürk and Korkmaz, Selçuk and Zararsız, },
  title = {dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-036},
  doi = {10.32614/RJ-2025-036},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {80-102}
}