The Concordance Test, an Alternative to Kruskal-Wallis Based on the Kendall-tau Distance: An R Package

The Kendall rank correlation coefficient, based on the Kendall-\(\tau\) distance, is used to measure the ordinal association between two measurements. In this paper, we introduce a new coefficient also based on the Kendall-\(\tau\) distance, the Concordance coefficient, and a test to measure whether different samples come from the same distribution. This work also presents a new R package, ConcordanceTest, with the implementation of the proposed coefficient. We illustrate the use of the Concordance coefficient to measure the ordinal association between quantity and quality measures when two or more samples are considered. In this sense, the Concordance coefficient can be seen as a generalization of the Kendall rank correlation coefficient and an alternative to the non-parametric mean rank-based methods for comparing two or more samples. A comparison of the proposed Concordance coefficient and the classical Kruskal-Wallis statistic is presented through a comparison of the exact distributions of both statistics.

Javier Alcaraz (Center of Operations Research, Miguel Hernández University) , Laura Anton-Sanchez (Center of Operations Research, Miguel Hernández University) , Juan Francisco Monge (Center of Operations Research, Miguel Hernández University)
2022-10-11

1 Introduction

When we have a sample of observations from a population, it may be difficult to assume that the observations come from a particular distribution: we may not have any information about the variable under study and, even when we do, it may not be enough to determine the type of distribution. In these cases, parametric inference is inappropriate. Moreover, parametric techniques may also be unsuitable when the observations do not fulfill the basic assumptions on which they rest, such as normality or a sufficiently large sample.

Violation of the necessary assumptions of parametric statistics calls for non-parametric statistics. Non-parametric tests do not depend on the definition of a distribution function or on statistical parameters such as the mean or variance. Although less powerful, non-parametric tests are also appropriate when there are not enough observations available, when the data are non-normal, or when ordinal data are being analyzed.

Although the first steps in non-parametric statistics began earlier, it was not until the 1930s that a systematic study in this field appeared. (Fisher 1935) introduced the permutation test or randomization test as a simple way to compute the sampling distribution for any test statistic under the null hypothesis that does not establish any effect on all possible outcomes. Over the next two decades some of the main non-parametric tests emerged, (Pitman 1937; Kendall 1938; Kendall and Smith 1939; Friedman 1940; Wilcoxon 1945; Mann and Whitney 1947; Kruskal and Wallis 1952; Kruskal 1958), among others.

The main advantages of non-parametric tests are: the data can be non-numerical observations as long as they can be classified according to some criterion, the tests are usually easy to calculate, and they make no hypothesis about the distribution of the population from which the samples are taken. Two drawbacks can also be cited: non-parametric tests are less precise than their parametric counterparts, and since they are based only on the order of the elements in the sample, this order (and hence the test result) may stay the same even if the numerical data change.

There are many non-parametric tests in the literature, which can basically be classified into four categories, depending on whether the test compares two samples or more than two, and on whether the samples are related or unrelated. Examples of the most used non-parametric tests for each of these four situations are the following: the Wilcoxon signed-rank test (Wilcoxon 1945) for comparing two related samples, the Mann-Whitney (Wilcoxon) test (Mann and Whitney 1947) for comparing two unrelated samples, the Friedman test (Friedman 1940) for comparing three or more related samples, and the Kruskal-Wallis test (Kruskal and Wallis 1952) for comparing three or more unrelated samples. Several methods that exploit some characteristic of the samples have appeared in the literature in recent years, such as (Terpstra and Magel 2003; Alhakim and Hooper 2008).

It is also possible to measure the degree of association of two variables through a non-parametric approach; in that sense, we can mention the Kendall rank correlation coefficient (Kendall 1938) and the Spearman rank correlation coefficient (Spearman 1904).

In (Aparicio et al. 2020), the authors introduce the Kendall-\(\tau\) partition ranking: given a ranking of the elements of a set and a disjoint partition of that set, the Kendall-\(\tau\) partition ranking is the linear order of the subsets of the partition induced by the given ranking of elements. In this work, we propose to use the Kendall-\(\tau\) distance as a concordance measure between the different samples in an ordered set of observations. In this regard, the proposed measure, which we call the Concordance coefficient, can be considered an extension of the Kendall rank correlation coefficient to more than two samples. The main difference between the proposed measure and the classical methods is the use of the Kendall-\(\tau\) distance instead of ranks. We also propose a significance test to determine whether more than two samples come from the same distribution, and present a comparison with the classical Kruskal-Wallis method. We illustrate the use of the proposed coefficient with a new R package, ConcordanceTest (Alcaraz et al. 2022), which is freely available from the Comprehensive R Archive Network (CRAN). R packages exist for all the non-parametric tests mentioned above, for example: the Kendall package (McLeod 2011), which deals with the Kendall rank correlation coefficient; the pspearman package (Savicky 2014), with the Spearman rank correlation coefficient; or the stats package, by the R Core Team and contributors worldwide, which contains many of the non-parametric tests for comparing two or more, related or unrelated, samples. The Kendall-\(\tau\) distance, on which the proposed coefficient is based, is one of the most used in distance-based models, for which there are also recent alternatives in R; see, for example, the PerMallows (Irurozki et al. 2016), rankdist (Qian and Yu 2019) or BayesMallows (Sorensen et al. 2020) packages.

The remainder of this paper is organized as follows. After a brief review in the next section of the main features of the Kendall rank correlation coefficient and the Kruskal-Wallis statistic, in the following two sections we present the coefficient we propose in this work and illustrate its use with our ConcordanceTest package. Specifically, in the third section we introduce the Concordance coefficient while in the fourth section the related statistical test is presented. The fifth section includes a comparison between the Kruskal-Wallis test in the stats package and that presented in this work. Some final remarks follow in the last section. Appendix A presents an example of the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. Appendix B deals with a comparison between the probability density function of the Concordance coefficient and the Kruskal-Wallis statistic for several experiments. Finally, Appendix C presents some details of how the p-values for the Concordance coefficient have been calculated and shows some critical values and exact p-values.

2 Non-parametric tests

This section presents the Kendall rank correlation coefficient (Kendall 1938), a coefficient to measure the relationship between two samples ordinally, and the Kruskal-Wallis statistical test (Kruskal and Wallis 1952), which is a rank-based statistical test to measure whether different samples come from the same distribution, without assuming a given distribution for the population.

Only these two non-parametric tests are presented in detail, since the test proposed in this paper uses the Kendall-\(\tau\) distance and it can be seen as an extension of the Kendall rank correlation coefficient when more than two samples are considered, and it is presented as an alternative to the Kruskal-Wallis statistical test.

The Kendall rank correlation coefficient is a non-parametric measure of correlation. This measure is based on the Kendall-\(\tau\) distance between two permutations of \(n\) elements. The Kendall-\(\tau\) distance (\(d_{K\text{-}\tau}\)) is defined as the number of pairwise disagreements between two permutations \(\pi_1\) and \(\pi_2\). For instance, if we have three elements, the distance from permutation 123 to permutations 132, 231 and 321 is 1, 2 and 3, respectively. The maximum number of disagreements that may occur between two permutations of \(n\) elements is \(n(n-1)/2\); in this case, the values of permutation \(\pi_1\) are in the reverse order of \(\pi_2\).

The Kendall rank correlation coefficient between permutations \(\pi_1\) and \(\pi_2\), denoted by \(\tau\), is defined by \[\tau = 1 - 2 \frac{d_{K\text{-}\tau}(\pi_1,\pi_2)}{n(n-1)/2}.\] The Kendall rank correlation coefficient is used as a statistical test to determine whether there is a relationship or dependence between two random variables. The main advantages of this coefficient are: the data can be non-numerical observations if they can be ordered, it is easy to calculate, and the associated statistical test does not assume a known distribution of the population from which the samples are taken.
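As a small illustration (not part of the package), the distance and the coefficient can be computed in a few lines of base R; with no ties, the result agrees with cor(..., method = "kendall"):

```r
# Kendall-tau distance: number of pairwise disagreements between two rankings.
kendall_tau_distance <- function(p1, p2) {
  n <- length(p1)
  d <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # a pair disagrees when the two rankings order it in opposite ways
      d <- d + ((p1[i] - p1[j]) * (p2[i] - p2[j]) < 0)
    }
  }
  d
}

p1 <- c(1, 2, 3)
p2 <- c(1, 3, 2)
d <- kendall_tau_distance(p1, p2)                      # 1 disagreement
tau <- 1 - 2 * d / (length(p1) * (length(p1) - 1) / 2) # 1/3
```

For reversed permutations the distance reaches its maximum \(n(n-1)/2\) and \(\tau = -1\).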

The Kruskal-Wallis test is a non-parametric statistical method to study whether different samples come from the same population. The test is the extension of the Mann-Whitney test (Mann and Whitney 1947) to more than two samples or groups. The following example illustrates the Kruskal-Wallis test when comparing three samples.

Example 1. Let us assume that the effectiveness of three different treatments (\(A\), \(B\), \(C\)) has been measured for 6 individuals, two individuals being assigned to each of the treatments, with the effectiveness of each treatment being measured ordinally. We could obtain the result shown in Table 1, where, for example, the effectiveness of treatment \(A\) has been rated in first and third place.

Table 1: Result for an experiment with 6 people and 3 treatments.
Treatment \(A\) \(B\) \(A\) \(C\) \(C\) \(B\)
Rank 1 2 3 4 5 6

The Kruskal-Wallis statistic is based on the differences between the ranks of the individuals in each category and the average rank. In our example, the average rank of the test is \(\overline{R}=3.5\), while the average ranks of the three treatments are \(\overline{R}_A=2\), \(\overline{R}_B=4\) and \(\overline{R}_C=4.5\). The Kruskal-Wallis statistic, denoted by \(H\), is based on the calculation of the distance of each rank to the average rank, which can be expressed as follows: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum_{i=1}^k \frac{R_i^2}{n_i},\] where \(n\) is the number of observations in the \(k\) samples, \(n_i\) is the number of observations in the \(i\)-th sample and \(R_i\) is the sum of the ranks in the \(i\)-th sample. In our example, the value of the Kruskal-Wallis statistic is: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum_{i=1}^k \frac{R_i^2}{n_i} = -3(6+1)+\frac{12}{6(6+1)}\left (\frac{4^2}{2} + \frac{8^2}{2} + \frac{9^2}{2} \right ) = 2.\] Table 2 shows the probability distribution of the Kruskal-Wallis statistic for 3 treatments, each with 2 patients. Appendix A presents the Kruskal-Wallis statistic for all possible results in the experiment for 3 treatments with 2 people in each. In (Spurrier 2003), the author compares different methods for approximating the null probability points.

Table 2: Probability distribution for the Kruskal-Wallis statistic (\(H\)), with sample sizes \(N = (2,2,2)\).
\(H\) \(Prob\)
0.00 0.06667
0.29 0.13333
0.86 0.13333
1.14 0.13333
2.00 0.13333
2.57 0.06667
3.43 0.13333
3.71 0.13333
4.57 0.06667
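The value \(H=2\) obtained in Example 1 can be checked with the stats::kruskal.test() function, applied directly to the ranks of Table 1 (a base R check, not a package feature):

```r
# Example 1: kruskal.test() on a list of samples reproduces H = 2.
# With untied ranks, its statistic equals the H computed by the formula above.
ranks <- list(A = c(1, 3), B = c(2, 6), C = c(4, 5))
kt <- kruskal.test(ranks)
unname(kt$statistic)  # 2
```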

3 The Concordance coefficient \(\tau_c\)

In (Aparicio et al. 2020), the authors introduce the Kendall-\(\tau\) partition ranking: given a ranking of the elements of a set and a disjoint partition of that set, the Kendall-\(\tau\) partition ranking is the linear order of the subsets of the partition induced by the given ranking of elements. The Kendall-\(\tau\) partition ranking presents an ordinal alternative to the mean-based ranking, which uses a pseudo-cardinal scale. Let \(\pi\) be a permutation of the elements of set \(V\) and let \(V_1\), \(V_2\), \(\ldots\), \(V_k\) be a partition of \(V\); then, the Kendall-\(\tau\) distance of permutation \(\pi\) is given by \[d_{K\text{-}\tau}(\pi) = \displaystyle \min \{ d_{K\text{-}\tau}(\rho, \pi):\text{ elements in }V_r\text{ are consecutively listed in }\rho,\,\, \forall r \}.\]

This distance is also called the disorder of permutation \(\pi\). For its calculation, in (Aparicio et al. 2020) the authors establish that the disorder of a permutation of elements \(\pi = (a|a|b|b|a|c|a|b|c|\cdots|c|a|b)\) is given by the solution of the Linear Ordering Problem (LOP) with the preference matrix \(M\), where the element \(m_{ab}\) of matrix \(M\) indicates the number of times that an element \(a\) of sample \(A\) precedes an element \(b\) of sample \(B\) in the order \(\pi\). The solution of the Linear Ordering Problem gives us a new order of the elements of \(\pi\), the closest to \(\pi\), in which all the elements belonging to the same sample are listed consecutively. The book by Martí and Reinelt (2011) provides an exhaustive study of the Linear Ordering Problem.
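As a sketch (the helper name is ours, not a package function), the preference matrix \(M\) can be built directly from the sequence of sample labels:

```r
# Build the preference matrix M from a sequence of sample labels:
# M[a, b] = number of times an element of sample a precedes one of sample b.
preference_matrix <- function(labels) {
  groups <- sort(unique(labels))
  k <- length(groups)
  M <- matrix(0, k, k, dimnames = list(groups, groups))
  n <- length(labels)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (labels[i] != labels[j]) {
        M[labels[i], labels[j]] <- M[labels[i], labels[j]] + 1
      }
    }
  }
  M
}

pi_labels <- c("a", "b", "a", "c", "c", "b")  # the permutation of Example 1
M <- preference_matrix(pi_labels)
M
```

For this sequence the function returns the matrix shown in Example 1 below.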

In (Aparicio et al. 2020), the authors present the properties of the Kendall-\(\tau\) partition ranking and compare it with classical mean- and median-based rank approaches. Those properties come from social choice theory and are adapted to a partition ranking, see (Arrow 1951; Kemeny 1959; Zahid and Swart 2015). Two of these properties hold only for the Kendall-\(\tau\) partition ranking: the Condorcet and Deletion Independence properties. The Condorcet property establishes that the most preferred subset must be listed before any other in any ranking, and the Deletion Independence property establishes that if any subset is removed, then the induced order of the remaining subsets does not change. In permutation \(\pi=(c|c|c|b|b|a|a|c|c)\), the set \(C\) is a Condorcet winner, the most preferred set, but \(B\) has a smaller mean rank than set \(C\) if set \(A\) is not considered in the comparison; therefore, this permutation gives an example where ranking subsets from mean ranks is not very reliable.

From (Aparicio et al. 2020), the maximum number of disagreements that may occur in a permutation of \(n\) elements (where the elements are classified in \(k\) subsets \(V_1, V_2,\ldots, V_k\) of sizes \(n_1, n_2,\ldots, n_k\) respectively) is \(\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k \displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )\), where \(GP_{b}\) is the Generalized Pentagonal Number of \(b\), and \(b\) the number of subsets \(V_k\) with odd cardinality. The Generalized Pentagonal number \(GP_{b}\), for \(b\in \mathbb{N}\), is \[GP_b=\left\{ \begin{array}{ll}\displaystyle\frac{\ell(3\ell -1)}{2} & b=2\ell \,\, (b \,\, \text{even}),\\ & \\\displaystyle\frac{\ell(3\ell +1)}{2} & b=2\ell +1 \,\, (b \,\, \text{odd}).\\ \end{array} \right.\] This maximum number of disagreements (the maximum disorder) in a permutation \(\pi\) of elements, allows us to define a relative disorder coefficient of permutation \(\pi\) as \[relative\, disorder(\pi) = \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )}.\]
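These formulas translate directly into R; the sketch below (with our own helper names) computes \(GP_b\) and the maximum disorder for given sample sizes:

```r
# Generalized pentagonal number GP_b, following the definition above.
gp <- function(b) {
  l <- b %/% 2
  if (b %% 2 == 0) l * (3 * l - 1) / 2 else l * (3 * l + 1) / 2
}

# Maximum disorder of a permutation with sample sizes n = (n_1, ..., n_k).
max_disorder <- function(n) {
  pairs <- outer(n, n)               # n_r * n_s for every pair of samples
  upper <- pairs[upper.tri(pairs)]   # keep each pair r < s once
  b <- sum(n %% 2 == 1)              # number of samples with odd size
  sum(upper) - (gp(b) + sum(floor(upper / 2)))
}

max_disorder(c(2, 2, 2))   # 6, as in Example 1
max_disorder(c(10, 5, 3))  # 47, as in Example 2
```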

Definition 1. We define the Concordance coefficient (\(\tau_c\)) of permutation \(\pi\) as \[\tau_c= 1- relative\, disorder(\pi) = 1- \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )} .\]

The Concordance coefficient (\(\tau_c\)) provides a measure of concordance of the \(k\) samples, taking values between 0 and 1: it equals 1 when there is a total order between the samples and 0 when the disorder is maximum. In this sense, the Concordance coefficient can be seen as a generalization of the Kendall rank correlation coefficient to more than two samples. Given that the Concordance coefficient satisfies the properties mentioned above, we consider it more appropriate for measuring differences between samples than a rank-based method such as Kruskal-Wallis'.

Example 1 (Cont.). Continuing with the data in Example 1, the results of the experiment provide the following order or permutation of the treatments \(\pi=(a|b|a|c|c|b)\).

Given the order of individuals \(\pi=(a|b|a|c|c|b)\), the orderings of individuals that keep individuals with the same treatment together and are closest to \(\pi\) are (\(a\) \(a\) \(b\) \(b\) \(c\) \(c\)) and (\(a\) \(a\) \(c\) \(c\) \(b\) \(b\)). Both orderings need only 3 pairwise disagreements from the permutation \(\pi\). In order to find the permutation of elements (equal elements listed consecutively) closest to a given permutation, it is sufficient to solve the Linear Ordering Problem (LOP) with the preference matrix defined above. In this example, said matrix is:

\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix}\\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 3 & 4 \\ 1 & - & 2 \\ 0 & 2 & - \end{pmatrix} \end{matrix},\]

where each element \(m_{ij}\) of the matrix represents the number of times an individual of treatment \(i\) precedes an individual of treatment \(j\). The solution of the LOP is the permutation of treatments which maximizes the preferences of order in the experiment; in this example, the permutations of treatments \((A\ B\ C)\) and \((A\ C\ B)\) retain 9 of the preferences expressed in the order of individuals represented by the permutation \(\pi\). Therefore, the distance of the permutation \(\pi\) to a total order between treatments is \(\sum_{i<j} n_i n_j -9 =3\). This distance, which is the number of pairwise disagreements needed in a permutation of elements to reach a permutation that establishes a total order between treatments, is called the disorder of the permutation by Aparicio et al. (2020).
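For a small number of samples \(k\), the LOP can be solved by brute force, enumerating the \(k!\) orderings of the treatments (a sketch for illustration only; it is feasible only for small \(k\)):

```r
# Solve the Linear Ordering Problem by brute force: among all k! orders of
# the treatments, find the maximum number of preferences of M retained.
lop_bruteforce <- function(M) {
  k <- nrow(M)
  perms <- function(v) {                      # all permutations of v
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
      for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
    }
    out
  }
  best <- -Inf
  for (ord in perms(seq_len(k))) {
    retained <- 0                             # preferences kept by this order
    for (i in 1:(k - 1)) for (j in (i + 1):k) {
      retained <- retained + M[ord[i], ord[j]]
    }
    if (retained > best) best <- retained
  }
  best
}

M <- matrix(c(0, 3, 4,
              1, 0, 2,
              0, 2, 0), nrow = 3, byrow = TRUE)  # Example 1 matrix
lop_bruteforce(M)            # 9 preferences retained
sum(M) - lop_bruteforce(M)   # disorder = 3
```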

Then, the relative disorder of permutation \(\pi\) can be evaluated as \[relative \,disorder(\pi) = \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )} =\frac{\displaystyle 3}{\displaystyle 12 - (0 +6 )} = \frac{3}{6}=\frac{1}{2},\] and the Concordance coefficient \[\tau_c= 1- relative\,disorder(\pi) = 1-\frac{1}{2}=\frac{1}{2}.\] Notice that no set of this example has odd cardinality, therefore the pentagonal number is \(GP_0=0\).

Table 3 shows the probability distribution of the disorder and the Concordance coefficient for 3 treatments with 2 patients each. Appendix A presents the disorder and the Concordance coefficient for all possible results in the experiment with sample sizes \(N = (2,2,2)\). Figure 1 compares the probability distributions of the Concordance coefficient and the Kruskal-Wallis statistic for 3 treatments with 2 people each. Notice that some Kruskal-Wallis statistic values (e.g., \(H=2.57\)) are less probable than larger ones.

Table 3: Probability distribution of the disorder (\(dis\)) and the Concordance coefficient (\(\tau_c\)), with sample sizes \(N = (2,2,2)\).
\(dis\) \(\tau_c\) \(Prob\)
6 0.0000 0.06667
5 0.1667 0.13333
4 0.3333 0.20000
3 0.5000 0.20000
2 0.6667 0.20000
1 0.8333 0.13333
0 1.0000 0.06667
Figure 1: Probability distribution of the Concordance coefficient (\(\tau_c\)=1-relative disorder) and the Kruskal-Wallis statistic (\(H\)), with sample sizes \(N = (2,2,2)\).

The Concordance coefficient in the ConcordanceTest package

The R package we have developed allows the user to calculate both the Concordance coefficient and the Kruskal-Wallis statistic in order to facilitate their comparison. Given the high combinatorial complexity of the problem of ordering samples of populations, the functions implemented in the package can either perform the calculations exactly, exploring the entire sample space, or approximate the sample space by simulation.

The ConcordanceTest package can be installed from CRAN:

install.packages("ConcordanceTest")
library("ConcordanceTest")

and its functions can perform the calculations related only to the Concordance coefficient (the default option, specified with the parameter H = 0) or also for the Kruskal-Wallis statistic (H = 1), allowing their comparison.

To obtain the probability distribution of the statistics, it is necessary to enumerate all possible permutations that can occur as the result of the experiment we want to analyze (\(90 = 6!/(2!\,2!\,2!)\) in Example 1). This can be done with the function Permutations_With_Repetition(), included in the ConcordanceTest package.
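The size of this sample space is the multinomial coefficient \(n!/(n_1!\cdots n_k!)\), which can be checked in one line of base R:

```r
# Number of distinct label sequences for sample sizes (2, 2, 2):
# the multinomial coefficient 6!/(2! 2! 2!) = 90.
sizes <- c(2, 2, 2)
n_perms <- factorial(sum(sizes)) / prod(factorial(sizes))
n_perms  # 90
```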

The function CT_Distribution() calculates the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. The sample space grows very quickly with the number of elements and with the number of sets and, in some cases, calculating the probability distribution exactly becomes unaffordable, making approximate calculations necessary. Both an exact and an approximate calculation (the default option) can be done using the function CT_Distribution(). It is used as follows:

CT_Distribution(Sample_Sizes, Num_Sim = 10000, H = 0, verbose = TRUE)

where Sample_Sizes is a numeric vector \((n_1,\ldots,n_k)\) containing the number of repetitions of each element, i.e., the size of each sample in the experiment. Num_Sim is the number of simulations to be performed in order to obtain the probability distribution of the statistics (10,000 by default). If Num_Sim is set to 0, the probability distribution tables are obtained exactly using the function Permutations_With_Repetition(). H is the parameter specifying whether the calculations must also be performed for the Kruskal-Wallis statistic, and verbose is a logical parameter that indicates whether some progress report of the simulations should be given.

Example 1 (Cont.). Using the function CT_Distribution() with Num_Sim equal to 0, we can obtain the exact probability distributions of the Kruskal-Wallis statistic and the Concordance coefficient in Example 1 (Tables 2 and 3, respectively). We can also approximate the probability distributions by simulating, for example, 25,000 permutations of 3 treatments with 2 patients each, as shown below. Note that, for reproducibility, we always initialize the pseudo-random number generator when the results rely on simulation.

set.seed(12)
Sample_Sizes <- c(2,2,2)
CT_Distribution(Sample_Sizes, Num_Sim = 25000, H = 1)

$C_freq
     disorder  Concordance coefficient Frequency Probability
[1,]        6                     0.00         6      0.0667
[2,]        5                     0.17        12      0.1333
[3,]        4                     0.33        18      0.2000
[4,]        3                     0.50        18      0.2000
[5,]        2                     0.67        18      0.2000
[6,]        1                     0.83        12      0.1333
[7,]        0                     1.00         6      0.0667

$H_freq
      H Statistic Frequency Probability
 [1,]        0.00         6      0.0667
 [2,]        0.29        12      0.1333
 [3,]        0.86        12      0.1333
 [4,]        1.14        12      0.1333
 [5,]        2.00        12      0.1333
 [6,]        2.57         6      0.0667
 [7,]        3.43        12      0.1333
 [8,]        3.71        12      0.1333
 [9,]        4.57         6      0.0667

The function CT_Distribution() returns two elements. C_freq is a matrix with the probability distribution of the Concordance coefficient. Each row in the matrix contains the disorder, the value of the Concordance coefficient \(\tau_c\), the frequency and its probability. H_freq (only returned if H = 1) is a matrix with the probability distribution of the Kruskal-Wallis statistic. Each row in the matrix contains the value of the statistic \(H\), the frequency and its probability. The results obtained by the function CT_Distribution() are the same as those previously shown in Table 3 and Table 2 of Example 1.

4 Concordance test

In this section, we present the Concordance test in order to evaluate when different samples come from the same population distribution. The randomization test introduced by (Fisher 1935) establishes a framework for the statistical test based on permutations, see also (Box 1980; Stern 1990; Welch 1990).

If all the samples come from the same distribution, then all possible ways to rank \(n\) observations divided into \(k\) samples have the same probability of occurring. If the result of the experiment provides an order of the observations with a high disorder, it supports the idea that all observations come from the same population. On the contrary, a result with a small disorder goes against the claim that the observations come from the same population. We therefore propose as null hypothesis that the samples come from the same distribution, while the alternative hypothesis is that some of the samples come from a different distribution.

\(H_0\):

There is no difference among the \(k\) populations.

\(H_a\):

At least one of the populations differs from the other populations.

The decision rule is to reject the null hypothesis if the disorder in the permutation of observations is small or, equivalently, if the Concordance coefficient \(\tau_c\) is close to one. We reject the null hypothesis \(H_0\) at the significance level \(\alpha\) if \(\tau_c\) is greater than the \(100(1-\alpha)\)th percentile of the probability distribution of \(\tau_c\).

The following example illustrates the use of the Concordance test proposed in this work and compares it with the classical Kruskal-Wallis non-parametric test. The comparison will be made first considering that there are no ties and then modifying the data in the example so that ties appear.

Example 2.

Suppose we have applied three treatments to 18 patients, measuring the number of hours it takes these patients to recover. The results are shown in Table 4.

Table 4: Result for an experiment with 18 patients and 3 treatments.
Hours
Treatment A 12 13 15 20 23 28 30 32 40 48
Treatment B 29 31 49 52 54
Treatment C 24 26 44

Concordance test:

The experiment ranks the patients in the following ranking \[(a\ a\ a\ a\ a\ c\ c\ a\ b\ a\ b\ a\ a\ c\ a\ b\ b\ b).\]

If we perform the test using the disorder statistic or the Concordance coefficient \(\tau_c\), we must calculate the permutation of treatments that best preserves the order of patients obtained in the experiment. The observed matrix of preferences between treatments is as follows:

\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix}\\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 43 & 19 \\ 7 & - & 2 \\ 11 & 13 & - \end{pmatrix} \end{matrix}\] The order between treatments that maximizes the order between patients corresponds to the order \((A\ C\ B)\), satisfying 75 of the 95 preferences contained in the matrix, where the value 75 is the solution of the Linear Ordering Problem (LOP). Therefore, exactly \(20 = 95-75\) pairwise disagreements are necessary to order the samples and obtain the order \((A\ C\ B)\); that is, the disorder is 20. The greatest disorder that an order of elements can have with samples of 10, 5 and 3 elements is given by: \(\sum_{r=1}^3\sum_{s =r+1}^3 n_r \, n_s - (GP_{b} +\sum_{r=1}^3\sum_{s =r+1}^3 \displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor ) = 95 - (1 + 47) = 47\); therefore, the Concordance coefficient is \(\tau_c = 1-20/47= 0.574\). The p-value of the disorder 20 or, equivalently, of the Concordance coefficient \(\tau_c=0.574\), is 0.0492723; therefore, at a significance level of 5% we can reject the null hypothesis of equality of treatments.

Kruskal-Wallis test:

The treatments A, B and C have average ranks of 7.3, 14.2 and 9, respectively, and the sum of ranks are \(R_A=73\), \(R_B=71\) and \(R_C=27\).

The Kruskal-Wallis statistic is given by: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum \frac{R_i^2}{n_i}= -3(18+1)+\frac{12}{18(18+1)}\left( \frac{73^2}{10} +\frac{71^2}{5}+\frac{27^2}{3}\right)=5.6\]
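This value can be reproduced with stats::kruskal.test() on the raw data of Table 4 (since there are no ties, no correction is applied):

```r
# Example 2: the Kruskal-Wallis statistic from the raw data is H = 5.6.
A <- c(12, 13, 15, 20, 23, 28, 30, 32, 40, 48)
B <- c(29, 31, 49, 52, 54)
C <- c(24, 26, 44)
kt <- kruskal.test(list(A, B, C))
unname(kt$statistic)  # 5.6
```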

In (Meyer and Seaman 2015), exact values for the Kruskal-Wallis test at different significance levels are provided. Looking at their tables, we can conclude that the p-value of the \(H\) statistic is greater than 0.05; therefore, we cannot reject the null hypothesis that the treatments are equally effective.

Comparing both methods, the Concordance and Kruskal-Wallis tests provide similar p-values, but their conclusions at the 5% significance level differ.

Example 3.

Suppose we have the same experiment as in Example 2 but with ties. The results are shown in Table 5. Ties are in bold.

Table 5: Result for an experiment with 18 patients and 3 treatments. Example with ties.
Hours
Treatment A 12 13 15 20 24 29 30 32 40 49
Treatment B 29 31 49 52 54
Treatment C 24 26 44

Concordance test with ties:

The results of the experiment order the individuals according to the sequence: \[(a\ a\ a\ a\ (a\ c)\ c\ (a\ b)\ a\ b\ a\ a\ c\ (a\ b)\ b\ b)\] where the elements grouped in parentheses are tied. There are \(8\) different ways to break the ties in the ranking of elements. If the same probability is assumed for all of them, the expected preference matrix between treatments is obtained by distributing the preference equally in comparisons of tied observations, that is, assigning the value 0.5 to each of the treatments when two tied units are compared. The preference matrix for this example is as follows:

\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix} \\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 42 & 18.5 \\ 8 & - & 2 \\ 11.5 & 13 & -\\ \end{pmatrix} \end{matrix}\] Note that this is the matrix of expected preferences when all the permutations obtained by breaking the ties are considered, each with the same probability.
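This expected matrix can be computed by weighting each tied comparison by 0.5; the helper below is our own sketch, not a package function:

```r
# Expected preference matrix with ties: a comparison between tied values
# contributes 0.5 to each direction instead of 1 to the winner.
preference_matrix_ties <- function(samples) {
  k <- length(samples)
  M <- matrix(0, k, k, dimnames = list(names(samples), names(samples)))
  for (r in 1:k) for (s in 1:k) {
    if (r != s) {
      for (x in samples[[r]]) for (y in samples[[s]]) {
        M[r, s] <- M[r, s] + (x < y) + 0.5 * (x == y)
      }
    }
  }
  M
}

samples <- list(A = c(12, 13, 15, 20, 24, 29, 30, 32, 40, 49),
                B = c(29, 31, 49, 52, 54),
                C = c(24, 26, 44))
M <- preference_matrix_ties(samples)
M
```

For the data of Table 5, the function reproduces the matrix above.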

The order between treatments that maximizes the order between patients corresponds to the order \((A\ C\ B)\), satisfying 73.5 of the 95 preferences contained in the matrix, where 73.5 is the solution of the Linear Ordering Problem. Therefore, \(21.5 = 95-73.5\) is the expected number of pairwise disagreements necessary to order the samples and obtain the order \((A\ C\ B)\); that is, the disorder is 21.5 or, equivalently, the Concordance coefficient is \(\tau_c = 1-21.5/47= 0.543\), a value with \(p\text{-value} > 0.05\). In this case, the observed data do not show significant evidence of a difference in the effectiveness of the treatments.

Kruskal-Wallis test with ties:

The treatments A, B and C have average ranks of 7.45, 14 and 8.83, respectively, and the sum of ranks are \(R_A=74.5\), \(R_B=70\) and \(R_C=26.5\).

The Kruskal-Wallis statistic is given by: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum \frac{R_i^2}{n_i}= -3(18+1)+\frac{12}{18(18+1)}\left( \frac{74.5^2}{10} +\frac{70^2}{5}+\frac{26.5^2}{3}\right)=5.074\]

If we make the adjustment in the statistic for ties, we get: \[\tilde{H} = \frac{H}{\displaystyle 1-\frac{\sum_i (t_i^3-t_i)}{n^3-n}} = \frac{5.074}{\displaystyle 1-\frac{(2^3-2)+(2^3-2)+(2^3-2)}{18^3-18}}= 5.0897\]

where \(t_i\) is the number of tied observations for the \(i\)-th distinct value.

In this case, the Kruskal-Wallis test leads to the same conclusion as the Concordance test; the uncertainty is greater when there are ties.
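These computations can be reproduced in a few lines of base R, using rank(), which assigns midranks to tied values (a verification sketch of ours, not part of the package):

```r
# Data of Example 3; the values 24, 29 and 49 each appear in two samples.
A <- c(12,13,15,20,24,29,30,32,40,49)
B <- c(29,31,49,52,54)
C <- c(24,26,44)
x <- c(A, B, C)
g <- rep(c("A", "B", "C"), times = c(length(A), length(B), length(C)))

r <- rank(x)                          # midranks for tied values
n <- length(x)
R <- tapply(r, g, sum)                # rank sums: 74.5, 70, 26.5
H <- 12 / (n * (n + 1)) * sum(R^2 / table(g)) - 3 * (n + 1)

tt <- table(x)                        # multiplicity of each distinct value
H_tie <- H / (1 - sum(tt^3 - tt) / (n^3 - n))
round(c(H = H, H_tie = H_tie), 4)     # 5.074 and 5.0897
```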

Concordance test in ConcordanceTest package

The ConcordanceTest R package performs the hypothesis test of whether samples originate from the same distribution with the function CT_Hypothesis_Test(), which carries out the calculations by simulation. It is used as follows:

CT_Hypothesis_Test(Sample_List, Num_Sim = 10000, H = 0, verbose = TRUE)

where Sample_List is a list of numeric data vectors with the elements of each sample, Num_Sim is the number of simulations used (10,000 by default), H specifies whether the Kruskal-Wallis test must also be performed, and verbose is a logical parameter that indicates whether a progress report of the simulations should be given.

Example 2 (Cont.). We use the ConcordanceTest package to perform the Concordance and Kruskal-Wallis tests of Example 2, with 25,000 simulations.

set.seed(12)
A <- c(12,13,15,20,23,28,30,32,40,48)
B <- c(29,31,49,52,54)
C <- c(24,26,44)
Sample_List <- list(A, B, C)
CT_Hypothesis_Test(Sample_List, Num_Sim = 25000, H = 1)

$results
                        Statistic p-value
Concordance coefficient     0.574 0.04928
Kruskal Wallis              5.600 0.05292

$C_p_value
[1] 0.04928

$H_p_value
[1] 0.05292

The function CT_Hypothesis_Test() provides the value of the statistics together with the p-value associated with each of them. The result of the Kruskal-Wallis test is only returned if H = 1. Note that the approximate p-values obtained by simulation are close to the exact ones, 0.04927 and 0.05223 for the Concordance coefficient and the Kruskal-Wallis statistic, respectively.

An alternative to the contrast performed with the function CT_Hypothesis_Test() is to obtain the critical values of the contrast. This can be done with the ConcordanceTest package, either exactly or approximately, using the function CT_Critical_Values(). It is used as follows:

CT_Critical_Values(Sample_Sizes, Num_Sim = 10000, H = 0, verbose = TRUE)

where Sample_Sizes is a numeric vector \((n_1,\ldots,n_k)\) containing the number of repetitions of each element, i.e., the size of each sample in the experiment. Num_Sim is the number of simulations carried out in order to obtain the probability distribution of the statistics (10,000 by default). If Num_Sim is set to 0, the critical values are obtained in an exact way. Otherwise they are obtained by simulation. H is the parameter specifying whether the critical values of the Kruskal-Wallis test must be calculated and returned, and verbose is a logical parameter that indicates whether some progress report of the simulations should be given.

The function returns a list with two elements. C_results contains the critical values and p-values of the Concordance coefficient for significance levels of 0.10, 0.05 and 0.01, and H_results contains the critical values and p-values of the Kruskal-Wallis statistic (only returned if H = 1).

Example 2 (Cont.). We show the results of the function CT_Critical_Values() with sample sizes \(N=(10,5,3)\) and 25,000 simulations. The results allow us to compare the test statistics with different significance levels.

set.seed(12)
Sample_Sizes <- c(10,5,3)
CT_Critical_Values(Sample_Sizes, Num_Sim = 25000, H = 1)

$C_results
              |  disorder |  Concordance coefficient |  p-value
Sig level .10          23                       0.51     0.0954
Sig level .05          20                       0.57     0.0492
Sig level .01          14                       0.70     0.0096

$H_results
              |  H Statistic |  p-value
Sig level .10           4.55     0.0995
Sig level .05           5.72     0.0497
Sig level .01           7.78     0.0097

To obtain the Concordance coefficient and the Kruskal-Wallis statistic from the result of an experiment, the ConcordanceTest package has the function CT_Coefficient(). This function is useful when we only want to obtain the value of the statistic to check its significance using statistical tables. The function CT_Coefficient() is used as follows:

CT_Coefficient(Sample_List, H = 0)

where Sample_List is a list of numeric data vectors with the elements of each sample, and H is defined as usual.

Example 2 (Cont.). We show the results of the function CT_Coefficient() for the data in Example 2.

A <- c(12,13,15,20,23,28,30,32,40,48)
B <- c(29,31,49,52,54)
C <- c(24,26,44)
Sample_List <- list(A, B, C)
CT_Coefficient(Sample_List, H = 1)

$Sample_Sizes
[1] 10  5  3

$order_elements
 [1] 1 1 1 1 1 3 3 1 2 1 2 1 1 3 1 2 2 2

$disorder
[1] 20

$Concordance_Coefficient
[1] 0.5744681

$H_Statistic
[1] 5.6

The function CT_Coefficient() returns a list with the following elements: Sample_Sizes, a numeric vector with the sample sizes; order_elements, a numeric vector containing the order of the elements; disorder, the disorder of the permutation given by order_elements; Concordance_Coefficient, the value of the Concordance coefficient \(\tau_c\), that is, 1 minus the relative disorder of the permutation given by order_elements; and H_Statistic, the Kruskal-Wallis statistic (only returned if H = 1).
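The value of Concordance_Coefficient can be recomputed from disorder by hand. For sample sizes \(N=(10,5,3)\) there are \(10\cdot 5+10\cdot 3+5\cdot 3=95\) comparisons between elements of different samples, and the outputs above are consistent with normalizing by a maximum disorder of \(\lfloor 95/2\rfloor = 47\) (our reading of the normalization constant, inferred from the reported values):

```r
# Recompute the coefficient from the reported disorder (a sketch; the
# maximum-disorder constant 47 is inferred from the values above).
sizes <- c(10, 5, 3)
pairs <- sum(combn(sizes, 2, prod))   # 95 comparisons between samples
max_dis <- floor(pairs / 2)           # 47
1 - 20 / max_dis                      # 0.5744681, as returned above
```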

Note that we can also solve problems with ties (as in Example 3) with the ConcordanceTest package.

Other functions in the ConcordanceTest package

The graphical visualization of the probability distributions of the Concordance coefficient and the Kruskal-Wallis statistic can be done with the function CT_Probability_Plot(). It is used as follows:

CT_Probability_Plot(C_freq = NULL, H_freq = NULL)

Using the function CT_Density_Plot() of the ConcordanceTest package, we can make an approximate representation of the density functions of the statistics, assuming that the probability distributions represent a sample of a continuous variable. It is used as follows:

CT_Density_Plot(C_freq = NULL, H_freq = NULL)

In both functions, C_freq is the probability distribution of the Concordance coefficient and H_freq is the probability distribution of the Kruskal-Wallis statistic, obtained exactly or approximately with the function CT_Distribution(). The function CT_Probability_Plot() can represent both probability distributions or only one of them (if it only receives the parameter C_freq or H_freq). Equivalently, the function CT_Density_Plot() can represent both density distributions or only one of them. Appendix B presents the empirical density probability functions for several experiments, where sample sizes vary from \(N=(4,4)\) to \(N=(5,5,4,4,4,4,4)\).

Example 2 (Cont.). Graphical visualization of the probability distributions and the density distributions of Example 2 generated by simulation. The first row of Figure 2 compares the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. The second row of Figure 2 shows the probability density function of the Concordance coefficient (continuous line) and the Kruskal-Wallis statistic (dashed line). Note that the \(H\) statistic has been normalized between 0 and 1.

set.seed(12)
Sample_Sizes <- c(10,5,3)
ProbDistr <- CT_Distribution(Sample_Sizes, Num_Sim = 25000, H = 1)
layout(matrix(c(1,3,2,3), ncol=2))
CT_Probability_Plot(C_freq = ProbDistr$C_freq, H_freq = ProbDistr$H_freq)
CT_Density_Plot(C_freq = ProbDistr$C_freq, H_freq = ProbDistr$H_freq)
Figure 2: Probability distributions (first row) and density distributions (second row) of the Concordance coefficient (\(\tau_c\)=1-relative disorder) and the Kruskal-Wallis statistic (\(H\)), with sample sizes \(N=(10,5,3)\).

As already observed in Figure 1, Figure 2 also shows that similar values of the Kruskal-Wallis statistic can have very different probabilities, which leads to a less smooth function than that of the Concordance coefficient. We can also see that the Concordance coefficient has a more symmetrical distribution. This behavior generalizes and, therefore, we consider the Concordance coefficient more reliable than the Kruskal-Wallis statistic.

The ConcordanceTest package also contains the function LOP(), which solves the Linear Ordering Problem from a square data matrix. This function calculates the disorder of a permutation of elements from the preference matrix induced by that permutation and is therefore necessary for computing the Concordance coefficient. The function LOP() is used by the functions CT_Distribution(), CT_Hypothesis_Test() and CT_Coefficient(). It is used as follows:

LOP(mat_LOP)

where mat_LOP is the preference matrix defining the Linear Ordering Problem, a numeric square matrix for which we want to obtain the permutation of rows/columns that maximizes the sum of the elements above the main diagonal.

The function LOP() returns a list with the following elements: obj_val, the optimal value of the Linear Ordering Problem, that is, the sum of the elements above the main diagonal under the optimal row/column permutation; permutation, the solution of the Linear Ordering Problem, that is, the row/column permutation; and permutation_matrix, the optimal permutation matrix of the Linear Ordering Problem.

Example 2 (Cont.). The matrix of preferences between treatments observed in Example 2 was:

\[\begin{matrix} &\begin{matrix} A & B & C \end{matrix} \\ \begin{matrix} A\\ B \\ C \end{matrix} & \begin{pmatrix} - & 43 & 19 \\ 7 & - & 2\\ 11 & 13 & - \end{pmatrix} \end{matrix}\] If we apply the function LOP() on this preference matrix we obtain the following results:

mat_LOP <- matrix(c(0,7,11,43,0,13,19,2,0), nrow=3)
LOP(mat_LOP)

$obj_val
[1] 75

$permutation
[1] 1 3 2

$permutation_matrix
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    0    0
[3,]    0    1    0

As we saw previously, the order between treatments that maximizes the order between patients corresponds to the order \((A\ C\ B)\) (permutation = 1 3 2), satisfying obj_val = 75 of the preferences contained in the matrix.

5 Comparison with the kruskal.test() function from the stats package

The well-known stats package contains, among many other functions, the function kruskal.test() that performs a Kruskal-Wallis rank sum test. In this section, we compare the results obtained with the ConcordanceTest package presented in this work and the function kruskal.test(), making use of the dataset from (Hollander and Wolfe 1973) referenced in the kruskal.test() examples.

Example 4. Comparison of kruskal.test() (stats package) and CT_Hypothesis_Test() functions with 25,000 simulations (ConcordanceTest package) using the dataset from (Hollander and Wolfe 1973).

## Hollander & Wolfe (1973), 116.
## Mucociliary efficiency from the rate of removal of dust in normal
##  subjects, subjects with obstructive airway disease, and subjects
##  with asbestosis.

x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4)      # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
Sample_List <- list(x, y, z)

kruskal.test(Sample_List)

    Kruskal-Wallis rank sum test

data:  Sample_List
Kruskal-Wallis chi-squared = 0.77143, df = 2, p-value = 0.68

set.seed(12)
CT_Hypothesis_Test(Sample_List, Num_Sim = 25000, H = 1)

$results
                        Statistic p-value
Concordance coefficient     0.188 0.78408
Kruskal-Wallis              0.771 0.71080

$C_p_value
[1] 0.78408

$H_p_value
[1] 0.7108

As can be observed, the value of the Kruskal-Wallis statistic is the same with both functions (0.771). However, the p-values associated with the statistic differ.

The Kruskal-Wallis statistic follows approximately a \(\chi^2\) distribution with degrees of freedom equal to the number of groups minus 1 (Kruskal and Wallis 1952). For this reason, the function kruskal.test() uses a \(\chi^2\) distribution to approximate the p-value (using the function pchisq()). In the case of the function CT_Hypothesis_Test(), it calculates the p-values using the simulations performed (25,000 in this example).
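The chi-squared approximation used by kruskal.test() can be reproduced directly with pchisq():

```r
# Upper-tail chi-squared probability with k - 1 = 2 degrees of freedom,
# as used internally by kruskal.test() for three groups:
pchisq(0.77143, df = 2, lower.tail = FALSE)   # ~0.68, as reported above
```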

The function CT_Distribution() of the ConcordanceTest package allows the probability distribution tables of the Concordance coefficient and Kruskal-Wallis statistic to be computed, and they can be obtained exactly or by simulation. We can get the exact probability distribution tables and, consequently, the exact p-values in Example 4 with

CT_Distribution(c(5,4,5), Num_Sim = 0, H = 1)

In Example 4, the exact p-value for the Kruskal-Wallis statistic is 0.71077. Therefore, the difference between our p-value obtained with 25,000 simulations (0.71080) and the exact one is 0.00003, while the difference between the p-value approximated by the \(\chi^2\) distribution (0.68) and the exact one is 0.03077. Regarding the Concordance coefficient, the exact p-value is 0.78468, hence, the difference between our p-value obtained with 25,000 simulations (0.78408) and the exact one is 0.0006.

It is worth noting that the function kruskal.test() uses the \(\chi^2\) distribution to approximate the p-value regardless of the size of the samples, but (Kruskal and Wallis 1952) state that the Kruskal-Wallis statistic is distributed approximately as a \(\chi^2\) unless the samples are too small, in which case special approximations or exact tables should be used. In contrast, the ConcordanceTest package can always obtain a good approximation of the p-values, regardless of the size of the samples, as long as a high number of simulations is used.

6 Final remarks and future research

A new measure based on the Kendall-\(\tau\) distance is presented in this work to estimate the concordance of different samples. A statistical test to determine whether different observations come from the same distribution is also introduced. A comparison with the classical Kruskal-Wallis test is presented to show that both tests differ. As we have shown, the proposed coefficient is more appropriate and reliable than rank-based methods. This work also describes the R package ConcordanceTest (Alcaraz et al. 2022), which contains all the functions needed to work with the proposed Concordance coefficient and allows its comparison with the Kruskal-Wallis statistic.

This work aims to be an introduction to the new concordance measure between samples, but much remains to be done. It opens new problems and challenges for researchers, for example: studying the asymptotic distribution of the Concordance coefficient, exploring the possibility of obtaining the exact distribution with the help of modern computing, or analyzing the power of the Concordance test presented in this work, among others.

7 Acknowledgments

The authors acknowledge grant PID2019-105952GB-I00, funded by Ministerio de Ciencia e Innovación / Agencia Estatal de Investigación / 10.13039/501100011033, Spain, and grant PROMETEO/2021/063, funded by the government of the Valencian Community, Spain.

8 Appendix A: Results in the experiment with sample sizes \(N=(2,2,2)\)

Table 6 shows the Concordance coefficient (\(\tau_c\)) and Kruskal-Wallis statistic (\(H\)) for all possible results in an experiment with three treatments and two people in each treatment.

Table 6: Concordance coefficient (\(\tau_c\)) and Kruskal-Wallis statistic (\(H\)) for all possible results in an experiment with sample sizes \(N=(2,2,2).\)
\(dis\) \(\tau_c\) \(H\) \(dis\) \(\tau_c\) \(H\) \(dis\) \(\tau_c\) \(H\)
a a b b c c 0 1.0000 4.57 b a a b c c 2 0.6667 3.43 c a a b b c 4 0.3333 1.14
a a b c b c 1 0.8333 3.71 b a a c b c 3 0.5000 2.00 c a a b c b 3 0.5000 2.00
a a b c c b 2 0.6667 3.43 b a a c c b 4 0.3333 1.14 c a a c b b 2 0.6667 3.43
a a c b b c 2 0.6667 3.43 b a b a c c 1 0.8333 3.71 c a b a b c 5 0.1667 0.29
a a c b c b 1 0.8333 3.71 b a b c a c 2 0.6667 2.57 c a b a c b 4 0.3333 0.86
a a c c b b 0 1.0000 4.57 b a b c c a 3 0.5000 2.00 c a b b a c 6 0.0000 0.00
a b a b c c 1 0.8333 3.71 b a c a b c 4 0.3333 0.86 c a b b c a 5 0.1667 0.29
a b a c b c 2 0.6667 2.57 b a c a c b 5 0.1667 0.29 c a b c a b 3 0.5000 1.14
a b a c c b 3 0.5000 2.00 b a c b a c 3 0.5000 1.14 c a b c b a 4 0.3333 0.86
a b b a c c 2 0.6667 3.43 b a c b c a 4 0.3333 0.86 c a c a b b 1 0.8333 3.71
a b b c a c 3 0.5000 2.00 b a c c a b 6 0.0000 0.00 c a c b a b 2 0.6667 2.57
a b b c c a 4 0.3333 1.14 b a c c b a 5 0.1667 0.29 c a c b b a 3 0.5000 2.00
a b c a b c 3 0.5000 1.14 b b a a c c 0 1.0000 4.57 c b a a b c 6 0.0000 0.00
a b c a c b 4 0.3333 0.86 b b a c a c 1 0.8333 3.71 c b a a c b 5 0.1667 0.29
a b c b a c 4 0.3333 0.86 b b a c c a 2 0.6667 3.43 c b a b a c 5 0.1667 0.29
a b c b c a 5 0.1667 0.29 b b c a a c 2 0.6667 3.43 c b a b c a 4 0.3333 0.86
a b c c a b 5 0.1667 0.29 b b c a c a 1 0.8333 3.71 c b a c a b 4 0.3333 0.86
a b c c b a 6 0.0000 0.00 b b c c a a 0 1.0000 4.57 c b a c b a 3 0.5000 1.14
a c a b b c 3 0.5000 2.00 b c a a b c 5 0.1667 0.29 c b b a a c 4 0.3333 1.14
a c a b c b 2 0.6667 2.57 b c a a c b 6 0.0000 0.00 c b b a c a 3 0.5000 2.00
a c a c b b 1 0.8333 3.71 b c a b a c 4 0.3333 0.86 c b b c a a 2 0.6667 3.43
a c b a b c 4 0.3333 0.86 b c a b c a 3 0.5000 1.14 c b c a a b 3 0.5000 2.00
a c b a c b 3 0.5000 1.14 b c a c a b 5 0.1667 0.29 c b c a b a 2 0.6667 2.57
a c b b a c 5 0.1667 0.29 b c a c b a 4 0.3333 0.86 c b c b a a 1 0.8333 3.71
a c b b c a 6 0.0000 0.00 b c b a a c 3 0.5000 2.00 c c a a b b 0 1.0000 4.57
a c b c a b 4 0.3333 0.86 b c b a c a 2 0.6667 2.57 c c a b a b 1 0.8333 3.71
a c b c b a 5 0.1667 0.29 b c b c a a 1 0.8333 3.71 c c a b b a 2 0.6667 3.43
a c c a b b 2 0.6667 3.43 b c c a a b 4 0.3333 1.14 c c b a a b 2 0.6667 3.43
a c c b a b 3 0.5000 2.00 b c c a b a 3 0.5000 2.00 c c b a b a 1 0.8333 3.71
a c c b b a 4 0.3333 1.14 b c c b a a 2 0.6667 3.43 c c b b a a 0 1.0000 4.57

9 Appendix B: Comparison of distributions

Table 7 shows the probability density functions of the Concordance coefficient (continuous lines) and the Kruskal-Wallis statistic (dashed lines), generated by simulation with 100,000 simulations. Note that the \(H\) statistic has been normalized between 0 and 1.

Table 7: Empirical density probability functions for several experiments (Concordance coefficient in continuous lines and Kruskal-Wallis statistic in dashed lines), where sample sizes vary from \(N=(4,4)\) to \(N=(5,5,4,4,4,4,4)\).
Sample_Sizes=(4,4)   Sample_Sizes=(3,3,2)   Sample_Sizes=(2,2,2,2)
Sample_Sizes=(5,4)   Sample_Sizes=(3,3,3)   Sample_Sizes=(3,2,2,2)