The Kendall rank correlation coefficient, based on the Kendall-\(\tau\) distance, is used to measure the ordinal association between two measurements. In this paper, we introduce a new coefficient also based on the Kendall-\(\tau\) distance, the Concordance coefficient, and a test to measure whether different samples come from the same distribution. This work also presents a new R package, ConcordanceTest, with the implementation of the proposed coefficient. We illustrate the use of the Concordance coefficient to measure the ordinal association between quantity and quality measures when two or more samples are considered. In this sense, the Concordance coefficient can be seen as a generalization of the Kendall rank correlation coefficient and an alternative to the non-parametric mean rank-based methods for comparing two or more samples. A comparison of the proposed Concordance coefficient and the classical Kruskal-Wallis statistic is presented through a comparison of the exact distributions of both statistics.
When we have a sample of observations from a given population, it may be difficult to assume that they come from a particular distribution: we may not have any information about the variable under study, and even when we do, it may not be enough to determine the type of distribution. In these cases, parametric inference is inappropriate. Moreover, parametric techniques may be unsuitable if the observations do not fulfill the basic assumptions on which they are based: normality and a large quantity of data.
Violation of the necessary assumptions in parametric statistics necessitates the use of non-parametric statistics. Non-parametric tests do not depend on the definition of a distribution function or statistical parameters such as the mean, variance, etc. The use of non-parametric tests, despite being less powerful, is also adequate when there are not enough observations available, when data are non-normal, or when ordinal data are being analyzed.
Although the first steps in non-parametric statistics began earlier, it was not until the 1930s that a systematic study in this field appeared. (Fisher 1935) introduced the permutation test or randomization test as a simple way to compute the sampling distribution for any test statistic under the null hypothesis that does not establish any effect on all possible outcomes. Over the next two decades some of the main non-parametric tests emerged, (Pitman 1937; Kendall 1938; Kendall and Smith 1939; Friedman 1940; Wilcoxon 1945; Mann and Whitney 1947; Kruskal and Wallis 1952; Kruskal 1958), among others.
The main advantages of non-parametric tests are: the data can be non-numerical observations as long as they can be classified according to some criterion; they are usually easy to calculate; and they do not make any hypothesis about the distribution of the population from which the samples are taken. We can also cite two drawbacks: non-parametric tests are less precise than other statistical models, and they are based only on the order of the elements in the sample, an order that will likely stay the same even if the numerical data change.
There are many non-parametric tests in the literature, which can basically be classified into four categories depending on whether the test compares two samples or more than two samples, and on whether the samples are related or unrelated. Examples of the most used non-parametric tests in the literature for each of these four situations are the following: the Wilcoxon signed-rank test (Wilcoxon 1945) for comparing two related samples, the Mann-Whitney (Wilcoxon) test (Mann and Whitney 1947) for comparing two unrelated samples, the Friedman test (Friedman 1940) for comparing three or more related samples, and the Kruskal-Wallis test (Kruskal and Wallis 1952) for comparing three or more unrelated samples. Several methods that exploit some characteristic of the samples have appeared in the literature in recent years, such as (Terpstra and Magel 2003; Alhakim and Hooper 2008).
It is also possible to measure the degree of association of two variables through a non-parametric approach; in that sense, we can mention the Kendall rank correlation coefficient (Kendall 1938) and the Spearman rank correlation coefficient (Spearman 1904).
In (Aparicio et al. 2020), the authors introduce the Kendall-\(\tau\) partition ranking: given a ranking of the elements of a set and a disjoint partition of the same set, the Kendall-\(\tau\) partition ranking is the induced linear order of the subsets of the partition which follows from the given ranking of elements. In this work, we propose to use the Kendall-\(\tau\) distance as a concordance measure between the different samples in an ordered set of observations. In this regard, the proposed measure, which we call the Concordance coefficient, can be considered an extension of the Kendall rank correlation coefficient when more than two samples are considered. The main difference between the proposed measure and the previous ones is the use of the Kendall-\(\tau\) distance instead of the ranks used by classical methods. We also propose a significance test to determine whether more than two samples come from the same distribution, and present a comparison with the classical Kruskal-Wallis method. We illustrate the use of the proposed coefficient with a new R package, ConcordanceTest (Alcaraz et al. 2022), which is freely available from the Comprehensive R Archive Network (CRAN). Indeed, R establishes the state of the art in statistical software, and there are currently packages for all the non-parametric tests mentioned above, for example: the Kendall package (McLeod 2011), which deals with the Kendall rank correlation coefficient; the pspearman package (Savicky 2014), with the Spearman rank correlation coefficient; or the stats package, maintained by the R Core Team and contributors worldwide, which contains many of the non-parametric tests for comparing two or more, related or unrelated, samples. The Kendall-\(\tau\) distance, on which the proposed coefficient is based, is one of the most used in distance-based models, for which there are also recent alternatives in R; see, for example, the PerMallows (Irurozki et al. 2016), rankdist (Qian and Yu 2019) or BayesMallows (Sorensen et al. 2020) packages.
The remainder of this paper is organized as follows. After a brief review in the next section of the main features of the Kendall rank correlation coefficient and the Kruskal-Wallis statistic, in the following two sections we present the coefficient we propose in this work and illustrate its use with our ConcordanceTest package. Specifically, in the third section we introduce the Concordance coefficient while in the fourth section the related statistical test is presented. The fifth section includes a comparison between the Kruskal-Wallis test in the stats package and that presented in this work. Some final remarks follow in the last section. Appendix A presents an example of the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. Appendix B deals with a comparison between the probability density function of the Concordance coefficient and the Kruskal-Wallis statistic for several experiments. Finally, Appendix C presents some details of how the p-values for the Concordance coefficient have been calculated and shows some critical values and exact p-values.
This section presents the Kendall rank correlation coefficient (Kendall 1938), a coefficient to measure the relationship between two samples ordinally, and the Kruskal-Wallis statistical test (Kruskal and Wallis 1952), which is a rank-based statistical test to measure whether different samples come from the same distribution, without assuming a given distribution for the population.
Only these two non-parametric tests are presented in detail, since the test proposed in this paper uses the Kendall-\(\tau\) distance, can be seen as an extension of the Kendall rank correlation coefficient when more than two samples are considered, and is presented as an alternative to the Kruskal-Wallis statistical test.
The Kendall rank correlation coefficient is a non-parametric measure of correlation. This measure is based on the Kendall-\(\tau\) distance between two permutations of \(n\) elements. The Kendall-\(\tau\) distance (\(d_{K\text{-}\tau}\)) is defined as the number of pairwise disagreements between two permutations \(\pi_1\) and \(\pi_2\). For instance, if we have three elements, the distance from permutation 123 to permutations 132, 231 and 321 is 1, 2 and 3, respectively. The maximum number of disagreements that may occur between two permutations of \(n\) elements is \(n(n-1)/2\); in this case, all the values of permutation \(\pi_1\) are in the reverse order of \(\pi_2\).
The Kendall rank correlation coefficient between permutations \(\pi_1\) and \(\pi_2\), denoted by \(\tau\), is defined by \[\tau = 1 - 2\, \frac{d_{K\text{-}\tau}(\pi_1,\pi_2)}{n(n-1)/2}.\] The Kendall rank correlation coefficient is used as a statistical test to determine whether there is a relationship or dependence between two random variables. The main advantages of this coefficient are: the data can be non-numerical observations if they can be ordered, it is easy to calculate, and the associated statistical test does not assume a known distribution of the population from which the samples are taken.
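As a quick illustration (a base R sketch, not part of any package; the function name is ours), the Kendall-\(\tau\) distance can be computed by counting pairwise disagreements, and the resulting \(\tau\) coincides with base R's cor(..., method = "kendall") for untied data:

```r
# Count pairwise disagreements (discordant pairs) between two permutations
kendall_tau_distance <- function(p1, p2) {
  n <- length(p1)
  d <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # the pair (i, j) disagrees if its relative order differs in p1 and p2
      if (sign(p1[i] - p1[j]) != sign(p2[i] - p2[j])) d <- d + 1
    }
  }
  d
}

p1 <- c(1, 2, 3)
p2 <- c(2, 3, 1)                      # permutation 231
d   <- kendall_tau_distance(p1, p2)   # 2, as in the example above
tau <- 1 - 2 * d / (length(p1) * (length(p1) - 1) / 2)
all.equal(tau, cor(p1, p2, method = "kendall"))   # TRUE
```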
The Kruskal-Wallis test is a non-parametric statistical method to study whether different samples come from the same population. The test is the extension of the Mann-Whitney test (Mann and Whitney 1947) to more than two samples or groups. The following example illustrates the Kruskal-Wallis test when comparing three samples.
Example 1. Let us assume that the effectiveness of three different treatments (\(A\), \(B\), \(C\)) has been measured for 6 individuals, two individuals being assigned to each of the treatments, with the effectiveness of each treatment being measured ordinally. We could obtain the result shown in Table 1, where, for example, the effectiveness of treatment \(A\) has been rated in first and third place.
Treatment | \(A\) | \(B\) | \(A\) | \(C\) | \(C\) | \(B\) |
---|---|---|---|---|---|---|
Rank | 1 | 2 | 3 | 4 | 5 | 6 |
The Kruskal-Wallis statistic is determined by the difference between the ranks of the individuals in each category and the average rank. In our example, the average rank of the test is \(\overline{R}=3.5\), while the average ranks of the three treatments are \(\overline{R}_A=2\), \(\overline{R}_B=4\) and \(\overline{R}_C=4.5\). The Kruskal-Wallis statistic, denoted by \(H\), is based on the calculation of the distance of each rank to the average rank, which can be expressed as follows: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum_{i=1}^k \frac{R_i^2}{n_i},\] where \(n\) is the number of observations in the \(k\) samples, \(n_i\) is the number of observations in the \(i\)-th sample and \(R_i\) is the sum of the ranks in the \(i\)-th sample. In our example, the value of the Kruskal-Wallis statistic is: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum_{i=1}^k \frac{R_i^2}{n_i} = -3(6+1)+\frac{12}{6(6+1)}\left (\frac{4^2}{2} + \frac{8^2}{2} + \frac{9^2}{2} \right ) = 2.\] Table 2 shows the probability distribution of the Kruskal-Wallis statistic for 3 treatments, each with 2 patients. Appendix A presents the Kruskal-Wallis statistic for all possible results of the experiment for 3 treatments with 2 patients each. In (Spurrier 2003), the author compares different methods for approximating the null probability points.
\(H\) | \(Prob\) |
---|---|
0.00 | 0.06667 |
0.29 | 0.13333 |
0.86 | 0.13333 |
1.14 | 0.13333 |
2.00 | 0.13333 |
2.57 | 0.06667 |
3.43 | 0.13333 |
3.71 | 0.13333 |
4.57 | 0.06667 |
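For Example 1, the value \(H = 2\) can be reproduced in base R, both from the formula above and with the built-in kruskal.test() function (applied here directly to the observed ranks, which is valid because the test only uses ranks):

```r
# Ranks observed in Example 1: treatment A -> 1, 3; B -> 2, 6; C -> 4, 5
ranks <- list(A = c(1, 3), B = c(2, 6), C = c(4, 5))

# H from the formula above
n <- 6
R <- sapply(ranks, sum)   # R_A = 4, R_B = 8, R_C = 9
H <- -3 * (n + 1) + 12 / (n * (n + 1)) * sum(R^2 / lengths(ranks))
H                         # 2

# The same value from the built-in Kruskal-Wallis test
unname(kruskal.test(ranks)$statistic)   # 2
```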
In (Aparicio et al. 2020), the authors introduce the Kendall-\(\tau\) partition ranking: given a ranking of the elements of a set and a disjoint partition of the same set, the Kendall-\(\tau\) partition ranking is the induced linear order of the subsets of the partition which follows from the given ranking of elements. The Kendall-\(\tau\) partition ranking presents an ordinal alternative to the mean-based ranking that uses a pseudo-cardinal scale. Let \(\pi\) be a permutation of the elements of set \(V\) and let \(V_1\), \(V_2\), \(\ldots\), \(V_k\) be a partition of \(V\); then, the Kendall-\(\tau\) distance from permutation \(\pi\) is given by \[d_{K\text{-}\tau}(\pi) = \displaystyle \min \{ d_{K\text{-}\tau}(\rho, \pi):\text{ elements in }V_r\text{ are consecutively listed in }\rho,\,\, \forall r \}.\]
This distance is also called the disorder of permutation \(\pi\). For the calculation of the disorder of a permutation of elements, the authors of (Aparicio et al. 2020) establish that the distance or disorder of a permutation of elements \(\pi = (a|a|b|b|a|c|a|b|c|\cdots|c|a|b)\) is given by the solution of the Linear Ordering Problem (LOP) with the preference matrix \(M\), where the element \(m_{ab}\) of matrix \(M\) indicates the number of times that an element \(a\) of sample \(A\) precedes an element \(b\) of sample \(B\) in the order \(\pi\). The solution of the Linear Ordering Problem gives us a new order of the elements of \(\pi\), the closest to \(\pi\), in which all the elements belonging to the same sample are listed consecutively. The book by Martí and Reinelt (2011) provides an exhaustive study of the Linear Ordering Problem.
The authors of (Aparicio et al. 2020) present the properties of the Kendall-\(\tau\) partition ranking and compare it with classical mean- and median-based rank approaches. Those properties are extracted from social choice theory and are adapted to a partition ranking, see (Arrow 1951; Kemeny 1959; Zahid and Swart 2015). Two of these properties hold only for the Kendall-\(\tau\) partition ranking: the Condorcet and Deletion Independence properties. The Condorcet property establishes that the most preferred subset must be listed before any other in any ranking, and the Deletion Independence property establishes that if any subset is removed, then the induced order of the remaining subsets does not change. In permutation \(\pi=(c|c|c|b|b|a|a|c|c)\) the set \(C\) is a Condorcet winner, the most preferred set, but \(B\) has a smaller mean rank value than set \(C\) when set \(A\) is considered in the comparison; therefore, the permutation \(\pi=(c|c|c|b|b|a|a|c|c)\) gives an example where ranking subsets from ranks is not very reliable.
From (Aparicio et al. 2020), the maximum number of disagreements that may occur in a permutation of \(n\) elements (where the elements are classified in \(k\) subsets \(V_1, V_2,\ldots, V_k\) of sizes \(n_1, n_2,\ldots, n_k\) respectively) is \(\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k \displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )\), where \(GP_{b}\) is the Generalized Pentagonal Number of \(b\), and \(b\) is the number of subsets \(V_r\) with odd cardinality. The Generalized Pentagonal Number \(GP_{b}\), for \(b\in \mathbb{N}\), is \[GP_b=\left\{ \begin{array}{ll}\displaystyle\frac{\ell(3\ell -1)}{2} & b=2\ell \,\, (b \,\, \text{even}),\\ & \\\displaystyle\frac{\ell(3\ell +1)}{2} & b=2\ell +1 \,\, (b \,\, \text{odd}).\\ \end{array} \right.\] This maximum number of disagreements (the maximum disorder) in a permutation \(\pi\) of elements allows us to define the relative disorder coefficient of permutation \(\pi\) as \[relative\, disorder(\pi) = \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )}.\]
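These formulas translate directly into code. The following is a sketch (the helper names GP and max_disorder are ours, not the package's):

```r
# Generalized Pentagonal Number GP_b
GP <- function(b) {
  l <- b %/% 2
  if (b %% 2 == 0) l * (3 * l - 1) / 2 else l * (3 * l + 1) / 2
}

# Maximum disorder for samples of sizes n = (n_1, ..., n_k)
max_disorder <- function(n) {
  p <- outer(n, n)
  prods <- p[upper.tri(p)]      # n_r * n_s for all pairs r < s
  b <- sum(n %% 2 == 1)         # number of odd-sized samples
  sum(prods) - (GP(b) + sum(prods %/% 2))
}

max_disorder(c(2, 2, 2))   # 6, the maximum disorder in Example 1
```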
Definition 1. We define the Concordance coefficient (\(\tau_c\)) of permutation \(\pi\) as \[\tau_c= 1- relative\, disorder(\pi) = 1- \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )} .\]
The Concordance coefficient (\(\tau_c\)) provides a measure of the concordance of the \(k\) samples, where \(\tau_c\) is a value between 0 and 1, taking the value 1 when there is a total order between the samples, and 0 when the disorder is maximum. In this sense, the Concordance coefficient can be seen as a generalization of the Kendall rank correlation coefficient when we have more than two samples. Given that the Concordance coefficient satisfies the properties mentioned above, we consider it more appropriate for measuring differences between samples than a rank-based method such as the Kruskal-Wallis test.
Example 1 (Cont.). Continuing with the data in Example 1, the results of the experiment provide the following order or permutation of the treatments \(\pi=(a|b|a|c|c|b)\).
Given the order of individuals \(\pi=(a|b|a|c|c|b)\), the orderings of individuals that leave individuals with the same treatment together are (\(a\) \(a\) \(b\) \(b\) \(c\) \(c\)) and (\(a\) \(a\) \(c\) \(c\) \(b\) \(b\)). Both orderings only need 3 pairwise disagreements from the permutation \(\pi\). In order to find the permutation of elements (equal elements listed consecutively) closest to a given permutation, it is sufficient to solve the Linear Ordering Problem (LOP) with the preference matrix defined above. In this example, said matrix is:
\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix}\\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 3 & 4 \\ 1 & - & 2 \\ 0 & 2 & - \end{pmatrix} \end{matrix},\]
where each element \(m_{ij}\) of the matrix represents the number of times an individual of treatment \(i\) precedes an individual of treatment \(j\). The solution of the LOP is the permutation of treatments which maximizes the preferences of order in the experiment; that is, in this example, the permutations of treatments \((A\ B\ C)\) or \((A\ C\ B)\) retain 9 of the preferences expressed in the order of individuals represented by the permutation \(\pi\). Therefore, the distance of the permutation \(\pi\) to a total order between treatments is \(\sum_{i<j} n_i n_j - 9 = 3\). This distance, which is the number of pairwise disagreements needed in a permutation of elements to reach a permutation that establishes a total order between treatments, is called the disorder of the permutation by the authors of (Aparicio et al. 2020)^{1}.
Then, the relative disorder of permutation \(\pi\) can be evaluated as \[relative \,disorder(\pi) = \frac{\displaystyle d_{K\text{-}\tau}(\pi)}{\displaystyle\sum_{r=1}^k\sum_{s =r+1}^k n_r \, n_s - (GP_{b} +\sum_{r=1}^k\sum_{s =r+1}^k\displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor )} =\frac{\displaystyle 3}{\displaystyle 12 - (0 +6 )} = \frac{3}{6}=\frac{1}{2},\] and the Concordance coefficient \[\tau_c= 1- relative\,disorder(\pi) = 1-\frac{1}{2}=\frac{1}{2}.\] Notice that no set of this example has odd cardinality, therefore the pentagonal number is \(GP_0=0\).
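For a small number of samples, the LOP above can be solved by brute force, enumerating the \(k!\) orders of the samples. A sketch for the preference matrix of Example 1 (an illustration, not the package's LOP solver):

```r
# Preference matrix of Example 1 (diagonal entries are unused, set to 0)
M <- matrix(c(0, 3, 4,
              1, 0, 2,
              0, 2, 0), nrow = 3, byrow = TRUE)

# All 3! = 6 orders of the three treatments
orders <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
               c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# Preferences retained by each order: sum of M[o[i], o[j]] for i < j
retained <- sapply(orders, function(o)
  M[o[1], o[2]] + M[o[1], o[3]] + M[o[2], o[3]])

best     <- max(retained)     # 9, attained by (A B C) and (A C B)
disorder <- sum(M) - best     # 12 - 9 = 3
tau_c    <- 1 - disorder / 6  # 0.5; 6 is the maximum disorder for sizes (2,2,2)
```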
Table 3 shows the probability distribution of the disorder and the Concordance coefficient for 3 treatments with 2 patients each. Appendix A presents the disorder and the Concordance coefficient for all possible results of the experiment with sample sizes \(N = (2,2,2)\). Figure 1 compares the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic for 3 treatments with 2 patients each. Notice that some Kruskal-Wallis statistic values (e.g., \(H=2.57\)) are less probable than larger ones.
\(dis\) | \(\tau_c\) | \(Prob\) |
---|---|---|
6 | 0.0000 | 0.06667 |
5 | 0.1667 | 0.13333 |
4 | 0.3333 | 0.20000 |
3 | 0.5000 | 0.20000 |
2 | 0.6667 | 0.20000 |
1 | 0.8333 | 0.13333 |
0 | 1.0000 | 0.06667 |
The R package we have developed allows both the Concordance coefficient and the Kruskal-Wallis statistic to be calculated, in order to facilitate their comparison. Given the high combinatorial degree of the problem of ordering samples of populations, the functions implemented in the package can either perform the calculations exactly, exploring the entire sample space, or approximate them by simulation.
The ConcordanceTest package can be installed from CRAN:

install.packages("ConcordanceTest")
library("ConcordanceTest")

Its functions can perform the calculations for the Concordance coefficient only (the default option, specified with the parameter H = 0) or also for the Kruskal-Wallis statistic (H = 1), allowing their comparison.
To obtain the probability distribution of the statistics, it is necessary to have the set of all possible permutations that can occur in the result of the experiment we want to analyze (90 = 6!/(2!2!2!) in Example 1). This set can be obtained through the function Permutations_With_Repetition(), which has been developed and included in the ConcordanceTest package.
The function CT_Distribution() calculates the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. The set of possibilities (sample space) grows very quickly with the number of elements and the number of sets and, in some cases, calculating the probability distribution exactly becomes unaffordable, making approximate calculations necessary. Both exact and approximate (default option) calculations can be done using the function CT_Distribution(). It is used as follows:

CT_Distribution(Sample_Sizes, Num_Sim = 10000, H = 0, verbose = TRUE)
where Sample_Sizes is a numeric vector \((n_1,\ldots,n_k)\) containing the number of repetitions of each element, i.e., the size of each sample in the experiment. Num_Sim is the number of simulations to be performed in order to obtain the probability distribution of the statistics (10,000 by default). If Num_Sim is set to 0, the probability distribution tables are obtained exactly using the function Permutations_With_Repetition(). H is the parameter specifying whether the calculations must also be performed for the Kruskal-Wallis statistic, and verbose is a logical parameter that indicates whether a progress report of the simulations should be given.
Example 1 (Cont.). Using the function CT_Distribution() with Num_Sim equal to 0, we could obtain the probability distributions of the Kruskal-Wallis statistic and the Concordance coefficient in Example 1 (Tables 2 and 3, respectively) in an exact way. As shown in this example, we can also approximate the probability distributions of Example 1 by simulating, for example, 25,000 permutations of 3 treatments with 2 patients each. Note that, for reproducibility, we always initialize the generator for pseudo-random numbers when the results rely on simulation.
set.seed(12)
Sample_Sizes <- c(2,2,2)
CT_Distribution(Sample_Sizes, Num_Sim = 25000, H = 1)

$C_freq
     disorder Concordance coefficient Frequency Probability
[1,]        6                    0.00         6      0.0667
[2,]        5                    0.17        12      0.1333
[3,]        4                    0.33        18      0.2000
[4,]        3                    0.50        18      0.2000
[5,]        2                    0.67        18      0.2000
[6,]        1                    0.83        12      0.1333
[7,]        0                    1.00         6      0.0667

$H_freq
     H Statistic Frequency Probability
[1,]        0.00         6      0.0667
[2,]        0.29        12      0.1333
[3,]        0.86        12      0.1333
[4,]        1.14        12      0.1333
[5,]        2.00        12      0.1333
[6,]        2.57         6      0.0667
[7,]        3.43        12      0.1333
[8,]        3.71        12      0.1333
[9,]        4.57         6      0.0667
The function CT_Distribution() returns two elements. C_freq is a matrix with the probability distribution of the Concordance coefficient: each row contains the disorder, the value of the Concordance coefficient \(\tau_c\), the frequency and its probability. H_freq (only returned if H = 1) is a matrix with the probability distribution of the Kruskal-Wallis statistic: each row contains the value of the statistic \(H\), the frequency and its probability. The results obtained by the function CT_Distribution() are the same as those previously shown in Table 3 and Table 2 of Example 1.
In this section, we present the Concordance test in order to evaluate whether different samples come from the same population distribution. The randomization test introduced by (Fisher 1935) establishes a framework for statistical tests based on permutations, see also (Box 1980; Stern 1990; Welch 1990).
If all the samples come from the same distribution, then all possible ways to rank \(n\) observations divided into \(k\) samples have the same probability of occurring. If a result of the experiment provides an order of the observations with a high disorder, it will support the idea that all observations come from the same population. On the contrary, a result with a small disorder will count against the claim that the observations come from the same population. We therefore take as null hypothesis that the samples come from the same distribution, while the alternative hypothesis is that at least one of the samples comes from a different distribution.
\(H_0\): There is no difference among the \(k\) populations.

\(H_1\): At least one of the populations differs from the other populations.
The decision rule is to reject the null hypothesis if the disorder in the permutation of observations is small, equivalently if the Concordance coefficient \(\tau_c\) is close to one. We reject the null hypothesis \(H_0\) at the significance level \(\alpha\) if \(\tau_c\) is greater than the percentile \((1-\alpha)\%\) of the probability distribution of \(\tau_c\).
The following example illustrates the use of the Concordance test proposed in this work and compares it with the classical Kruskal-Wallis non-parametric test. The comparison will be made first considering that there are no ties and then modifying the data in the example so that ties appear.
Example 2. Suppose we have applied three treatments to 18 patients, measuring the number of hours it takes these patients to recover. The results are shown in Table 4.
Hours | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|
Treatment A | 12 | 13 | 15 | 20 | 23 | 28 | 30 | 32 | 40 | 48 |
Treatment B | 29 | 31 | 49 | 52 | 54 | | | | | |
Treatment C | 24 | 26 | 44 | | | | | | | |
The experiment ranks the patients in the following ranking \[(a\ a\ a\ a\ a\ c\ c\ a\ b\ a\ b\ a\ a\ c\ a\ b\ b\ b).\]
If we perform the test using the disorder statistic or the Concordance coefficient \(\tau_c\), we must calculate the permutation of treatments that best preserves the order of patients obtained in the experiment. The observed matrix of preferences between treatments is as follows:
\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix}\\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 43 & 19 \\ 7 & - & 2 \\ 11 & 13 & - \end{pmatrix} \end{matrix}\] The order between treatments that maximizes the order between patients corresponds to the order \((A\ C\ B)\), satisfying 75 of the 95 preferences contained in the matrix, where the value 75 is the solution of the Linear Ordering Problem (LOP)^{2}. Therefore, exactly 20 = 95-75 is the number of pairwise disagreements necessary to order the samples and obtain the order (ACB), that is, the disorder is 20. The greatest disorder that an order of elements can have with samples of 10, 5 and 3 elements is given by: \(\sum_{r=1}^3\sum_{s =r+1}^3 n_r \, n_s - (GP_{b} +\sum_{r=1}^3\sum_{s =r+1}^3 \displaystyle\lfloor\frac{n_r n_s}{2}\displaystyle\rfloor ) = 95 - (1 + 47) = 47\), therefore the Concordance coefficient is \(\tau_c = 1-20/47= 0.574\). The p-value of the disorder 20 or, equivalently, of the Concordance coefficient \(\tau_c=0.574\) is 0.049272^{3}, therefore, at a level of significance less than 5% we can reject the null hypothesis of equality in treatments.
The treatments A, B and C have average ranks of 7.3, 14.2 and 9, respectively, and the sums of ranks are \(R_A=73\), \(R_B=71\) and \(R_C=27\).
The Kruskal-Wallis statistic is given by: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum \frac{R_i^2}{n_i}= -3(18+1)+\frac{12}{18(18+1)}\left( \frac{73^2}{10} +\frac{71^2}{5}+\frac{27^2}{3}\right)=5.6\]
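Since there are no ties in these data, the same value of \(H\) is returned by base R's kruskal.test():

```r
A <- c(12, 13, 15, 20, 23, 28, 30, 32, 40, 48)
B <- c(29, 31, 49, 52, 54)
C <- c(24, 26, 44)

kw <- kruskal.test(list(A, B, C))
unname(kw$statistic)   # 5.6
```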
In (Meyer and Seaman 2015), exact values for the Kruskal-Wallis test at different levels of significance can be found. Looking at those tables, we can conclude that the p-value of the \(H\) statistic is greater than 0.05; therefore, we cannot reject the null hypothesis that the treatments are equally effective.
Comparing both methods, the Concordance and Kruskal-Wallis tests provide similar p-values, but the conclusions at the 5% significance level differ.
Example 3. Suppose we have the same experiment as in Example 2 but with ties. The results are shown in Table 5. Ties are in bold.
Hours | | | | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|
Treatment A | 12 | 13 | 15 | 20 | **24** | **29** | 30 | 32 | 40 | **49** |
Treatment B | **29** | 31 | **49** | 52 | 54 | | | | | |
Treatment C | **24** | 26 | 44 | | | | | | | |
The results of the experiment order the individuals according to the sequence: \[(a\ a\ a\ a\ (a\ c)\ c\ (a\ b)\ a\ b\ a\ a\ c\ (a\ b)\ b\ b)\] where elements grouped in parentheses are tied. There are \(8\) different ways to break the ties in the ranking of elements. If the same probability is assumed for all of them, the expected preference matrix between treatments is obtained by distributing the preference equally in the comparison of tied observations, that is, assigning the value 0.5 to each of the treatments when two tied units are compared. The preference matrix for this example is as follows:
\[\begin{matrix} & \begin{matrix} A & B & C \end{matrix} \\ \begin{matrix} A \\ B \\ C \end{matrix} & \begin{pmatrix} - & 42 & 18.5 \\ 8 & - & 2 \\ 11.5 & 13 & -\\ \end{pmatrix} \end{matrix}\] Note that the previous matrix is the matrix of expected preferences when all the permutations that break the ties are considered with the same probability.
The order between treatments that maximizes the order between patients corresponds to the order \((A\ C\ B)\), satisfying 73.5 of the 95 preferences contained in the matrix, where 73.5 is the solution of the Linear Ordering Problem. Therefore, 21.5 = 95 - 73.5 is the expected number of pairwise disagreements necessary to order the samples and obtain the order \((A\ C\ B)\); that is, the disorder is 21.5 or, equivalently, the Concordance coefficient is \(\tau_c = 1-21.5/47= 0.543\), a value whose p-value is greater than 0.05. In this case, the observed data do not show significant evidence in favor of a difference in the effectiveness of the treatments.
The treatments A, B and C have average ranks of 7.45, 14 and 8.83, respectively, and the sum of ranks are \(R_A=74.5\), \(R_B=70\) and \(R_C=26.5\).
The Kruskal-Wallis statistic is given by: \[H = -3(n+1)+\frac{12}{n(n+1)}\sum \frac{R_i^2}{n_i}= -3(18+1)+\frac{12}{18(18+1)}\left( \frac{74.5^2}{10} +\frac{70^2}{5}+\frac{26.5^2}{3}\right)=5.074\]
If we make the adjustment in the statistic for ties, we get: \[\tilde{H} = \frac{H}{\displaystyle 1-\frac{\sum_i (t_i^3-t_i)}{N^3-N}} = \frac{5.074}{\displaystyle 1-\frac{(2^3-2)+(2^3-2)+(2^3-2)}{18^3-18}} = 5.0897,\] where \(t_i\) is the number of tied observations sharing the \(i\)-th value and \(N\) is the total number of observations.
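Base R's kruskal.test() applies this tie correction automatically, so the adjusted value \(\tilde{H}\) can be checked directly with the tied data of Table 5:

```r
A <- c(12, 13, 15, 20, 24, 29, 30, 32, 40, 49)
B <- c(29, 31, 49, 52, 54)
C <- c(24, 26, 44)
x <- c(A, B, C)
N <- length(x)   # 18

# Uncorrected H from mid-ranks, then the tie correction by hand
r <- rank(x)
R <- c(sum(r[1:10]), sum(r[11:15]), sum(r[16:18]))   # 74.5, 70, 26.5
H <- -3 * (N + 1) + 12 / (N * (N + 1)) * sum(R^2 / c(10, 5, 3))
t <- table(x)                                        # sizes of the tie groups
H_adj <- H / (1 - sum(t^3 - t) / (N^3 - N))          # about 5.0897

# kruskal.test() returns the corrected statistic
all.equal(unname(kruskal.test(list(A, B, C))$statistic), H_adj)   # TRUE
```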
In this case, the Kruskal-Wallis test provides the same conclusion as the Concordance test; uncertainty being greater when we have ties.
The ConcordanceTest R package allows the hypothesis test for whether samples originate from the same distribution to be performed with the function CT_Hypothesis_Test(), which carries out the calculations by simulation. It is used as follows:

CT_Hypothesis_Test(Sample_List, Num_Sim = 10000, H = 0, verbose = TRUE)
where Sample_List is a list of numeric data vectors with the elements of each sample, Num_Sim is the number of simulations used (10,000 by default), H specifies whether the Kruskal-Wallis test must also be done, and verbose is a logical parameter that indicates whether a progress report of the simulations should be given.
Example 2 (Cont.). We use the ConcordanceTest package to perform the Concordance and Kruskal-Wallis tests of Example 2. We use 25,000 simulations.
set.seed(12)
A <- c(12,13,15,20,23,28,30,32,40,48)
B <- c(29,31,49,52,54)
C <- c(24,26,44)
Sample_List <- list(A, B, C)
CT_Hypothesis_Test(Sample_List, Num_Sim = 25000, H = 1)

$results
                        Statistic p-value
Concordance coefficient     0.574 0.04928
Kruskal Wallis              5.600 0.05292

$C_p_value
[1] 0.04928

$H_p_value
[1] 0.05292
The function CT_Hypothesis_Test() provides the value of the statistics together with the p-value associated with each of them. The result of the Kruskal-Wallis test is only returned if H = 1. Note that the approximate p-values obtained by simulation are close to the exact ones, 0.04927 and 0.05223 for the Concordance coefficient and the Kruskal-Wallis statistic, respectively.
An alternative to the contrast performed with the function CT_Hypothesis_Test() is to obtain the critical values of the test. This can be done with the ConcordanceTest package either exactly or approximately, using the function CT_Critical_Values(). It is used as follows:
CT_Critical_Values(Sample_Sizes, Num_Sim = 10000, H = 0, verbose = TRUE)
where Sample_Sizes is a numeric vector \((n_1,\ldots,n_k)\) containing the number of repetitions of each element, i.e., the size of each sample in the experiment. Num_Sim is the number of simulations carried out in order to obtain the probability distribution of the statistics (10,000 by default). If Num_Sim is set to 0, the critical values are obtained exactly; otherwise they are obtained by simulation. H is the parameter specifying whether the critical values of the Kruskal-Wallis test must be calculated and returned, and verbose is a logical parameter that indicates whether a progress report of the simulations should be given.
The function returns a list with two elements: C_results contains the critical values and p-values of the Concordance coefficient for significance levels of 0.1, 0.05 and 0.01, and H_results contains the critical values and p-values of the Kruskal-Wallis statistic (only returned if H = 1).
Example 2 (Cont.). We show the results of the function CT_Critical_Values() with sample sizes \(N=(10,5,3)\) and 25,000 simulations. The results allow us to compare the test statistics at different significance levels.
set.seed(12)
Sample_Sizes <- c(10,5,3)
CT_Critical_Values(Sample_Sizes, Num_Sim = 25000, H = 1)

$C_results
              disorder Concordance coefficient p-value
Sig level .10       23                    0.51  0.0954
Sig level .05       20                    0.57  0.0492
Sig level .01       14                    0.70  0.0096

$H_results
              H Statistic p-value
Sig level .10        4.55  0.0995
Sig level .05        5.72  0.0497
Sig level .01        7.78  0.0097
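Conceptually, each critical value is a quantile of the simulated null distribution: for a level \(\alpha\), it is the most extreme statistic whose estimated upper-tail probability does not exceed \(\alpha\). A generic Python sketch of that lookup, where the null sample is a hypothetical stand-in for the distribution the package simulates:

```python
import random

random.seed(12)
# Hypothetical stand-in for a simulated null distribution of an
# upper-tailed test statistic (the package builds the real one from
# the permutation distribution of the statistics).
null = sorted(random.gauss(0, 1) for _ in range(25000))

def critical_value(null_sorted, alpha):
    # Smallest value v in the null sample whose estimated upper-tail
    # probability P(stat >= v) does not exceed alpha.
    n = len(null_sorted)
    for i, v in enumerate(null_sorted):
        if (n - i) / n <= alpha:
            return v

cv05 = critical_value(null, 0.05)  # near the N(0,1) quantile 1.645
```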
To obtain the Concordance coefficient and the Kruskal-Wallis statistic from the result of an experiment, the ConcordanceTest package has the function CT_Coefficient(). This function is useful when we only want to obtain the value of the statistic in order to check its significance using statistical tables. The function CT_Coefficient() is used as follows:

CT_Coefficient(Sample_List, H = 0)

where Sample_List is a list of numeric data vectors with the elements of each sample, and H is defined as usual.
Example 2 (Cont.). We show the results of the function CT_Coefficient() for the data in Example 2.
A <- c(12,13,15,20,23,28,30,32,40,48)
B <- c(29,31,49,52,54)
C <- c(24,26,44)
Sample_List <- list(A, B, C)
CT_Coefficient(Sample_List, H = 1)

$Sample_Sizes
[1] 10  5  3

$order_elements
 [1] 1 1 1 1 1 3 3 1 2 1 2 1 1 3 1 2 2 2

$disorder
[1] 20

$Concordance_Coefficient
[1] 0.5744681

$H_Statistic
[1] 5.6
The function CT_Coefficient() returns a list with the following elements: Sample_Sizes is a numeric vector with the sample sizes, order_elements is a numeric vector containing the order of the elements, disorder is the disorder of the permutation given by order_elements, Concordance_Coefficient is the value of the Concordance coefficient \(\tau_c\), that is, 1 minus the relative disorder of the permutation given by order_elements, and H_Statistic is the Kruskal-Wallis statistic (only returned if H = 1).
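The reported disorder can be reproduced from order_elements alone: build the preference matrix between samples, solve the Linear Ordering Problem (here by brute force over the 3! group orderings), and subtract the optimum from the total number of between-sample pairs. A Python sketch of this chain, where the normalizing maximum disorder (47 for sample sizes (10, 5, 3)) is taken from the text:

```python
from itertools import permutations

# Group label (1=A, 2=B, 3=C) of each observation, listed in increasing
# order of the observed values (the order_elements vector above).
order_elements = [1,1,1,1,1,3,3,1,2,1,2,1,1,3,1,2,2,2]

k = 3
# Preference matrix: M[g][h] counts the pairs in which an element of
# group g+1 precedes an element of group h+1.
M = [[0] * k for _ in range(k)]
for i, g in enumerate(order_elements):
    for h in order_elements[i + 1:]:
        if g != h:
            M[g - 1][h - 1] += 1

total_pairs = sum(map(sum, M))   # 95 pairs between different samples
# Brute-force Linear Ordering Problem: the group ordering that
# maximizes the number of satisfied preferences.
lop = max(sum(M[p[i]][p[j]] for i in range(k) for j in range(i + 1, k))
          for p in permutations(range(k)))
disorder = total_pairs - lop     # 95 - 75 = 20
tau_c = 1 - disorder / 47        # 47 = max disorder for these sizes (from the text)
```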
Note that we can also solve problems with ties (as in Example 3) with the ConcordanceTest package.
The graphical visualization of the probability distributions of the Concordance coefficient and the Kruskal-Wallis statistic can be done with the function CT_Probability_Plot(). It is used as follows:
CT_Probability_Plot(C_freq = NULL, H_freq = NULL)
Using the function CT_Density_Plot() of the ConcordanceTest package, we can make an approximate representation of the density functions of the statistics, assuming that the probability distributions represent a sample of a continuous variable. It is used as follows:

CT_Density_Plot(C_freq = NULL, H_freq = NULL)

In both functions, C_freq is the probability distribution of the Concordance coefficient and H_freq is the probability distribution of the Kruskal-Wallis statistic, obtained exactly or approximately with the function CT_Distribution(). The function CT_Probability_Plot() can represent both probability distributions or only one of them (if it only receives the parameter C_freq or H_freq). Equivalently, the function CT_Density_Plot() can represent both density functions or only one of them. Appendix B presents the empirical probability density functions for several experiments, where sample sizes vary from \(N=(4,4)\) to \(N=(5,5,4,4,4,4,4)\).
Example 2 (Cont.). Graphical visualization of the probability distributions and the density distributions of Example 2 generated by simulation. The first row of Figure 2 compares the probability distribution of the Concordance coefficient and the Kruskal-Wallis statistic. The second row of Figure 2 shows the probability density function of the Concordance coefficient (continuous line) and the Kruskal-Wallis statistic (dashed line). Note that the \(H\) statistic has been normalized between 0 and 1.
set.seed(12)
Sample_Sizes <- c(10,5,3)
ProbDistr <- CT_Distribution(Sample_Sizes, Num_Sim = 25000, H = 1)
layout(matrix(c(1,3,2,3), ncol=2))
CT_Probability_Plot(C_freq = ProbDistr$C_freq, H_freq = ProbDistr$H_freq)
CT_Density_Plot(C_freq = ProbDistr$C_freq, H_freq = ProbDistr$H_freq)
As mentioned for Figure 1, Figure 2 also shows that similar values of the Kruskal-Wallis statistic have very different probabilities, which leads to a less smooth function than that of the Concordance coefficient. We can also see that the Concordance coefficient presents a more symmetrical distribution. This behavior generalizes and, therefore, we consider the Concordance coefficient more reliable than the Kruskal-Wallis statistic.
The ConcordanceTest package also contains the function LOP(), which solves the Linear Ordering Problem for a square data matrix. This function allows the disorder of a permutation of elements to be calculated from the preference matrix induced by that permutation and is therefore necessary for the calculation of the Concordance coefficient. The function LOP() is used by the functions CT_Distribution(), CT_Hypothesis_Test() and CT_Coefficient(). It is used as follows:
LOP(mat_LOP)
where mat_LOP is the preference matrix defining the Linear Ordering Problem, a numeric square matrix for which we want to obtain the permutation of rows/columns that maximizes the sum of the elements above the main diagonal.
The function LOP() returns a list with the following elements: obj_val is the optimal value of the solution of the Linear Ordering Problem, that is, the sum of the elements above the main diagonal under the solution rows/columns permutation, permutation is the solution of the Linear Ordering Problem, that is, the rows/columns permutation, and permutation_matrix is the optimal permutation matrix of the Linear Ordering Problem.
Example 2 (Cont.). The matrix of preferences between treatments observed in Example 2 was:
\[\begin{matrix}
&\begin{matrix} A & B & C \end{matrix} \\
\begin{matrix} A\\ B \\ C \end{matrix} & \begin{pmatrix} - & 43 & 19 \\
7 & - & 2\\
11 & 13 & -
\end{pmatrix}
\end{matrix}\]
If we apply the function LOP() to this preference matrix, we obtain the following results:
mat_LOP <- matrix(c(0,7,11,43,0,13,19,2,0), nrow=3)
LOP(mat_LOP)

$obj_val
[1] 75

$permutation
[1] 1 3 2

$permutation_matrix
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    0    0
[3,]    0    1    0
As we saw previously, the order between treatments that maximizes the agreement between patients corresponds to the order \((A\ C\ B)\) (permutation = 1 3 2), satisfying obj_val = 75 of the preferences contained in the matrix.
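For a matrix this small, the LOP() output can be cross-checked with an exhaustive search over the 3! row/column orderings, summing the preferences that fall above the diagonal for each; a hedged Python sketch:

```python
from itertools import permutations

# Preference matrix of Example 2, rows/columns in the order A, B, C
M = [[0, 43, 19],
     [7,  0,  2],
     [11, 13, 0]]

def above_diagonal_sum(p):
    # Preferences satisfied (entries above the main diagonal) when the
    # rows/columns are permuted according to p
    k = len(p)
    return sum(M[p[i]][p[j]] for i in range(k) for j in range(i + 1, k))

best = max(permutations(range(3)), key=above_diagonal_sum)
# best == (0, 2, 1), i.e. the order (A C B), satisfying 75 preferences
```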
kruskal.test() function from the stats package

The well-known stats package contains, among many other functions, the function kruskal.test(), which performs a Kruskal-Wallis rank sum test. In this section, we compare the results obtained with the ConcordanceTest package presented in this work and the function kruskal.test(), making use of the dataset from (Hollander and Wolfe 1973) referenced in the kruskal.test() examples.
Example 4. Comparison of the kruskal.test() (stats package) and CT_Hypothesis_Test() (ConcordanceTest package) functions, with 25,000 simulations, using the dataset from (Hollander and Wolfe 1973).
## Hollander & Wolfe (1973), 116.
## Mucociliary efficiency from the rate of removal of dust in normal
## subjects, subjects with obstructive airway disease, and subjects
## with asbestosis.
x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects
y <- c(3.8, 2.7, 4.0, 2.4)      # with obstructive airway disease
z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis
Sample_List <- list(x, y, z)

kruskal.test(Sample_List)

        Kruskal-Wallis rank sum test

data:  Sample_List
Kruskal-Wallis chi-squared = 0.77143, df = 2, p-value = 0.68

set.seed(12)
CT_Hypothesis_Test(Sample_List, Num_Sim = 25000, H = 1)

$results
                        Statistic p-value
Concordance coefficient     0.188 0.78408
Kruskal-Wallis              0.771 0.71080

$C_p_value
[1] 0.78408

$H_p_value
[1] 0.7108
As can be observed, the value of the Kruskal-Wallis statistic is the same with both functions (0.771). However, the p-values associated with the statistic differ.
The Kruskal-Wallis statistic approximately follows a \(\chi^2\) distribution with degrees of freedom equal to the number of groups minus 1 (Kruskal and Wallis 1952). For this reason, the function kruskal.test() uses a \(\chi^2\) distribution to approximate the p-value (using the function pchisq()). The function CT_Hypothesis_Test(), in contrast, calculates the p-values using the simulations performed (25,000 in this example).
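The gap between the two approximations can be reproduced without the package: for df = 2 the \(\chi^2\) upper tail has the closed form \(e^{-H/2}\), which gives the 0.68 reported by kruskal.test(), while a label-permutation simulation of \(H\) approximates the exact tail probability. A rough Python sketch (the simulation count and seed are arbitrary choices here):

```python
import math, random

random.seed(12)
x = [2.9, 3.0, 2.5, 2.6, 3.2]            # normal subjects
y = [3.8, 2.7, 4.0, 2.4]                 # obstructive airway disease
z = [2.8, 3.4, 3.7, 2.2, 2.0]            # asbestosis
data = x + y + z
sizes = [len(x), len(y), len(z)]
n = len(data)

def kw_statistic(ranks):
    # Kruskal-Wallis H from a flat list of ranks, split by group sizes
    total, start = 0.0, 0
    for s in sizes:
        total += sum(ranks[start:start + s]) ** 2 / s
        start += s
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

ranks = [sorted(data).index(v) + 1 for v in data]  # no ties in these data
H_obs = kw_statistic(ranks)               # about 0.77143

# chi-squared upper tail with df = 2 has the closed form exp(-H/2)
p_chi2 = math.exp(-H_obs / 2)             # about 0.68, as kruskal.test() reports

# label-permutation simulation of the null distribution of H
sims = 20000
p_sim = sum(kw_statistic(random.sample(ranks, n)) >= H_obs
            for _ in range(sims)) / sims  # close to the exact 0.71077
```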
The function CT_Distribution() of the ConcordanceTest package allows the probability distribution tables of the Concordance coefficient and the Kruskal-Wallis statistic to be computed, either exactly or by simulation. We can get the exact probability distribution tables and, consequently, the exact p-values in Example 4 with:

CT_Distribution(c(5,4,5), Num_Sim = 0, H = 1)
In Example 4, the exact p-value for the Kruskal-Wallis statistic is 0.71077. Therefore, the difference between our p-value obtained with 25,000 simulations (0.71080) and the exact one is 0.00003, while the difference between the p-value approximated by the \(\chi^2\) distribution (0.68) and the exact one is 0.03077. Regarding the Concordance coefficient, the exact p-value is 0.78468, hence, the difference between our p-value obtained with 25,000 simulations (0.78408) and the exact one is 0.0006.
It is worth noting that the function kruskal.test() uses the \(\chi^2\) distribution to approximate the p-value regardless of the sample sizes, although (Kruskal and Wallis 1952) state that the Kruskal-Wallis statistic is distributed approximately as a \(\chi^2\) unless the samples are too small, in which case special approximations or exact tables should be provided. In contrast, the ConcordanceTest package can always obtain a good approximation of the p-values, regardless of the sample sizes, as long as a high number of simulations is used.
A new measure based on the Kendall-\(\tau\) distance is presented in this work to estimate the concordance of different samples. A statistical test to determine whether different observations come from the same distribution is also introduced. A comparison with the classical Kruskal-Wallis test shows that the two tests differ. As we have shown, the proposed coefficient is more appropriate and reliable than rank-based methods. This work also describes the R package ConcordanceTest (Alcaraz et al. 2022), which contains all the functions needed to work with the proposed Concordance coefficient and allows its comparison with the Kruskal-Wallis statistic.
This work aims to be an introduction to the new concordance measure between samples, but much remains to be done. Further challenges for researchers include, for example, studying the asymptotic distribution of the Concordance coefficient, exploring the possibility of finding the exact distribution with the help of modern computing, and analyzing the power of the Concordance test presented in this work.
The authors thank the grants PID2019-105952GB-I00 funded by Ministerio de Ciencia e Innovación/ Agencia Estatal de Investigación /10.13039/501100011033, Spain, and PROMETEO/2021/063 funded by the government of the Valencian Community, Spain.
Table 6 shows the Concordance coefficient (\(\tau_c\)) and Kruskal-Wallis statistic (\(H\)) for all possible results in an experiment with three treatments and two people in each treatment.
  |   |   |   |   |   | \(dis\) | \(\tau_c\) | \(H\) |   |   |   |   |   |   | \(dis\) | \(\tau_c\) | \(H\) |   |   |   |   |   |   | \(dis\) | \(\tau_c\) | \(H\)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
a | a | b | b | c | c | 0 | 1.0000 | 4.57 | b | a | a | b | c | c | 2 | 0.6667 | 3.43 | c | a | a | b | b | c | 4 | 0.3333 | 1.14 | |||||||||||||||
a | a | b | c | b | c | 1 | 0.8333 | 3.71 | b | a | a | c | b | c | 3 | 0.5000 | 2.00 | c | a | a | b | c | b | 3 | 0.5000 | 2.00 | |||||||||||||||
a | a | b | c | c | b | 2 | 0.6667 | 3.43 | b | a | a | c | c | b | 4 | 0.3333 | 1.14 | c | a | a | c | b | b | 2 | 0.6667 | 3.43 | |||||||||||||||
a | a | c | b | b | c | 2 | 0.6667 | 3.43 | b | a | b | a | c | c | 1 | 0.8333 | 3.71 | c | a | b | a | b | c | 5 | 0.1667 | 0.29 | |||||||||||||||
a | a | c | b | c | b | 1 | 0.8333 | 3.71 | b | a | b | c | a | c | 2 | 0.6667 | 2.57 | c | a | b | a | c | b | 4 | 0.3333 | 0.86 | |||||||||||||||
a | a | c | c | b | b | 0 | 1.0000 | 4.57 | b | a | b | c | c | a | 3 | 0.5000 | 2.00 | c | a | b | b | a | c | 6 | 0.0000 | 0.00 | |||||||||||||||
a | b | a | b | c | c | 1 | 0.8333 | 3.71 | b | a | c | a | b | c | 4 | 0.3333 | 0.86 | c | a | b | b | c | a | 5 | 0.1667 | 0.29 | |||||||||||||||
a | b | a | c | b | c | 2 | 0.6667 | 2.57 | b | a | c | a | c | b | 5 | 0.1667 | 0.29 | c | a | b | c | a | b | 3 | 0.5000 | 1.14 | |||||||||||||||
a | b | a | c | c | b | 3 | 0.5000 | 2.00 | b | a | c | b | a | c | 3 | 0.5000 | 1.14 | c | a | b | c | b | a | 4 | 0.3333 | 0.86 | |||||||||||||||
a | b | b | a | c | c | 2 | 0.6667 | 3.43 | b | a | c | b | c | a | 4 | 0.3333 | 0.86 | c | a | c | a | b | b | 1 | 0.8333 | 3.71 | |||||||||||||||
a | b | b | c | a | c | 3 | 0.5000 | 2.00 | b | a | c | c | a | b | 6 | 0.0000 | 0.00 | c | a | c | b | a | b | 2 | 0.6667 | 2.57 | |||||||||||||||
a | b | b | c | c | a | 4 | 0.3333 | 1.14 | b | a | c | c | b | a | 5 | 0.1667 | 0.29 | c | a | c | b | b | a | 3 | 0.5000 | 2.00 | |||||||||||||||
a | b | c | a | b | c | 3 | 0.5000 | 1.14 | b | b | a | a | c | c | 0 | 1.0000 | 4.57 | c | b | a | a | b | c | 6 | 0.0000 | 0.00 | |||||||||||||||
a | b | c | a | c | b | 4 | 0.3333 | 0.86 | b | b | a | c | a | c | 1 | 0.8333 | 3.71 | c | b | a | a | c | b | 5 | 0.1667 | 0.29 | |||||||||||||||
a | b | c | b | a | c | 4 | 0.3333 | 0.86 | b | b | a | c | c | a | 2 | 0.6667 | 3.43 | c | b | a | b | a | c | 5 | 0.1667 | 0.29 | |||||||||||||||
a | b | c | b | c | a | 5 | 0.1667 | 0.29 | b | b | c | a | a | c | 2 | 0.6667 | 3.43 | c | b | a | b | c | a | 4 | 0.3333 | 0.86 | |||||||||||||||
a | b | c | c | a | b | 5 | 0.1667 | 0.29 | b | b | c | a | c | a | 1 | 0.8333 | 3.71 | c | b | a | c | a | b | 4 | 0.3333 | 0.86 | |||||||||||||||
a | b | c | c | b | a | 6 | 0.0000 | 0.00 | b | b | c | c | a | a | 0 | 1.0000 | 4.57 | c | b | a | c | b | a | 3 | 0.5000 | 1.14 | |||||||||||||||
a | c | a | b | b | c | 3 | 0.5000 | 2.00 | b | c | a | a | b | c | 5 | 0.1667 | 0.29 | c | b | b | a | a | c | 4 | 0.3333 | 1.14 | |||||||||||||||
a | c | a | b | c | b | 2 | 0.6667 | 2.57 | b | c | a | a | c | b | 6 | 0.0000 | 0.00 | c | b | b | a | c | a | 3 | 0.5000 | 2.00 | |||||||||||||||
a | c | a | c | b | b | 1 | 0.8333 | 3.71 | b | c | a | b | a | c | 4 | 0.3333 | 0.86 | c | b | b | c | a | a | 2 | 0.6667 | 3.43 | |||||||||||||||
a | c | b | a | b | c | 4 | 0.3333 | 0.86 | b | c | a | b | c | a | 3 | 0.5000 | 1.14 | c | b | c | a | a | b | 3 | 0.5000 | 2.00 | |||||||||||||||
a | c | b | a | c | b | 3 | 0.5000 | 1.14 | b | c | a | c | a | b | 5 | 0.1667 | 0.29 | c | b | c | a | b | a | 2 | 0.6667 | 2.57 | |||||||||||||||
a | c | b | b | a | c | 5 | 0.1667 | 0.29 | b | c | a | c | b | a | 4 | 0.3333 | 0.86 | c | b | c | b | a | a | 1 | 0.8333 | 3.71 | |||||||||||||||
a | c | b | b | c | a | 6 | 0.0000 | 0.00 | b | c | b | a | a | c | 3 | 0.5000 | 2.00 | c | c | a | a | b | b | 0 | 1.0000 | 4.57 | |||||||||||||||
a | c | b | c | a | b | 4 | 0.3333 | 0.86 | b | c | b | a | c | a | 2 | 0.6667 | 2.57 | c | c | a | b | a | b | 1 | 0.8333 | 3.71 | |||||||||||||||
a | c | b | c | b | a | 5 | 0.1667 | 0.29 | b | c | b | c | a | a | 1 | 0.8333 | 3.71 | c | c | a | b | b | a | 2 | 0.6667 | 3.43 | |||||||||||||||
a | c | c | a | b | b | 2 | 0.6667 | 3.43 | b | c | c | a | a | b | 4 | 0.3333 | 1.14 | c | c | b | a | a | b | 2 | 0.6667 | 3.43 | |||||||||||||||
a | c | c | b | a | b | 3 | 0.5000 | 2.00 | b | c | c | a | b | a | 3 | 0.5000 | 2.00 | c | c | b | a | b | a | 1 | 0.8333 | 3.71 | |||||||||||||||
a | c | c | b | b | a | 4 | 0.3333 | 1.14 | b | c | c | b | a | a | 2 | 0.6667 | 3.43 | c | c | b | b | a | a | 0 | 1.0000 | 4.57 |
Table 7 shows the probability density function of the Concordance coefficient (continuous lines) and the Kruskal-Wallis statistic (dashed lines) generated by simulation. Number of simulations 100,000. Note that the \(H\) statistic has been normalized between 0 and 1.
Sample_Sizes=(4,4) | Sample_Sizes=(3,3,2) | Sample_Sizes=(2,2,2,2) |
Sample_Sizes=(5,4) | Sample_Sizes=(3,3,3) | Sample_Sizes=(3,2,2,2) |