Ranked set sampling (RSS) is an advanced data collection method used when the exact measurement of an observation is difficult and/or expensive; it is applied in a number of research areas, e.g., environment, bioinformatics, and ecology. In this method, random sets are drawn from a population and the units in each set are ranked with a ranking mechanism based on visual inspection or a concomitant variable. Because a good design and an easy analysis are both important, there is a need for a software tool which provides sampling designs and statistical inferences based on RSS and its modifications. This paper introduces an R package as a free and easy-to-use analysis tool for both sampling processes and statistical inferences based on RSS and its modified versions. For researchers, the *RSSampling* package provides a sample with RSS, extreme RSS, median RSS, percentile RSS, balanced groups RSS, double versions of RSS, L-RSS, truncation-based RSS, and robust extreme RSS when the judgment rankings are either perfect or imperfect. Researchers can also use this new package to make parametric inferences for the population mean and the variance where the sample is obtained via classical RSS. Moreover, this package includes applications of nonparametric methods, namely the one-sample sign test, the Mann-Whitney-Wilcoxon test, and the Wilcoxon signed-rank test. The package is available as *RSSampling* on CRAN.

Data collection is a crucial part of all types of scientific research. Ranked set sampling (RSS) is one of the advanced data collection methods, which provides representative sample data by using the ranking information of the sample units. It was first proposed by (McIntyre 1952), and the term "ranked set sampling" was introduced in the study of (Halls and Dell 1966) on the estimation of forage yields in a pine hardwood forest. (Takahasi and Wakimoto 1968) theoretically studied the efficiency of the mean estimator based on RSS, which is unbiased for the population mean. They found that its variance is always smaller than the variance of the mean estimator based on simple random sampling (SRS) with the same sample size when the ranking is perfect. Some other results on the efficiency of RSS can be found in (Dell and Clutter 1972), (David and Levine 1972), and (Stokes 1980b). (Stokes 1977) studied the use of concomitant variables for ranking the sample units in the RSS procedure and found that the ranking procedure is allowed to be imperfect. In another study, she constructed the estimator for the population variance in the presence of ranking error (Stokes 1980a). For some examples and results on regression estimation based on RSS, see (Yu and Lam 1997) and (Chen 2001). The estimation of a distribution function with various settings of RSS can be found in (Stokes and Sager 1988), (Kvam and Samaniego 1993), and (Chen 2000). Other results on distribution-free test procedures based on RSS can be found in (Bohn and Wolfe 1992, 1994), and (Hettmansperger 1995). Additional results for inferential procedures based on RSS can be found in the recent works of (Zamanzade and Vock 2015), (Zhang et al. 2016), and (Ozturk 2018). For more details on RSS, we refer to the review papers by (Kaur et al. 1995), (Chen et al. 2003), and (Wolfe 2012).

The RSS method and its modified versions have come into prominence
recently due to their efficiency, and therefore new software tools or
packages for quick evaluation are required. A free software tool called
Visual Sample Plan (VSP), created by Pacific Northwest National
Laboratory, has many sampling designs, including the classical RSS method,
for developing environmental sampling plans under balanced and unbalanced
cases. It provides the calculation of the required sample size and cost
information with the locations to be sampled. Also, the R package
*NSM3* by (Schneider 2015) has two functions related to the classical RSS
method. It only provides Monte Carlo samples and computes a
statistic for a nonparametric procedure. Both VSP and the *NSM3* package
include only the classical RSS method as a sampling procedure and
provide limited methods for inference. Therefore, there is no extensive
package for sampling and statistical inference using both classical and
modified RSS methods in any available software. In this study,
we propose a pioneering package, named *RSSampling*, for
sampling procedures based on the classical RSS and the modified RSS
methods in both perfect and imperfect ranking cases. Also, the package
provides estimation of the mean and the variance of the population
and allows the use of the one-sample sign, Mann-Whitney-Wilcoxon, and
Wilcoxon signed-rank test procedures under classical RSS. The
organization of the paper is as follows: in the following section, we
give some brief information about the classical RSS and modified RSS
methods. Then, we introduce the details of the *RSSampling* package and
give some illustrative examples, including a real data analysis.
In the last section, we give the conclusion of the study.

RSS and its modifications are advanced sampling methods using the rank information of the sample units. The ranking of the units can be done by visual inspection of a human expert or a concomitant variable. The procedure for the RSS method is as follows:

1. Select \(m\) units at random from a specified population.
2. Rank these \(m\) units by judgment without actual measurement.
3. Keep the smallest judged unit from the ranked set.
4. Select a second set of \(m\) units at random from the specified population, rank these units without measuring them, and keep the second smallest judged unit.
5. Continue the process until \(m\) ranked units are measured.

These five steps are referred to as a cycle. The cycle is then repeated \(r\) times and a ranked set sample of size \(n=mr\) is obtained. Figure 1 illustrates the RSS procedure with visual inspection for the case of \(r=1\) and \(m=3\). In the following scheme, \(X_{i(j:m)}\) represents the \(j\)th ranked unit in the \(i\)th set, where \(i=1,2,\dots,m\) and \(j=1,2,\dots,m\), and bold units represent the units which are chosen for the ranked set sample.

\[\begin{pmatrix} \mathbf{X_{1(1:3)}} \leq & X_{1(2:3)}\leq &X_{1(3:3)} \\ X_{2(1:3)} \leq & \mathbf{X_{2(2:3)}}\leq &X_{2(3:3)} \\ X_{3(1:3)} \leq & X_{3(2:3)}\leq &\mathbf{X_{3(3:3)}} \end{pmatrix}\]
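The cycle described above can be sketched in a few lines of base R. This is a minimal illustration under perfect ranking, not the package implementation; the helper name `rss_sketch` and the simulated normal population are assumptions made only for this example.

```r
# Hypothetical helper (not part of RSSampling): classical RSS under
# perfect ranking, i.e. the ranking uses the true values of the units.
rss_sketch <- function(population, m, r = 1) {
  sample_mat <- matrix(NA_real_, nrow = r, ncol = m)
  for (cycle in seq_len(r)) {
    for (j in seq_len(m)) {
      set_j <- sample(population, m)            # draw a set of m units at random
      sample_mat[cycle, j] <- sort(set_j)[j]    # keep the j-th ranked unit of set j
    }
  }
  sample_mat                                    # rows are cycles, columns are sets
}

set.seed(123)
population <- rnorm(1000, mean = 10, sd = 2)    # assumed toy population
rss_sample <- rss_sketch(population, m = 3, r = 2)
dim(rss_sample)                                 # r = 2 cycles by m = 3 sets, n = mr = 6
```

Each cycle draws \(m^2\) units in total but measures only \(m\) of them, which is the source of the cost saving when ranking is cheap relative to measurement.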

RSS design obtains more representative samples and gives more precise
estimates of the population parameters relative to SRS (EPA 2012). The main
difference between the RSS method and the other modified methods is the
selection procedure of the sample units from the ranked sets. For
example, (Samawi et al. 1996) suggested extreme RSS using the minimum or maximum
units from each ranked set. (Muttlak 1997) introduced median RSS using only
median units of the random sets. (Jemain et al. 2008) suggested balanced
groups RSS which is defined as the combination of extreme RSS and median
RSS. For additional examples of modified methods, see (Muttlak 2003b),
(Al-Saleh and Al-Kadiri 2000), and for robust methods see (Al-Nasser 2007),
(Al-Omari and Raqab 2013), and (Al-Nasser and Mustafa 2009). In the literature, studies on
modified RSS methods generally aim at obtaining a sample more
easily or making a more robust estimation of a population parameter.
Such studies investigate the properties (for example,
bias and mean squared error) of a proposed estimator, and they have
generally focused on comparisons of the SRS and RSS methods. Note that
true comparisons of the modified RSS methods with one another are
difficult to present in general terms, because the advantages of the
sampling methods, when compared to each other, may vary according to
the situation: the parameter to be estimated, the underlying
distribution, the presence of ranking error, etc. For more detailed
information on the modifications of RSS, see (Al-Omari and Bouza 2014) and
references therein. In the following, the modified RSS methods which are
considered in *RSSampling* are introduced.

Extreme RSS (ERSS) is the first modification of RSS suggested by (Samawi et al. 1996) to estimate the population mean only using the minimum or maximum ranked units from each set. The procedure for ERSS can be described as follows: select \(m\) random sets each of size \(m\) units from the population and rank the units within each set by a human expert or a concomitant variable. If the set size \(m\) is even, the lowest ranked units of each set are chosen from the first \(m/2\) sets, and the largest ranked units of each set are chosen from the other \(m/2\) sets. If the set size is odd, the lowest ranked units from the first \((m-1)/2\) sets, the largest ranked units from the other \((m-1)/2\) sets and median unit from the remaining last set are chosen. If we repeat the procedure \(r\) times, we have a sample of size \(n = mr\). An example of the procedure for \(r=1\) and \(m=4\) is shown below.

\[\begin{pmatrix} \mathbf{X_{1(1:4)}} \leq & X_{1(2:4)}\leq &X_{1(3:4)}\leq &X_{1(4:4)} \\ \mathbf{X_{2(1:4)}} \leq & X_{2(2:4)}\leq &X_{2(3:4)}\leq &X_{2(4:4)} \\ X_{3(1:4)} \leq & X_{3(2:4)}\leq &X_{3(3:4)}\leq &\mathbf{X_{3(4:4)}} \\ X_{4(1:4)} \leq & X_{4(2:4)}\leq &X_{4(3:4)}\leq &\mathbf{X_{4(4:4)}} \end{pmatrix}\]
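The ERSS selection rule can be expressed as a vector giving, for each set, the rank of the unit to be measured. The sketch below is only an illustration of the rule as stated above; `erss_ranks` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: rank selected from each of the m sets under ERSS.
erss_ranks <- function(m) {
  half <- m %/% 2
  ranks <- c(rep(1, half), rep(m, half))   # minima first, then maxima
  if (m %% 2 == 1) {
    ranks <- c(ranks, (m + 1) / 2)         # odd m: median of the remaining last set
  }
  ranks
}

erss_even <- erss_ranks(4)   # 1 1 4 4, matching the scheme for m = 4
erss_odd  <- erss_ranks(5)   # 1 1 5 5 3
```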

Median RSS (MRSS) was suggested by (Muttlak 1997). In this method, only the median units of the random sets are chosen as the sample for estimation of the population mean. For odd set sizes, the \(((m+1)/2)\)th ranked units are chosen as the median of each set. For even set sizes, the \((m/2)\)th ranked units are chosen from the first \(m/2\) sets and the \(((m+2)/2)\)th ranked units are chosen from the remaining \(m/2\) sets. If necessary, the procedure can be repeated \(r\) times to obtain a sample of size \(n=mr\). An example of the procedure for \(r=1\) and \(m=3\) is shown below.

\[\begin{pmatrix} X_{1(1:3)} \leq & \mathbf{X_{1(2:3)}}\leq &X_{1(3:3)} \\ X_{2(1:3)} \leq & \mathbf{X_{2(2:3)}}\leq &X_{2(3:3)} \\ X_{3(1:3)} \leq & \mathbf{X_{3(2:3)}}\leq &X_{3(3:3)} \end{pmatrix}\]

(Muttlak 2003b) suggested another modification for the RSS, percentile RSS (PRSS), where only the upper and lower percentiles of the random sets are chosen as the sample for selected value of \(p\), where \(0 \leq p \leq 1\). Suppose that \(m\) random sets with the size \(m\) are chosen from a specific population to sample \(m\) units and ranked visually or with a concomitant variable. If the set size is even, the \((p(m+1))\)th smallest units from the first \(m/2\) sets and the \(((1-p)(m+1))\)th smallest units from the other \(m/2\) sets are chosen. If \(m\) is odd, the \((p(m+1))\)th smallest units are chosen from the first \((m-1)/2\) sets, the \(((1-p)(m+1))\)th smallest units are chosen from the other \((m-1)/2\) sets and the median unit is chosen as the \(m\)th unit from the last set. An example of the procedure for \(r=1\), \(m=5\) and \(p=0.3\) is as below.

\[\begin{pmatrix} X_{1(1:5)} \leq & \mathbf{X_{1(2:5)}}\leq &X_{1(3:5)}\leq &X_{1(4:5)}\leq &X_{1(5:5)} \\ X_{2(1:5)} \leq & \mathbf{X_{2(2:5)}}\leq &X_{2(3:5)}\leq &X_{2(4:5)}\leq &X_{2(5:5)} \\ X_{3(1:5)} \leq & X_{3(2:5)}\leq &X_{3(3:5)}\leq &\mathbf{X_{3(4:5)}}\leq &X_{3(5:5)}\\ X_{4(1:5)} \leq & X_{4(2:5)}\leq &X_{4(3:5)}\leq &\mathbf{X_{4(4:5)}} \leq &X_{4(5:5)} \\ X_{5(1:5)} \leq & X_{5(2:5)}\leq &\mathbf{X_{5(3:5)}}\leq &X_{5(4:5)}\leq &X_{5(5:5)} \end{pmatrix}\]
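The PRSS ranks depend on how the non-integer quantities \(p(m+1)\) and \((1-p)(m+1)\) are turned into ranks, which the description above leaves implicit. The sketch below assumes rounding to the nearest integer (and clamping to the range \(1,\dots,m\)), an assumption chosen because it reproduces the scheme for \(m=5\) and \(p=0.3\); the helper `prss_ranks` is hypothetical and the package may handle this step differently.

```r
# Hypothetical helper: rank selected from each set under PRSS.
# Assumption: p * (m + 1) is rounded to the nearest integer and kept in 1..m.
prss_ranks <- function(m, p) {
  stopifnot(p >= 0, p <= 1)
  lower <- min(max(round(p * (m + 1)), 1), m)
  upper <- min(max(round((1 - p) * (m + 1)), 1), m)
  half <- m %/% 2
  ranks <- c(rep(lower, half), rep(upper, half))
  if (m %% 2 == 1) {
    ranks <- c(ranks, (m + 1) / 2)   # odd m: median unit from the last set
  }
  ranks
}

prss_example <- prss_ranks(5, 0.3)   # 2 2 4 4 3, as in the scheme above
```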

Balanced groups RSS (BGRSS) can be defined as the combination of ERSS and MRSS. (Jemain et al. 2008) suggested using BGRSS for estimating the population mean with a special sample size \(m = 3k\). In their study, the BGRSS procedure is described as follows: \(m=3k\) (where \(k=1,2,3,\dots\)) sets, each of size \(m\), are selected randomly from a specific population. The sets are randomly allocated into three groups and the units in each set are ranked. The smallest units from the first group, the median units from the second group, and the largest units from the third group of ranked sets are chosen. When the set size is odd, the median unit in the second group is defined as the \(((m+1)/2)\)th ranked unit in the set, and when the set size is even, the median unit is defined as the mean of the \((m/2)\)th and the \(((m+2)/2)\)th ranked units. The BGRSS process for one cycle and \(k=2\) is shown below.

\[\begin{pmatrix} \mathbf{X_{1(1:6)}}\leq &X_{1(2:6)}\leq &X_{1(3:6)}\leq &X_{1(4:6)}\leq &X_{1(5:6)}\leq &X_{1(6:6)} \\ \mathbf{X_{2(1:6)}}\leq & X_{2(2:6)}\leq &X_{2(3:6)}\leq &X_{2(4:6)}\leq &X_{2(5:6)}\leq &X_{2(6:6)} \\ X_{3(1:6)} \leq & X_{3(2:6)}\leq &\mathbf{X_{3(3:6)}}\leq &\mathbf{X_{3(4:6)}}\leq &X_{3(5:6)}\leq &X_{3(6:6)}\\ X_{4(1:6)} \leq & X_{4(2:6)}\leq &\mathbf{X_{4(3:6)}}\leq &\mathbf{X_{4(4:6)}}\leq &X_{4(5:6)}\leq &X_{4(6:6)} \\ X_{5(1:6)} \leq & X_{5(2:6)}\leq &X_{5(3:6)}\leq &X_{5(4:6)}\leq &X_{5(5:6)}\leq &\mathbf{X_{5(6:6)}} \\ X_{6(1:6)} \leq & X_{6(2:6)}\leq &X_{6(3:6)}\leq &X_{6(4:6)}\leq &X_{6(5:6)}\leq &\mathbf{X_{6(6:6)}} \end{pmatrix}\]

(Al-Saleh and Al-Kadiri 2000) introduced another modification of RSS, double RSS (DRSS), as the starting point of a multistage procedure. Several researchers also extended the DRSS method to modified versions such as double extreme RSS (DERSS) by (Samawi 2002), double median RSS (DMRSS) by (Samawi and Tawalbeh 2002), and double percentile RSS (DPRSS) by (Jemain and Al-Omari 2006). The DRSS procedure is described as follows: \(m^3\) units are identified from the target population and divided randomly into \(m\) groups, each of size \(m^2\). Then, the usual RSS procedure is used on each group to obtain \(m\) ranked set samples, each of size \(m\). Finally, the RSS procedure is applied again to the ranked set samples obtained in the previous step to get a double ranked set sample of size \(m\).
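The two stages of DRSS can be sketched in base R as follows. This is an illustrative reading of the description above under perfect ranking; the helpers `rss_from_group` and `drss_sketch` and the simulated population are assumptions for the example, not the package code.

```r
# Hypothetical helper: classical RSS (one cycle) on a vector of exactly m^2 units.
rss_from_group <- function(units, m) {
  sets <- matrix(units, nrow = m, ncol = m)   # each row is one set of m units
  ranked <- t(apply(sets, 1, sort))           # rank the units within each set
  diag(ranked)                                # i-th ranked unit of the i-th set
}

# Hypothetical helper: one cycle of double RSS.
drss_sketch <- function(population, m) {
  units <- sample(population, m^3)                       # identify m^3 units
  groups <- split(units, rep(seq_len(m), each = m^2))    # m random groups of size m^2
  stage1 <- t(sapply(groups, rss_from_group, m = m))     # m ranked set samples (rows)
  diag(t(apply(stage1, 1, sort)))                        # RSS applied again to these samples
}

set.seed(42)
population <- rnorm(500)                     # assumed toy population
drss_sample <- drss_sketch(population, m = 3)
length(drss_sample)                          # m = 3 measured units from m^3 = 27 drawn
```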

L-RSS, which is a robust RSS procedure, is based on the idea of the L statistic and was introduced by (Al-Nasser 2007) as a generalization of different types of RSS methods. The first step of the L-RSS procedure is selecting \(m\) random sets with \(m\) units and ranking the units in each set. Let \(k\) be the L-RSS coefficient, where \(k=\lfloor m \alpha \rfloor\) for \(0 \leq \alpha < 0.5\) and \(\lfloor m\alpha \rfloor\) is the largest integer less than or equal to \(m \alpha\). Then, the \((k+1)\)th ranked units from the first \(k+1\) sets, the \((m-k)\)th ranked units from the last \(k+1\) sets, and the \(i\)th ranked units from the remaining sets numbered \(i\), where \(i=k+2,\ldots,m-k-1\), are selected. The L-RSS procedure for the case of \(m=6\) and \(k=1\) (\(\alpha=0.20\)) in a cycle is shown below:

\[\begin{pmatrix} X_{1(1:6)} \leq & \mathbf{X_{1(2:6)}}\leq &X_{1(3:6)}\leq &X_{1(4:6)}\leq &X_{1(5:6)}\leq &X_{1(6:6)} \\ X_{2(1:6)} \leq & \mathbf{X_{2(2:6)}}\leq &X_{2(3:6)}\leq &X_{2(4:6)}\leq &X_{2(5:6)}\leq &X_{2(6:6)} \\ X_{3(1:6)} \leq & X_{3(2:6)}\leq &\mathbf{X_{3(3:6)}}\leq &X_{3(4:6)}\leq &X_{3(5:6)}\leq &X_{3(6:6)}\\ X_{4(1:6)} \leq & X_{4(2:6)}\leq &X_{4(3:6)}\leq &\mathbf{X_{4(4:6)}}\leq &X_{4(5:6)}\leq &X_{4(6:6)} \\ X_{5(1:6)} \leq & X_{5(2:6)}\leq &X_{5(3:6)}\leq &X_{5(4:6)}\leq &\mathbf{X_{5(5:6)}}\leq &X_{5(6:6)} \\ X_{6(1:6)} \leq & X_{6(2:6)}\leq &X_{6(3:6)}\leq &X_{6(4:6)}\leq &\mathbf{X_{6(5:6)}}\leq &X_{6(6:6)} \end{pmatrix}\]

When \(k=0\), then this procedure leads to the classical RSS and when \(k=\lfloor (m-1)/2 \rfloor\), then it leads to the MRSS method.
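The L-RSS rule amounts to taking the classical RSS ranks \(1,\dots,m\) and clamping them to the interval \([k+1,\, m-k]\). The sketch below uses this compact restatement, which is equivalent to the three-part description above; `lrss_ranks` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: rank selected from set i under L-RSS,
# written as the classical rank i clamped to [k + 1, m - k].
lrss_ranks <- function(m, alpha) {
  stopifnot(alpha >= 0, alpha < 0.5)
  k <- floor(m * alpha)                     # L-RSS coefficient
  pmin(pmax(seq_len(m), k + 1), m - k)
}

lrss_example   <- lrss_ranks(6, 0.20)   # k = 1: 2 2 3 4 5 5, as in the scheme above
lrss_classical <- lrss_ranks(6, 0)      # k = 0: 1 2 3 4 5 6, classical RSS
lrss_median    <- lrss_ranks(5, 0.45)   # k = 2 = floor((m - 1) / 2): 3 3 3 3 3, MRSS
```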

The truncation-based RSS (TBRSS) was presented by (Al-Omari and Raqab 2013). This procedure can be summarized as follows: select randomly \(m\) sets each of size \(m\) units from the population and rank the units in each set. Then, determine TBRSS coefficient \(k\) as in the L-RSS method and select the minimums of the first \(k\) sets and the maximums of the last \(k\) sets. From the remaining \(m-2k\) samples, select the \(i\)th ranked unit of the \(i\)th sample \((k+1 \leq i \leq m-k)\). The one cycled TBRSS method for the case of \(m=8\) and \(k=2\) (\(\alpha=0.35\)) is shown below.

\[\begin{pmatrix} \mathbf{X_{1(1:8)}}\leq & X_{1(2:8)}\leq &X_{1(3:8)}\leq &X_{1(4:8)}\leq &X_{1(5:8)}\leq &X_{1(6:8)}\leq &X_{1(7:8)}\leq &X_{1(8:8)} \\ \mathbf{X_{2(1:8)}}\leq &X_{2(2:8)}\leq &X_{2(3:8)}\leq &X_{2(4:8)}\leq &X_{2(5:8)}\leq &X_{2(6:8)}\leq &X_{2(7:8)}\leq &X_{2(8:8)} \\ X_{3(1:8)}\leq &X_{3(2:8)}\leq &\mathbf{X_{3(3:8)}}\leq &X_{3(4:8)}\leq &X_{3(5:8)}\leq &X_{3(6:8)}\leq &X_{3(7:8)}\leq &X_{3(8:8)} \\ X_{4(1:8)}\leq &X_{4(2:8)}\leq &X_{4(3:8)}\leq &\mathbf{X_{4(4:8)}}\leq &X_{4(5:8)}\leq &X_{4(6:8)}\leq &X_{4(7:8)}\leq &X_{4(8:8)}\\ X_{5(1:8)}\leq &X_{5(2:8)}\leq &X_{5(3:8)}\leq &X_{5(4:8)}\leq &\mathbf{X_{5(5:8)}}\leq &X_{5(6:8)}\leq &X_{5(7:8)}\leq &X_{5(8:8)} \\ X_{6(1:8)}\leq &X_{6(2:8)}\leq &X_{6(3:8)}\leq &X_{6(4:8)}\leq &X_{6(5:8)}\leq &\mathbf{X_{6(6:8)}}\leq &X_{6(7:8)}\leq &X_{6(8:8)} \\ X_{7(1:8)}\leq &X_{7(2:8)}\leq &X_{7(3:8)}\leq &X_{7(4:8)}\leq &X_{7(5:8)}\leq &X_{7(6:8)}\leq &X_{7(7:8)}\leq &\mathbf{X_{7(8:8)}} \\ X_{8(1:8)}\leq &X_{8(2:8)}\leq &X_{8(3:8)}\leq &X_{8(4:8)}\leq &X_{8(5:8)}\leq &X_{8(6:8)}\leq &X_{8(7:8)}\leq &\mathbf{X_{8(8:8)}} \end{pmatrix}\]

Note that when \(k = 0\) or \(k = 1\), TBRSS scheme is equivalent to the classical RSS scheme.
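The TBRSS selection can likewise be written as a rank vector. The following base R sketch is illustrative only; `tbrss_ranks` is a hypothetical helper and not part of the package.

```r
# Hypothetical helper: rank selected from set i under TBRSS.
tbrss_ranks <- function(m, alpha) {
  stopifnot(alpha >= 0, alpha < 0.5)
  k <- floor(m * alpha)                # TBRSS coefficient, as in L-RSS
  ranks <- seq_len(m)                  # i-th ranked unit of the i-th set by default
  ranks[seq_len(k)] <- 1               # minimums of the first k sets
  ranks[m + 1 - seq_len(k)] <- m       # maximums of the last k sets
  ranks
}

tbrss_example   <- tbrss_ranks(8, 0.35)   # k = 2: 1 1 3 4 5 6 8 8
tbrss_classical <- tbrss_ranks(4, 0.25)   # k = 1: 1 2 3 4, same as classical RSS
```

The second call shows why \(k=1\) changes nothing: the minimum of the first set and the maximum of the last set are already the units that classical RSS selects.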

Robust extreme RSS (RERSS) scheme was introduced by (Al-Nasser and Mustafa 2009). This method can be described as follows: identify \(m\) random sets with \(m\) units and rank the units within each set. Select the \((k+1)\)th ranked units from the first \(m/2\) sets where \(k=\lfloor m \alpha \rfloor\) for \(0 < \alpha < 0.5\) and \(\lfloor m\alpha \rfloor\) is the largest integer value less than or equal to \(m \alpha\). Then, select the \((m-k)\)th ranked units from the other \(m/2\) sets. If the set size \(m\) is odd, \(((m+1)/2)\)th ranked unit is selected additionally from the last remaining set. The procedure for one cycle and the case of \(m=6\) and \(k=1\) (\(\alpha=0.20\)) can be shown as below.

\[\begin{pmatrix} X_{1(1:6)} \leq & \mathbf{X_{1(2:6)}}\leq &X_{1(3:6)}\leq &X_{1(4:6)}\leq &X_{1(5:6)}\leq &X_{1(6:6)} \\ X_{2(1:6)} \leq & \mathbf{X_{2(2:6)}}\leq &X_{2(3:6)}\leq &X_{2(4:6)}\leq &X_{2(5:6)}\leq &X_{2(6:6)} \\ X_{3(1:6)} \leq & \mathbf{X_{3(2:6)}}\leq &X_{3(3:6)}\leq &X_{3(4:6)}\leq &X_{3(5:6)}\leq &X_{3(6:6)}\\ X_{4(1:6)} \leq & X_{4(2:6)}\leq &X_{4(3:6)}\leq &X_{4(4:6)}\leq &\mathbf{X_{4(5:6)}}\leq &X_{4(6:6)} \\ X_{5(1:6)} \leq & X_{5(2:6)}\leq &X_{5(3:6)}\leq &X_{5(4:6)}\leq &\mathbf{X_{5(5:6)}}\leq &X_{5(6:6)} \\ X_{6(1:6)} \leq & X_{6(2:6)}\leq &X_{6(3:6)}\leq &X_{6(4:6)}\leq &\mathbf{X_{6(5:6)}}\leq &X_{6(6:6)} \end{pmatrix}\]

If \(k=0\) and \(k=(m/2)\), then this sampling procedure corresponds to ERSS and MRSS methods, respectively.
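A rank-vector sketch of RERSS is given below. For odd \(m\) it assumes that the two halves consist of \((m-1)/2\) sets each, with the median taken from the last set; this reading of the description is an assumption, and `rerss_ranks` is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: rank selected from each set under RERSS.
# Assumption: for odd m the two halves have (m - 1) / 2 sets each.
rerss_ranks <- function(m, alpha) {
  stopifnot(alpha >= 0, alpha < 0.5)
  k <- floor(m * alpha)
  half <- m %/% 2
  ranks <- c(rep(k + 1, half), rep(m - k, half))
  if (m %% 2 == 1) {
    ranks <- c(ranks, (m + 1) / 2)     # median unit from the last remaining set
  }
  ranks
}

rerss_example <- rerss_ranks(6, 0.20)  # k = 1: 2 2 2 5 5 5, as in the scheme above
rerss_extreme <- rerss_ranks(6, 0)     # k = 0: 1 1 1 6 6 6, i.e. ERSS
```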

The package *RSSampling* is available on CRAN and can be installed and
loaded via the following commands:

```
> install.packages("RSSampling")
> library("RSSampling")
```

The package depends on the
*stats* package and uses a
function from the non-standard package
*LearnBayes* (Albert 2018) for
random data generation in the Examples section. The proposed package
consists of two main parts which are the functions for sampling methods
described in Table 1 and the functions for inference
procedures described in Table 2 based on RSS. The sampling
part of the package includes perfect and imperfect rankings with a
concomitant variable allowing researchers to sample with classical RSS
and the modified versions. The functions for inference procedures
provide estimation for parameters and some hypothesis testing procedures
based on RSS.

In this part, we introduce a core function, called `rankedsets`, to
obtain `s` ranked sets consisting of randomly chosen sample units with
the set size \(m\). By using this function, we developed the functions
given in Table 1, which provide researchers means to obtain a sample
under different sampling schemes. One can also use the `rankedsets`
function for studies based on other modified RSS methods which are not
mentioned in this paper.

| Function | Description |
|---|---|
| `rss` | Performs classical RSS method |
| `Mrss` | Performs modified RSS methods (MRSS, ERSS, PRSS, BGRSS) |
| `Rrss` | Performs robust RSS methods (L-RSS, TBRSS, RERSS) |
| `Drss` | Performs double RSS methods (DRSS, DMRSS, DERSS, DPRSS) |
| `con.rss` | Performs classical RSS method by using a concomitant variable |
| `con.Mrss` | Performs modified RSS methods (MRSS, ERSS, PRSS, BGRSS) by using a concomitant variable |
| `con.Rrss` | Performs robust RSS methods (L-RSS, TBRSS, RERSS) by using a concomitant variable |
| `obsno.Mrss` | Determines the observation numbers of the units which will be chosen for the sample for classical and modified RSS methods by using a concomitant variable |

The function `rss` provides the ranked set sample with perfect ranking
from a specific data set, \(X\), provided in matrix form where the columns
and rows represent the sets and cycles, respectively. One can see the
randomly chosen ranked sets by defining `sets = TRUE` (default
`sets = FALSE`) with the set size \(m\) and the cycle size \(r\). For the
modified RSS methods, the function `Mrss` provides a sample from MRSS,
ERSS, PRSS, and BGRSS which are represented by `"m"`, `"e"`, `"p"`, and
`"bg"`, respectively. The `type = "r"`, defined as the default,
represents the classical RSS. For the sampling procedure PRSS, there is
an additional parameter `p` which defines the percentile. We note that,
when `p = 0.25` in PRSS, one can obtain a sample with quartile RSS given
by (Muttlak 2003a). `Rrss` provides samples from the L-RSS, TBRSS, and
RERSS methods which are represented by `"l"`, `"tb"`, and `"re"`,
respectively. The parameter `alpha` is the common parameter for these
methods and defines the cutting value. The `Drss` function is for the
double versions of RSS, MRSS, ERSS, and PRSS under perfect ranking.
`type = "d"` is defined as the default which represents the double RSS.
Values `"dm"`, `"de"`, and `"dp"` are defined for the DMRSS, DERSS, and
DPRSS methods, respectively.

In the literature, most of the theoretical inferences and numerical studies are conducted based on perfect ranking. However, in real life applications, the ranking process is done with an expert judgment or a concomitant variable. Let us consider RSS with a concomitant variable \(Y\). A set of \(m\) units is drawn from the population, then the units are ranked by the order of \(Y\). The concomitant variable \(Y_{i(j:m)}\) represents the \(j\)th ranked unit in \(i\)th set and the variable of interest \(X_{(i,j)}\) represents the \(j\)th unit in \(i\)th set, where \(i=1,2,\dots,m\) and \(j=1,2,\dots,m\). In the following example, the procedure of RSS using \(Y\) is given for \(m=3\).

\[\begin{matrix} (\mathbf{Y_{1(1:3)}}, \mathbf{X_{(1,1)}}) \leq & (Y_{1(2:3)}, X_{(1,2)}) \leq & (Y_{1(3:3)}, X_{(1,3)}) & \longrightarrow & \mathbf{X_{(1,1)}} \\ (Y_{2(1:3)}, X_{(2,1)}) \leq & (\mathbf{Y_{2(2:3)}}, \mathbf{X_{(2,2)}}) \leq & (Y_{2(3:3)}, X_{(2,3)}) & \longrightarrow & \mathbf{X_{(2,2)}} \\ (Y_{3(1:3)}, X_{(3,1)}) \leq & (Y_{3(2:3)}, X_{(3,2)}) \leq & (\mathbf{Y_{3(3:3)}}, \mathbf{X_{(3,3)}}) & \longrightarrow & \mathbf{X_{(3,3)}} \end{matrix}\]
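The scheme above can be mimicked in base R by ranking each set on \(Y\) only and then reading off the \(X\) value of the selected unit. The sketch below is a one-cycle illustration; the helper `con_rss_sketch` and the simulated linear relation between \(X\) and \(Y\) are assumptions for the example and do not reproduce the package code.

```r
# Hypothetical helper: one cycle of RSS where ranking uses the concomitant y
# and only the selected units have their x value "measured".
con_rss_sketch <- function(x, y, m) {
  picked <- numeric(m)
  for (i in seq_len(m)) {
    idx <- sample(length(y), m)           # draw one set of m units
    ranked_idx <- idx[order(y[idx])]      # rank the set by Y, never by X
    picked[i] <- x[ranked_idx[i]]         # X value of the i-th judged unit
  }
  picked
}

set.seed(1)
y <- rnorm(1000)                          # assumed concomitant variable
x <- 2 * y + rnorm(1000, sd = 0.5)        # assumed variable of interest, correlated with y
con_sample <- con_rss_sketch(x, y, m = 3)
```

Because the order of \(Y\) need not agree with the order of \(X\), the selected \(X\) values are judgment order statistics rather than true order statistics, which is what makes the ranking imperfect.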

The functions `con.rss`, `con.Mrss`, and `con.Rrss` provide methods to
obtain a sample under imperfect ranking. With the `con.rss` function, a
researcher can obtain a classical ranked set sample from a specific data
set using a concomitant variable \(Y\) with the set size \(m\) and cycle
size \(r\) to make inference about the variable of interest \(X\). The
functions `con.Mrss` and `con.Rrss` are used in the same way as the
`con.rss` function, except for the selection method, which is defined by
the `type` parameter. These functions are simply extensions of `Mrss`
and `Rrss` to concomitant variable cases.

In real-world research, the values of the variable of interest \(X\) are
unknown, and researchers measure the \(X\) values of the sample units after
choosing them from the population with a specific sampling method. The
function `obsno.Mrss` supports this kind of application when
researchers prefer to use RSS methods. After the sampling frame and the
concomitant variable to be used for ranking are determined, the function
returns the observation numbers of the units to be selected according to
the values of the concomitant variable, so the researcher easily obtains
the observation numbers of the units which will be included in the sample.
`type = "r"` is defined as the default which represents the classical
RSS. MRSS, ERSS, PRSS, and BGRSS are represented by `"m"`, `"e"`, `"p"`,
and `"bg"`, respectively.

Statistical inference refers to the process of drawing conclusions
about the population of interest from sample data. Researchers are
generally interested in fundamental inferences for parameters such
as the mean and the variance. The *RSSampling* package provides an easy
way to estimate the parameters of the population of interest and to
use some distribution-free tests, namely the sign,
Mann-Whitney-Wilcoxon, and Wilcoxon signed-rank tests, for nonparametric
inference when the sampling procedure is RSS.

| Function | Description |
|---|---|
| `meanRSS` | Performs mean estimation and hypothesis testing with classical RSS method |
| `varRSS` | Performs variance estimation with classical RSS method |
| `regRSS` | Performs regression estimation for mean of interested population with classical RSS method |
| `sign1testrss` | Performs one sample sign test with classical RSS method |
| `mwwutestrss` | Performs Mann-Whitney-Wilcoxon test with classical RSS method |
| `wsrtestrss` | Performs Wilcoxon signed-rank test with classical RSS method |

The `meanRSS` function provides point estimation, confidence interval
estimation, and asymptotic hypothesis testing for the population mean
based on RSS; see (Chen et al. 2003). For variance estimation based
on RSS, we define the `varRSS` function, whose `type` parameter takes two
values: `"Stokes"` and `"Montip"`. (Stokes 1980a) proved that her
estimator of the variance is asymptotically unbiased regardless of the
presence of ranking error. For the `"Montip"` type estimation,
(Tiensuwan and Sarikavanij 2003) showed that there is no unbiased
estimator of the variance for one cycle, but they proposed an unbiased
estimator of the variance for more than one cycle. With the `regRSS`
function, the regression estimator for the mean of the population of
interest can be obtained based on RSS. The \(\beta\) coefficient (`"B"`
in the `regRSS` function) is calculated under the assumption of a known
population mean for the concomitant variable \(Y\). Note that the ranked
set samples for the variable of interest \(X\) and for the concomitant
variable \(Y\) must have the same length. Detailed information about the
regression estimator based on RSS can be found in (Yu and Lam 1997).
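As a rough illustration of the point estimators discussed above, the sketch below computes the RSS mean estimator (the average of all \(mr\) measured units) and a variance estimator of the form \(\hat{\sigma}^2 = \frac{1}{mr-1}\sum_{j=1}^{r}\sum_{i=1}^{m}(X_{(i)j}-\hat{\mu}_{RSS})^2\), which is the form usually attributed to (Stokes 1980a). Whether `varRSS` with `type = "Stokes"` computes exactly this quantity is an assumption; the helper names and the toy matrix are hypothetical.

```r
# Hypothetical helpers operating on an r x m matrix of measured units
# (rows are cycles, columns are sets).
rss_mean <- function(sample_mat) {
  mean(sample_mat)                               # average of all mr measured units
}

stokes_var_sketch <- function(sample_mat) {
  n <- length(sample_mat)                        # n = m * r
  sum((sample_mat - mean(sample_mat))^2) / (n - 1)
}

# Toy ranked set sample with m = 3 and r = 2 (made-up numbers).
toy <- matrix(c(1.2, 2.5, 3.1,
                0.9, 2.2, 3.8), nrow = 2, byrow = TRUE)
toy_mean <- rss_mean(toy)
toy_var  <- stokes_var_sketch(toy)
```

In this form the estimator has the same expression as the usual sample variance applied to the pooled ranked set sample.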

Finally, for nonparametric inference, the `sign1testrss`, `mwwutestrss`,
and `wsrtestrss` functions implement, respectively, the sign test, the
Mann-Whitney-Wilcoxon test, and the Wilcoxon signed-rank test based
on RSS. The normal approximation is used to construct the test
statistics and approximate confidence intervals. For detailed
information on these test methods, see the book of (Chen et al. 2003).

In this section, we present examples illustrating the *RSSampling*
package.

This example shows how to obtain a sample by using the TBRSS method
for the variable of interest, \(X\), ranked by using the concomitant
variable \(Y\), assuming that they follow a multivariate normal distribution.
We set the set size `m` to 4 and the cycle size `r` to 2. The
ranked sets of \(Y\) and the sets of \(X\) are obtained using the function
`con.Rrss`. The resulting sample for \(X\) is given below.

```
## Loading packages
library("RSSampling")
library("LearnBayes")

## Imperfect ranking example for interested (X) and concomitant (Y) variables
## from multivariate normal dist.
set.seed(1)
mu <- c(10, 8)
variance <- c(5, 3)
a <- matrix(c(1, 0.9, 0.9, 1), 2, 2)
v <- diag(variance)
Sigma <- v%*%a%*%v
x <- rmnorm(10000, mu, Sigma)
xx <- as.numeric(x[,1])
xy <- as.numeric(x[,2])

## Selecting a truncation-based ranked set sample
con.Rrss(xx, xy, m = 4, r = 2, type = "tb", sets = TRUE, concomitant = FALSE,
  alpha = 0.25)

$corr.coef
[1] 0.9040095

$var.of.interest
          [,1]      [,2]      [,3]     [,4]
[1,] 12.332134 13.116611 15.675967 21.72312
[2,] 11.350275  8.846237 10.164005 17.07950
[3,]  4.143757  9.608573  8.708221 11.57671
[4,]  2.284106  9.535388 12.709489 14.11595
[5,]  3.212739  8.089833 11.430411 14.53190
[6,]  6.556222 12.759335 13.210037 11.02219
[7,]  3.337564 -0.864634 12.800243 13.47315
[8,]  5.988893  8.850680 13.208956 15.82731

$concomitant.var.
         [,1]      [,2]      [,3]      [,4]
[1,] 8.034720 10.398398 11.800919 13.754743
[2,] 8.003575  8.118947 11.136804 12.149531
[3,] 4.733177  7.377396  8.866563 11.658837
[4,] 4.027061  8.008146  9.977435 10.912382
[5,] 3.909958  6.220087  7.564130  8.739562
[6,] 5.893001  8.760754 10.067927 10.244593
[7,] 2.119661  2.813413 10.651769 10.775596
[8,] 5.406154  7.722866  8.602551 10.874853

$sample.x
          m = 1     m = 2     m = 3    m = 4
r = 1 12.332134  8.846237  8.708221 14.11595
r = 2  3.212739 12.759335 12.800243 15.82731
```

Random determination of the sample units is an important task for
practitioners. The function `obsno.Mrss` is for practitioners who
have the frame of the population with unknown variable \(X\) and known
concomitant variable \(Y\). In the following example, the observation
numbers for the median ranked set sample units are obtained in order to
take the measurements of the variable of interest \(X\).

```
## Loading packages
library("RSSampling")

## Generating concomitant variable (Y) from exponential dist.
set.seed(5)
y = rexp(10000)

## Determining the observation numbers of the units which are chosen to sample
obsno.Mrss(y, m = 3, r = 5, type = "m")
      m = 1       m = 2       m = 3
r = 1 "Obs. 2452" "Obs. 6417" "Obs. 3227"
r = 2 "Obs. 9094" "Obs. 1805" "Obs. 9877"
r = 3 "Obs. 1333" "Obs. 9252" "Obs. 3219"
r = 4 "Obs. 6397" "Obs. 7038" "Obs. 5019"
r = 5 "Obs. 446"  "Obs. 9663" "Obs. 10"
```

In order to illustrate the usage of the package, we give a simulation
study with 10,000 repetitions for mean estimation of \(X\) based on the RSS
method using a concomitant variable. It demonstrates the effect of the
correlation level between \(X\) and \(Y\) on the mean squared error (MSE) of
estimation. Samples are obtained with `m = 5` and `r = 10`, assuming that
\(X\) and \(Y\) follow a multivariate normal distribution. Figure 2, the
output of the simulation study, indicates that the MSE values decrease
as the correlation level increases.

```
## Loading packages
library("RSSampling")
library("LearnBayes")

## Imperfect ranking example for interested (X) and concomitant (Y) variables
## from multivariate normal dist.
mu <- c(10, 8)
variance <- c(5, 3)
rho = seq(0, 0.9, 0.1)
se.x = mse.x = numeric()
repeatsize = 10000
for (i in 1:length(rho)) {
  set.seed(1)
  a <- matrix(c(1, rho[i], rho[i], 1), 2, 2)
  v <- diag(variance)
  Sigma <- v%*%a%*%v
  x <- rmnorm(10000, mu, Sigma)
  xx <- as.numeric(x[,1])
  xy <- as.numeric(x[,2])
  for (j in 1:repeatsize) {
    set.seed(j)
    samplex = con.Mrss(xx, xy, m = 5, r = 10, type = "r", sets = FALSE,
      concomitant = FALSE)$sample.x
    se.x[j] = (mean(samplex)-mu[1])^2
  }
  mse.x[i] = sum(se.x)/repeatsize
}
plot(rho[-1], mse.x[-1], type = "o", lwd = 2,
  main = "MSE values based on increasing correlation levels",
  xlab = "corr.coef.", ylab = "MSE", cex = 1.5, xaxt = "n")
axis(1, at = seq(0.1, 0.9, by = 0.1))
```

In this real data example, we used the abalone data set which is freely available at https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data. The data set consists of \(9\) variables measured on \(4177\) units; the variables are sex (Male/Female/Infant), length (mm), diameter (mm), height (mm), whole weight (grams), shucked weight (grams), viscera weight (grams), shell weight (grams), and rings (\(+1.5\) gives the age of the abalone in years), respectively. The data come from an original study of the population biology of abalone by (Nash et al. 1994). Also, (Cetintav et al. 2016) and (Sevinç et al. 2018) used the abalone data set for applications of the fuzzy-based modification of RSS and the partial groups RSS method, respectively. The data set can be obtained easily by using the following R command.

```
abaloneData <- read.csv(
  url("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"),
  header = FALSE, col.names = c("sex", "length",
  "diameter", "height", "whole.weight", "shucked.weight", "viscera.weight",
  "shell.weight", "rings"))
```

Suppose that we aim to estimate the mean of *viscera weight* together with its confidence interval, and to test the hypothesis that the mean of *viscera weight* is equal to \(0.18\). Measuring *viscera weight*, which is the gut weight of the abalone after bleeding, is an expensive and time-consuming process. Because *whole weight* is easy to measure and highly correlated with *viscera weight* (the correlation coefficient is \(0.966\)), we used *whole weight* as the concomitant variable to obtain a sample of size \(25\) with the RSS method. We have the following results for *viscera weight*.

```
cor(abaloneData$viscera.weight, abaloneData$whole.weight)
[1] 0.9663751
```

```
set.seed(50)
sampleRSS = con.rss(abaloneData$viscera.weight, abaloneData$whole.weight, m = 5, r = 5,
  sets = TRUE, concomitant = FALSE)$sample.x

meanRSS(sampleRSS, m = 5, r = 5, alpha = 0.05, alternative = "two.sided", mu_0 = 0.18)
$mean
[1] 0.17826

$CI
[1] 0.1293705 0.2271495

$z.test
[1] -0.06975604

$p.value
[1] 0.9443878

varRSS(sampleRSS, m = 5, r = 5, type = "Stokes")
[1] 0.0135364
```

The results from our sample data indicate that the estimated mean and
the variance are \(0.17826\) and \(0.01354\), respectively. According to the
hypothesis testing result, we conclude that there is no strong evidence
against the null hypothesis (`p.value` \(>0.05\)).
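For readers who want to see how quantities of this kind can be computed by hand, the following base R sketch performs a z-type inference for the mean from a balanced RSS stored as an \(r \times m\) matrix. The helper name `rss_mean_inference`, the matrix layout, and the simulated data are our own assumptions; the standard error uses the textbook form \(\sum_i s_{(i)}^2 / (m^2 r)\) with within-rank sample variances, which is not necessarily the computation that `meanRSS` performs.

```r
## Hypothetical helper (not the package's meanRSS): z-type inference for the
## mean from a balanced RSS given as an r x m matrix
## (rows = cycles, columns = ranks).
rss_mean_inference <- function(s, mu_0, alpha = 0.05) {
  r <- nrow(s)
  m <- ncol(s)
  stopifnot(r >= 2)                        # needed for within-rank variances
  est <- mean(s)                           # RSS mean estimator
  rank_vars <- apply(s, 2, var)            # sample variance of each rank
  se <- sqrt(sum(rank_vars) / (m^2 * r))   # estimated standard error
  z <- (est - mu_0) / se
  half <- qnorm(1 - alpha / 2) * se
  list(mean = est,
       CI = c(est - half, est + half),
       z.test = z,
       p.value = 2 * pnorm(-abs(z)))       # two-sided p-value
}

## Simulated RSS with perfect ranking (an assumption for this sketch):
## in each cycle, the i-th value is the i-th order statistic of a fresh set.
one_cycle <- function(m) {
  sapply(seq_len(m), function(i) sort(rnorm(m, mean = 0.18, sd = 0.1))[i])
}
set.seed(50)
m <- 5
r <- 5
s <- t(replicate(r, one_cycle(m)))         # r x m matrix

rss_mean_inference(s, mu_0 = 0.18)
```

The returned list mirrors the element names shown in the output above (`mean`, `CI`, `z.test`, `p.value`) only for ease of comparison; the numerical values will differ because the data here are simulated.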

RSS is an efficient data collection method compared to SRS, especially in situations where the measurement of a unit is expensive but the ranking is less costly. In this study, we propose a package which draws samples with RSS and its modifications and provides functions for some inferential procedures based on RSS. We create a set of functions for sampling under both perfect and imperfect rankings with a concomitant variable. For the inferential procedures, we consider the mean, variance, and regression estimators, and the sign, Mann-Whitney-Wilcoxon, and Wilcoxon signed-rank tests as distribution-free tests. The functions in the package are illustrated with examples, and an analysis of a real data set is given. Future improvements of the package may be provided by adding new inference procedures based on RSS methods.

The authors thank two anonymous referees and the associate editor for their helpful comments and suggestions which improved the presentation of the paper. This study is supported by the Scientific and Technological Research Council of Turkey (TUBITAK-COST Grant No. 115F300) under ISCH COST Action IS1304.



J. Albert. *LearnBayes: Functions for learning Bayesian inference.* 2018. URL https://cran.r-project.org/web/packages/LearnBayes/index.html.

A. D. Al-Nasser. L ranked set sampling: A generalization procedure for robust visual sampling. *Communications in Statistics—Simulation and Computation*, 36(1): 33–43, 2007. URL https://doi.org/10.1080/03610910601096510.

A. D. Al-Nasser and A. B. Mustafa. Robust extreme ranked set sampling. *Journal of Statistical Computation and Simulation*, 79(7): 859–867, 2009. URL https://doi.org/10.1080/00949650701683084.

A. I. Al-Omari and C. N. Bouza. Review of ranked set sampling: Modifications and applications. *Revista Investigación Operacional*, 3(35): 215–240, 2014.

A. I. Al-Omari and M. Z. Raqab. Estimation of the population mean and median using truncation-based ranked set samples. *Journal of Statistical Computation and Simulation*, 83(8): 1453–1471, 2013. URL https://doi.org/10.1080/00949655.2012.662684.

M. F. Al-Saleh and M. A. Al-Kadiri. Double-ranked set sampling. *Statistics and Probability Letters*, 48(2): 205–212, 2000. URL https://doi.org/10.1016/S0167-7152(99)00206-0.

L. L. Bohn and D. A. Wolfe. Nonparametric two sample procedures for ranked-set sampling data. *Journal of the American Statistical Association*, 87: 552–561, 1992. URL https://doi.org/10.1080/01621459.1992.10475239.

L. L. Bohn and D. A. Wolfe. The effect of imperfect judgment rankings on properties of procedures based on the ranked-set samples analog of the Mann-Whitney-Wilcoxon statistic. *Journal of the American Statistical Association*, 89: 168–176, 1994. URL https://doi.org/10.1080/01621459.1994.10476458.

B. Cetintav, G. Ulutagay, S. Gurler and N. Demirel. Mean estimation based on FWA using ranked set sampling with single and multiple rankers. In *International conference on information processing and management of uncertainty in knowledge-based systems*, pages 790–797, 2016. Springer. URL https://doi.org/10.1007/978-3-319-40581-0_64.

Z. Chen. On ranked-set sample quantiles and their applications. *Journal of Statistical Planning and Inference*, 83: 125–135, 2000. URL https://doi.org/10.1016/S0378-3758(99)00071-3.

Z. Chen. Ranked-set sampling with regression-type estimators. *Journal of Statistical Planning and Inference*, 92: 181–192, 2001. URL https://doi.org/10.1016/S0378-3758(00)00140-3.

Z. Chen, Z. Bai and B. Sinha. *Ranked set sampling: Theory and applications.* Springer-Verlag, 2003. URL https://doi.org/10.1007/978-0-387-21664-5.

H. A. David and D. N. Levine. Ranked set sampling in the presence of judgment error. *Biometrics*, 28: 553–555, 1972.

T. R. Dell and J. L. Clutter. Ranked set sampling theory with order statistics background. *Biometrics*, 28: 545–555, 1972. URL https://doi.org/10.2307/2556166.

U.S. Environmental Protection Agency (EPA). *Guidance for choosing a sampling design for environmental data collection for use in developing a quality assurance project plan (EPA QA/G-5S).* Office of Environmental Information, Washington, DC, 2012.

L. S. Halls and T. R. Dell. Trial of ranked set sampling for forage yields. *Forest Science*, 12: 22–26, 1966. URL https://doi.org/10.1093/forestscience/12.1.22.

A. Haq, J. Brown, E. Moltchanova and A. I. Al-Omari. Partial ranked set sampling design. *Environmetrics*, 24(3): 201–207, 2013. URL https://doi.org/10.1002/env.2203.

T. P. Hettmansperger. The ranked set sample sign test. *Journal of Nonparametric Statistics*, 4: 263–270, 1995. URL https://doi.org/10.1080/10485259508832617.

A. A. Jemain, A. I. Al-Omari and K. Ibrahim. Some variations of ranked set sampling. *Electronic Journal of Applied Statistical Analysis*, 1: 1–15, 2008. URL https://doi.org/10.1285/i20705948v1n1p1.

A. Jemain and A. Al-Omari. Double percentile ranked set samples for estimating the population mean. *Advances and Applications in Statistics*, 6(3): 261–276, 2006.

A. Kaur, G. Patil, A. Sinha and C. Taillie. Ranked set sampling: An annotated bibliography. *Environmental and Ecological Statistics*, 2(1): 25–54, 1995. URL https://doi.org/10.1007/BF00452930.

P. H. Kvam and F. J. Samaniego. On the inadmissibility of empirical averages as estimators in ranked set sampling. *Journal of Statistical Planning and Inference*, 36: 39–55, 1993. URL https://doi.org/10.1016/0378-3758(93)90100-K.

G. A. McIntyre. A method of unbiased selective sampling using ranked sets. *Australian Journal of Agricultural Research*, 3: 385–390, 1952. URL https://doi.org/10.1071/AR9520385.

H. A. Muttlak. Investigating the use of quartile ranked set samples for estimating the population mean. *Applied Mathematics and Computation*, 146(2): 437–443, 2003a. URL https://doi.org/10.1016/S0096-3003(02)00595-7.

H. A. Muttlak. Median ranked set sampling. *Journal of Applied Statistical Science*, 6: 245–255, 1997.

H. A. Muttlak. Modified ranked set sampling methods. *Pakistan Journal of Statistics*, 19(3): 315–323, 2003b.

W. J. Nash, T. L. Sellers, S. R. Talbot, A. J. Cawthorn and W. B. Ford. The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait. *Sea Fisheries Division, Technical Report*, (48), 1994.

O. Ozturk. Ratio estimators based on a ranked set sample in a finite population setting. *Journal of the Korean Statistical Society*, 47(2): 226–238, 2018. URL https://doi.org/10.1016/j.jkss.2018.02.001.

H. M. Samawi. On double extreme rank set sample with application to regression estimator. *Metron International Journal of Statistics*, 60: 50–63, 2002.

H. M. Samawi, W. Abu Dayyeh and M. S. Ahmed. Estimating the population mean using extreme ranked set sampling. *Biometrical Journal*, 38: 577–586, 1996. URL https://doi.org/10.1002/bimj.4710380506.

H. M. Samawi and E. M. Tawalbeh. Double median ranked set sample: Comparing to other double ranked samples for mean and ratio estimators. *Journal of Modern Applied Statistical Methods*, 1(2): 52, 2002. URL https://doi.org/10.22237/jmasm/1036109460.

G. Schneider. *NSM3: Functions and datasets to accompany Hollander, Wolfe, and Chicken–Nonparametric statistical methods.* 2015. URL https://CRAN.R-project.org/package=NSM3.

B. Sevinç, S. Gürler and B. Çetintav. Partial groups ranked set sampling and mean estimation. *Journal of Statistical Computation and Simulation*, 1–12, 2018. URL https://doi.org/10.1080/00949655.2018.1488255.

S. L. Stokes. Estimation of variance using judgment ordered ranked set samples. *Biometrics*, 36: 35–42, 1980a. URL https://doi.org/10.2307/2530493.

S. L. Stokes. Inferences on the correlation coefficient in bivariate normal populations from ranked set samples. *Journal of the American Statistical Association*, 75: 989–995, 1980b. URL https://doi.org/10.1080/01621459.1980.10477584.

S. L. Stokes. Ranked set sampling with concomitant variables. *Communications in Statistics - Theory and Methods*, 12: 1207–1211, 1977. URL https://doi.org/10.1080/03610927708827563.

S. L. Stokes and T. W. Sager. Characterization of a ranked set sample with application to estimating distribution functions. *Journal of the American Statistical Association*, 83: 374–381, 1988. URL https://doi.org/10.1080/01621459.1988.10478607.

K. Takahasi and K. Wakimoto. On unbiased estimates of the population mean based on the sample stratified by means of ordering. *Annals of the Institute of Statistical Mathematics*, 20(1): 1–31, 1968. URL https://doi.org/10.1007/BF02911622.

M. Tiensuwan and S. Sarikavanij. On estimation of population variance based on a ranked set sample. *Journal of Applied Statistical Science*, 12: 283–295, 2003.

D. A. Wolfe. Ranked set sampling: Its relevance and impact on statistical inference. *ISRN Probability and Statistics*, 2012: 1–32, 2012. URL https://doi.org/10.5402/2012/568385.

P. L. H. Yu and K. Lam. Regression estimator in ranked set sampling. *Biometrics*, 1070–1080, 1997. URL https://doi.org/10.2307/2533564.

E. Zamanzade and M. Vock. Variance estimation in ranked set sampling using a concomitant variable. *Statistics & Probability Letters*, 105: 1–5, 2015. URL https://doi.org/10.1016/j.spl.2015.04.034.

Z. Zhang, T. Liu and B. Zhang. Jackknife empirical likelihood inferences for the population mean with ranked set samples. *Statistics & Probability Letters*, 108: 16–22, 2016. URL https://doi.org/10.1016/j.spl.2015.09.016.

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Sevinc, et al., "RSSampling: A Pioneering Package for Ranked Set Sampling", The R Journal, 2019

BibTeX citation

@article{RJ-2019-039, author = {Sevinc, Busra and Cetintav, Bekir and Esemen, Melek and Gurler, Selma}, title = {RSSampling: A Pioneering Package for Ranked Set Sampling}, journal = {The R Journal}, year = {2019}, note = {https://rjournal.github.io/}, volume = {11}, issue = {1}, issn = {2073-4859}, pages = {401-415} }