ClusTorus: An R Package for Prediction and Clustering on the Torus by Conformal Prediction

Protein structure data consist of several dihedral angles, lying on a multidimensional torus. Analyzing such data has been and continues to be key in understanding functional properties of proteins. However, most of the existing statistical methods assume that data are on Euclidean spaces, and thus they are improper to deal with angular data. In this paper, we introduce the package ClusTorus specialized to analyzing multivariate angular data. The package collects some tools and routines to perform algorithmic clustering and model-based clustering for data on the torus. In particular, the package enables the construction of conformal prediction sets and predictive clustering, based on kernel density estimates and mixture model estimates. A novel hyperparameter selection strategy for predictive clustering is also implemented, with improved stability and computational efficiency. We demonstrate the use of the package in clustering protein dihedral angles from two real data sets.

Seungki Hong (Department of Statistics, Seoul National University) , Sungkyu Jung (Department of Statistics, Seoul National University)
2022-10-11

1 Introduction

Multivariate angular or circular data have found applications in some research domains including geology (e.g., paleomagnetic directions) and bioinformatics (e.g., protein dihedral angles). Due to the cyclic nature of angles, usual vector-based statistical methods are not directly applicable to such data. A \(p\)-variate angle \(\theta = (\theta_1,\cdots,\theta_p)^T\) lies on the \(p\)-dimensional torus \(\mathbb{T}^p = [0,2\pi)^p\) in which the angles \(0\) and \(2\pi\) are identified as the same point. Likewise, angles \(\theta\) and \(\theta\pm 2\pi\) are the same data point on the torus. Thus, statistical models and predictions on the torus should reflect this geometric constraint.

A prominent example in which multivariate angular data appear is the analysis of protein structures. As described in Branden and Tooze (1999), the functional properties of proteins are determined by the ordered sequences of amino acids and their spatial structures. These structures are determined by several dihedral angles, and thus, protein structures are commonly described on multidimensional tori. The \(p\)-dimensional torus \(\mathbb{T}^p\) is the sample space we consider in this paper. Especially, for the 2-dimensional case, the backbone chain angles \(\phi,\psi\) of a protein are commonly visualized by the Ramachandran plot, a scatter plot of dihedral angles in a 2-dimensional flattened torus \(\mathbb{T}^2\) (Lovell et al. 2003; Oberholser 2010). In Figure 1, several clustering results are visualized on the Ramachandran plot for the protein angles of SARS-CoV-2 virus, which caused the 2020-2021 pandemic (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. et al. 2020). Since the structures in protein angles are related to functions of the protein, it is of interest to analyze the scatter of the angles through, for example, density estimation and clustering. Note that the protein structure data are routinely collected and publicly available at Protein Data Bank (Berman et al. 2003) and importing such data into R is made easy by the package bio3d (Grant et al. 2006; Grant et al. 2021).

graphic without alt text
Figure 1: Several clustering results on Ramachandran plot for SARS-CoV-2 by using clus.torus (top left) and kmeans.torus (top right), both implemented in ClusTorus, mixtools::mvnormalmixEM (bottom left), in which the number of components 3 is prespecified, and mclust::Mclust (bottom right), in which the number of components is chosen by BIC. Gray points in the top-left panel are “outliers", automatically assigned by clus.torus.

We introduce the R package ClusTorus (Jung and Hong 2021) which provides various tools for handling and clustering multivariate angular data on the torus. The package provides angular adaptations of usual clustering methods such as the \(k\)-means clustering and pairwise angular distances, which can be used as an input for distance-based clustering algorithms, and implements a novel clustering method based on conformal prediction framework (Vovk et al. 2005). Also implemented in the package are the EM algorithms and an elliptical \(k\)-means algorithm for fitting mixture models on the torus, and a kernel density estimation. We will introduce various clustering tools implemented in the package, explaining choices in conformal prediction using two sets of example data. We also present the theoretical and technical background, and demonstrate these tools with R codes.

For data on the torus, there are a few previous works for mixture modeling and clustering. Mardia et al. (2007) proposed a mixture of bivariate von Mises distributions for data on \(\mathbb{T}^2\), with an application to modeling protein backbone chain angles. Mardia et al. (2012) proposed a density estimation on the torus, based on a mixture of approximated von Mises sine distributions, for higher dimensional cases, but the proposed EM algorithm tends to be unstable when sample sizes are limited. The R package BAMBI (Chakraborty and Wong 2019, 2020) provides routines to fit such von Mises mixture models using MCMC, but is only applicable to bivariate (and univariate) angles in \(\mathbb{T}^2\). We have implemented EM algorithms (for \(p=2\)) and the elliptical \(k\)-means algorithm (for any \(p\)), originally proposed for vector-valued data (Sung and Poggio 1998; Bishop 2006; Shin et al. 2019), for fitting mixture models on the torus. To the best of authors’ knowledge, ClusTorus is the first implementation of methods for fitting mixture models on multidimensional tori of any dimension.

Algorithmic clustering for data on the torus has also been proposed. For example, Gao et al. (2018) used an extrinsic \(k\)-means algorithm for clustering protein angles. While this algorithm is not always satisfactory, it is implemented in ClusTorus for a quick-and-dirty analysis. The top right panel of Figure 1 depicts the result of applying this algorithm with \(k = 3\). Note that the popular R packages mixtools (Benaglia et al. 2009) and mclust (Scrucca et al. 2016) provide misleading clustering results, when applied to data on the torus. As we illustrate in Figure 1, these tools do not take into account the cyclic nature of the angular data.

The main contribution of ClusTorus is an implementation of the predictive clustering approaches of Jung et al. (2021) and Shin et al. (2019). For this, the conformal prediction framework of Vovk et al. (2005) is extended for multivariate angular data. The conformal prediction is a distribution-free method of constructing prediction sets, and our implementation uses kernel density estimates and mixture models, both based on the multivariate von Mises distribution (Mardia et al. 2012). Furthermore, by using Gaussian-like approximations of the von Mises distributions and a graph-theoretic approach, flexible clusters, composed of unions of ellipsoids on \(\mathbb{T}^p\), can be identified. The proposed predictive clustering can be obtained by simply using clus.torus as follows.

library(ClusTorus)
set.seed(2021)
ex <- clus.torus(SARS_CoV_2)
plot(ex)

The result of the predictive clustering is visualized in the top left panel of Figure 1, which is generated by plot(ex). The dataset SARS_CoV_2, included in ClusTorus, collects the dihedral angles \(\phi,\psi\) in the backbone chain B of SARS-CoV-2 spike glycoprotein. The raw coronavirus protein data are available at Protein Data Back with id 6VXX (Walls et al. 2020), and can be retrieved by using R package bio3d. The function clus.torus performs three core procedures—conformal prediction, hyperparameter selection and cluster assignment—for predictive clustering.

The rest of this article focuses on introducing the three core procedures: (i) the conformal prediction framework, including our choices of the conformity scores, (ii) hyperparameter selection and (iii) cluster assignment. After demonstrating how the package ClusTorus can be used for clustering of \(\mathbb{T}^2\)- and \(\mathbb{T}^4\)-valued data, we describe how the main function clus.torus and other clustering algorithms such as \(k\)-means and hierarchical clustering can be used to analyze data on the torus. In the Appendix, we provide technical details and options in fitting mixture models on the torus, and a list of S3 classes defined in ClusTorus.

2 Conformal prediction

The conformal prediction framework (Vovk et al. 2005) is one of the main ingredients of our development. Based on the work of Vovk et al. (2005) and Lei et al. (2013, 2015), we briefly introduce the basic concepts and properties of conformal prediction. Suppose that we observe a sample of size \(n\), \(X_i\sim F\) where \(X_i\in \mathbb{T}^p\) for each \(i\) and that the sequence \(\mathbb{X}_n =\left\{X_1,\cdots,X_n\right\}\) is exchangeable. Then, for a new \(X_{n+1}\sim F\), the prediction set \(C_n = C_n\left(\mathbb{X}_n\right)\) is said to be valid at level \(1-\alpha\) if: \[\begin{aligned} \label{eq:1} P\left(X_{n+1}\in C_n\right)\ge 1-\alpha,\quad \alpha\in \left(0,1\right), \end{aligned} \tag{1}\] where \(P\) is the corresponding probability measure for \(\mathbb{X}_{n+1}=\mathbb{X}_n\cup\left\{X_{n+1}\right\}\).

For a given \(x\in \mathbb{T}^p\), write \(\mathbb{X}_{n}(x)=\mathbb{X}_n\cup\left\{x\right\}\). Consider the null hypothesis \(H_0: X_{n+1}=x\), where \(X_{n+1}\sim F\). To test the hypothesis, the conformal prediction framework uses conformity scores \(\sigma_i\) defined as follows: \[\begin{aligned} \sigma_i\left(x\right) &:= g\left(X_i, \mathbb{X}_{n}\left(x\right)\right),~ \forall i=1,\cdots,n+1,\\ \sigma\left(x\right) &:=g\left(x, \mathbb{X}_{n}\left(x\right)\right) = \sigma_{n+1}\left(x\right), \end{aligned}\] for some real valued function \(g\), which measures the conformity or similarity of a point to the given set. If \(X_{\left(1\right)},\cdots,X_{\left(n+1\right)}\) are ordered to satisfy \(\sigma_{\left(1\right)}\le\cdots\le\sigma_{\left(n+1\right)}\) for \(\sigma_{\left(i\right)} = g\left(X_{\left(i\right)}, \mathbb{X}_{n+1}\right)\), then we may say that \(X_{\left(n+1\right)}\) is the most similar point to \(\mathbb{X}_{n+1}\).

Consider the following quantity: \[\begin{aligned} \pi\left(x\right) = \frac{1}{n+1}\sum_{i=1}^{n+1}I\left(\sigma_i\left(x\right)\le\sigma_{n+1}\left(x\right)\right),\quad I(A) = \begin{cases} 1,\quad A\text{ is true,}\\ 0,\quad \text{otherwise,} \end{cases} \end{aligned}\] which can be understood as a p-value for the null hypothesis \(H_0\). The conformal prediction set of level \(1-\alpha\) is constructed as \[\begin{aligned} C^{\alpha}_n=\left\{x:\pi\left(x\right)>\alpha\right\}. \end{aligned}\] Because the sequence \(\mathbb{X}_n(x)\) is exchangeable under \(H_0\), \(\pi\left(x\right)\) is uniformly distributed on \(\left\{\frac{1}{n+1},\cdots,1\right\}\). With this property, it can be shown that the conformal prediction set is valid for finite samples, i.e., (1) holds with \(C_n\) replaced by \(C_n^{\alpha}\) for any \(F\), that is, the prediction set is distribution-free (Lei et al. 2013). The performance of the conformal prediction highly depends on the choice of conformity score \(\sigma\). In some previous works on conformal prediction (Lei et al. 2013, 2015; Shin et al. 2019; Jung et al. 2021), the quality of prediction sets using density based conformity scores has been satisfactory.

We demonstrate a construction of the conformal prediction set with a kernel density estimate-based conformity score, defined later in (2), for the data shown in Figure 1. With the conformity score given by (2), cp.torus.kde computes the conformal prediction set \(C_n^\alpha\) at a given level \(1-\alpha\) (\(\alpha=0.1\) below), by performing the kernel density estimation. The function tests the inclusion in \(C_n^\alpha\) of each point \((\phi,\psi)\) over a fine grid of \(\mathbb{T}^2\), and the result of the testing is shown as the boolean indices in the column Cn of the output below. The columns Lminus and Lplus provide approximated prediction sets, defined in Jung et al. (2021).

cp.kde <- cp.torus.kde(SARS_CoV_2)
cp.kde

Conformal prediction sets (Lminus, Cn, Lplus) based on kde with concentration 25

Testing inclusion to the conformal prediction set with level = 0.1 :
-------------
          phi psi Lminus    Cn Lplus level
1  0.00000000   0  FALSE FALSE FALSE   0.1
2  0.06346652   0  FALSE FALSE FALSE   0.1
3  0.12693304   0  FALSE FALSE FALSE   0.1
4  0.19039955   0  FALSE FALSE FALSE   0.1
5  0.25386607   0  FALSE FALSE FALSE   0.1
6  0.31733259   0  FALSE FALSE FALSE   0.1
7  0.38079911   0  FALSE FALSE FALSE   0.1
8  0.44426563   0  FALSE FALSE FALSE   0.1
9  0.50773215   0  FALSE FALSE FALSE   0.1
10 0.57119866   0  FALSE FALSE FALSE   0.1

 9990 rows are omitted.

The concentration parameter \(\kappa\) of the kernel density estimation and the level(s) of the prediction set can be designated by providing arguments concentration and level. By default, these values are set as concentration = 25 and level = 0.1.

The output cp.kde is an S3 object with class cp.torus.kde, for which a generic method plot is available. The conformal prediction set for SARS_CoV_2 data can be displayed on the Ramachandran plot, as follows. The result is shown in Figure 2.

plot(cp.kde)
graphic without alt text
Figure 2: The Ramachandran plot for SARS-CoV-2, with boundaries of the conformal prediction set, whose conformity score is (2) with \(\kappa = 25\) for level \(1-\alpha = 0.9\).

Inductive conformal prediction

If the sample size \(n\) and the number \(N\) of grid points over \(\mathbb{T}^p\) are large, evaluating \(n + N\) conformity scores may take a long time. That is, constructing the conformal prediction set suffers from high computational costs. A workaround for this inefficiency is inductive conformal prediction, which enjoys significantly lower computational cost. The inductive conformal prediction framework is based on splitting the data into two sets. The algorithm for inductive conformal prediction is given in Algorithm 1.

  1. procedure inductive conformal prediction(\(\left\{X_1,\cdots,X_n\right\}, \alpha, n_1 < n\))

  2. Split the data randomly into \(\mathbb{X}_1=\left\{X_1,\cdots, X_{n_1}\right\}\), \(\mathbb{X}_2=\left\{X_{n_1+1},\cdots, X_{n}\right\}\).

  3. Construct \(\sigma\) with \(\sigma\left(x\right) = g\left(x,\mathbb{X}_1\right)\) for some function \(g\).

  4. Put \(\sigma_i = g\left(X_{n_1+i},\mathbb{X}_1\right)\) and order as \(\sigma_{\left(1\right)}\le\cdots\le\sigma_{\left(n_2\right)}\), where \(n_2=n-n_1\).

  5. Construct \(\hat{C}^{\alpha}_n = \left\{x:\sigma(x)\ge\sigma_{\left(i_{n_2,\alpha}\right)}\right\}\) where \(i_{n,\alpha} = \lfloor\left(n+1\right)\alpha\rfloor\).

  6. end procedure

Algorithm 1: Inductive Conformal Prediction

While the sizes \(n_1\) and \(n_2\) of two split data sets can be of any size, they are typically set as equal sizes. It is well-known that the output \(\hat{C}^\alpha_n\) of the algorithm also satisfies the distribution-free finite-sample validity (Vovk et al. 2005; Lei et al. 2015). For fast computation, the inductive conformal prediction is primarily used in constructing prediction sets and clustering, in our implementation of ClusTorus. Specifically, icp.torus implements Algorithm 1 for several prespecified conformity scores. As already mentioned, we need to choose the conformity score \(\sigma\) carefully for better clustering performances.

Before we discuss our choices of the conformity scores, we first illustrate how the functions in ClusTorus are used to produce inductive conformal prediction sets. The following codes show a calculation of the inductive conformal prediction set for the data SARS_CoV_2. The conformal prediction set with the conformity score given by kernel density estimates (2) can be constructed by icp.torus and icp.torus.eval. The function icp.torus computes \(\sigma_i\)’s in line 4 of Algorithm 1 and icp.torus.eval tests whether pre-specified evaluation points are included in \(\hat{C}_n^\alpha\). If these evaluation points are not supplied, then icp.torus.eval creates a grid of size \(100 \times 100\) (for \(p = 2\)).

set.seed(2021)
icp.torus.kde <- icp.torus(SARS_CoV_2, model = "kde", concentration = 25)
icp.kde <- icp.torus.eval(icp.torus.kde, level = 0.1)
icp.kde

Conformal prediction set (Chat_kde)

Testing inclusion to the conformal prediction set with level = 0.1:
-------------
           X1 X2 inclusion
1  0.00000000  0     FALSE
2  0.06346652  0     FALSE
3  0.12693304  0     FALSE
4  0.19039955  0     FALSE
5  0.25386607  0     FALSE
6  0.31733259  0     FALSE
7  0.38079911  0     FALSE
8  0.44426563  0     FALSE
9  0.50773215  0     FALSE
10 0.57119866  0     FALSE

 9990 rows are omitted.

In the codes above, the data splitting for icp.torus is done internally, and can be inspected by icp.torus.kde$split.id.

We now introduce our choices for the conformity score \(\sigma\) in the next two subsections.

Conformity score from kernel density estimates

For the 2-dimensional case, Jung et al. (2021) proposed to use the kernel density estimate based on the von Mises kernel (Marzio et al. 2011) for the conformity score. A natural extension to the \(p\)-dimensional tori, for \(p\ge 2\), is \[\begin{aligned} \label{eq:6} g\left(u, \mathbb{X}_{n}\left(x\right)\right) = \frac{1}{n+1}\sum_{i=1}^{n+1}K_\kappa\left(u-X_i\right),\quad K_\kappa\left(v\right) = \prod_{i=1}^p\frac{e^{\kappa\cos\left(v_i\right)}}{2\pi I_0\left(\kappa\right)},\quad v = \left(v_1,\cdots,v_p\right)^T\in\mathbb{T}^p \end{aligned} \tag{2}\] where \(I_0\) is the modified Bessel function of the first kind of order 0, and \(\kappa\) is a prespecified concentration parameter. The function kde.torus provides the multivariate von Mises kernel density estimation. For conformal prediction, we take \(\sigma\left(x_i\right) = g\left(x_i,\mathbb{X}_n\left(x\right)\right)\), and for inductive conformal prediction, we take \(\sigma\left(x\right) = g\left(x,\mathbb{X}_1\right)\).

Conformity scores from mixtures of multivariate von Mises

Our next choices of conformity scores are based on mixture models. Since the multivariate normal distributions are not defined on \(\mathbb{T}^p\), we instead use the multivariate von Mises distribution (Mardia et al. 2008), whose density on \(\mathbb{T}^p\) is \[\begin{aligned} \label{eq:7} f\left(y;\mu,\kappa, \Lambda\right) = C\left(\kappa,\Lambda\right)\exp\left\{-\frac{1}{2}\left[\kappa^T\left(2 - 2c\left(y,\mu\right)\right)+s\left(y,\mu\right)^T\Lambda s\left(y,\mu\right)\right]\right\} \end{aligned} \tag{3}\] where \(y = \left(y_1,\cdots,y_p\right)^T \in\mathbb{T}^p\), \(\mu = \left(\mu_1,\cdots,\mu_p\right)^T \in\mathbb{T}^p\), \({\kappa} = (\kappa_1,\ldots,\kappa_p)^T \in (0,\infty)^p\), \(\Lambda = (\lambda_{j,l})\) for \(1\le j,l \le p\), \(-\infty < \lambda_{jl} < \infty\), \[\begin{aligned} c\left(y,\mu\right) &= \left(\cos\left(y_1-\mu_1\right),\cdots, \cos\left(y_p-\mu_p\right)\right)^T,\\ s\left(y,\mu\right) &= \left(\sin\left(y_1-\mu_1\right),\cdots, \sin\left(y_p-\mu_p\right)\right)^T,\\ \left(\Lambda\right)_{jl}&=\lambda_{jl}=\lambda_{lj},~j\ne l,\quad \left(\Lambda\right)_{jj}=\lambda_{jj}=0, \end{aligned}\] and for some normalizing constant \(C\left(\kappa,\Lambda\right)>0\). We write \(f\left(y;\theta\right) = f\left(y;\mu,\kappa,\Lambda\right)\) for \(\theta = (\mu,\kappa,\Lambda)\).

For any positive integer \(J\) and a mixing probability \(\boldsymbol{\pi} = \left(\pi_1,\cdots,\pi_J\right)\), consider a \(J\)-mixture model: \[\begin{aligned} \label{eq:8} p\left(u;\boldsymbol{\pi},\boldsymbol{\theta}\right) = \sum_{j=1}^J\pi_jf\left(u;\theta_j\right) \end{aligned} \tag{4}\] where \(\boldsymbol{\theta} = \left(\theta_1,\cdots,\theta_J\right)\), \(\theta_j = (\mu_{j}, \kappa_{j}, \Lambda_j)\) for \(j=1,\cdots,J\). Let \(\left(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\theta}}\right)\) be appropriate estimators of \(\left(\boldsymbol{\pi}, \boldsymbol{\theta}\right)\) based on \(\mathbb{X}_1\). The plug-in density estimate based on (4) is then \[\begin{aligned} \label{eq:9} p\left(\cdot;\hat{\boldsymbol{\pi}},\hat{\boldsymbol{\theta}}\right) = \sum_{j=1}^J\hat{\pi}_jf\left(\cdot;\hat{\theta}_j\right), \end{aligned} \tag{5}\] which can be used as a conformity score by setting \(g\left(\cdot,\mathbb{X}_1\right)=\hat{p}\left(\cdot\right)\). Assuming high concentrations, an alternative conformity score can be set as \(g\left(\cdot,\mathbb{X}_1\right)=p^{max}\left(\cdot,\hat{\boldsymbol{\pi}},\hat{\boldsymbol{\theta}}\right)\) where \[\begin{aligned} \label{eq:10} p^{max}\left(u;\hat{\boldsymbol{\pi}},\hat{\boldsymbol{\theta}}\right):=\max_{j=1,\cdots,J}\left(\hat{\pi}_jf\left(u;\hat{\theta}_j\right)\right)\approx p\left(u;\hat{\boldsymbol{\pi}},\hat{\boldsymbol{\theta}}\right). \end{aligned} \tag{6}\]

On the other hand, Mardia et al. (2012) introduced an approximated density function \(f^*\) for the \(p\)-variate von Mises sine distribution (3) for sufficiently high concentrations and when \(\Sigma\succ0\): \[\begin{aligned} \notag f^*\left(y;,\mu,\Sigma\right) = \left(2\pi\right)^{-p/2}\left|\Sigma\right|^{-1/2}\exp\left\{-\frac{1}{2}\left[\kappa^T\left(2 - 2c\left(y,\mu\right)\right)+s\left(y,\mu\right)^T\Lambda s\left(y,\mu\right)\right]\right\} \end{aligned}\] where \(\left(\Sigma^{-1}\right)_{jl} = \lambda_{jl}, \left(\Sigma^{-1}\right)_{jj} = \kappa_j, ~j\ne l\). By further approximating via \(\theta \approx \sin\theta\), \(1-\frac{\theta^2}{2}\approx\cos\theta\), we write \[\begin{aligned} f^*\left(y;,\mu,\Sigma\right) \approx \left(2\pi\right)^{-p/2}\left|\Sigma\right|^{-1/2}\exp\left\{-\frac{1}{2}\left[\left(y\ominus\mu\right)^T\Sigma^{-1}\left(y\ominus\mu\right)\right]\right\},\label{eq:11} \end{aligned} \tag{7}\] where the angular subtraction \(\ominus\) stands for \[\begin{aligned} \notag X\ominus Y := \left(\arg\left(e^{i\left(\phi_{x1} - \phi_{y1}\right)}\right), \cdots, \arg\left(e^{i\left(\phi_{xp} - \phi_{yp}\right)}\right)\right)^T, \end{aligned}\] for \(X = \left(\phi_{x1},\cdots,\phi_{xp}\right)^T\in\mathbb{T}^p\) and \(Y = \left(\phi_{y1},\cdots,\phi_{yp}\right)^T\in\mathbb{T}^p\) as defined in Jung et al. (2021) for \(p = 2\). By replacing the von Mises density \(f\) in (6) with the approximate normal density (7), \(\log\left(p^{max}\left(\cdot;\boldsymbol{\pi},\boldsymbol{\theta}\right)\right)\) is approximated by \[\begin{aligned} \log\left(p^{max}\left(u;\boldsymbol{\pi},\boldsymbol{\theta}\right)\right) &\approx \frac{1}{2}\max_j e\left(u;\pi_j,\theta_j\right) + c, \nonumber\\ e\left(u;\pi_j,\theta_j\right)&=-\left(u\ominus\mu_j\right)^T\Sigma_j^{-1}\left(u\ominus\mu_j\right) +2\log\pi_j-\log\left|\Sigma_j\right| \label{eq:ehatj} \end{aligned} \tag{8}\] where \(\theta_j=(\mu_{j},\Sigma_j)\), \(\mu_j = (\mu_{1j},\cdots,\mu_{pj})^T\in\mathbb{T}^p\), \(\Sigma_j\in\mathbb{R}^{p\times p}\) and a constant \(c\in\mathbb{R}\). Our last choice of the conformity score is \[\begin{aligned} \label{eq:12} g\left(\cdot,\mathbb{X}_{1}\right)=\max_je\left(\cdot,\hat{\pi}_j,\hat{\theta}_j\right). \end{aligned} \tag{9}\] Note that with this choice of conformity score, the conformal prediction set can be expressed as the union of ellipsoids on the torus. That is, the following equalities are satisfied (Shin et al. 2019; Jung et al. 2021): Let \(C_n^e\) be the level \(1-\alpha\) prediction set using (9). Then \[\begin{aligned} \notag C^e_n&:=\left\{x\in\mathbb{T}^p:g\left(x,\mathbb{X}_{1}\right)\ge g\left(X_{\left(i_{n_2,\alpha}\right)},\mathbb{X}_{1}\right)\right\}\\\label{eq:14} &=\bigcup_{j=1}^J\hat{E}_j\left(\sigma_{\left(i_{n_2,\alpha}\right)}\right) \end{aligned} \tag{10}\] where \(\hat{E}_j\left(t\right)=\left\{x\in\mathbb{T}^p:\left(x\ominus\hat{\mu}_j\right)^T\hat{\Sigma}_j^{-1}\left(x\ominus\hat{\mu}_j\right)\le 2\log\hat{\pi}_j-\log\left|\hat{\Sigma}_j\right| - t\right\}\) for \(t\in\mathbb{R}\). Note that \(\hat{E}_j\left(t\right)\) is automatically vanished if \(t\ge 2\log\hat{\pi}_j-\log\left|\hat{\Sigma}_j\right|\).

Implementation

We have implemented four conformity scores, described in the previous section. These are based on

  1. kernel density estimate (2),

  2. mixture model (5),

  3. max-mixture model (6), and

  4. ellipsoids obtained by approximating the max-mixture (9).

The function icp.torus in ClusTorus computes these conformity scores using the inductive conformal prediction framework, and returns icp.torus object(s). Table 1 illustrates several important arguments of the function icp.torus.

Table 1: Key arguments and descriptions for the function icp.torus
Arguments Descriptions
data \(n \times d\) matrix of toroidal data on \([0, 2\pi)^d\) or \([-\pi, \pi)^d\)
model A string. One of "kde", "mixture", and "kmeans" which determines the model or estimation methods. If "kde", the model is based on the kernel density estimates. It supports the kde-based conformity score only. If "mixture", the model is based on the von Mises mixture, fitted with an EM algorithm. It supports the von Mises mixture and its variants based conformity scores. If "kmeans", the model is also based on the von Mises mixture, but the parameter estimation is implemented with the elliptical k-means algorithm illustrated in Appendix. It supports the log-max-mixture based conformity score only. If the dimension of data space is greater than 2, only "kmeans" is supported. Default is model = "kmeans".
J A scalar or numeric vector for the number(s) of components for model = c("mixture", "kmeans"). Default is J = 4.
concentration A scalar or numeric vector for the concentration parameter(s) for model = "kde". Default is concentration = 25.

The argument model of the function icp.torus indicates which conformity score is used. By setting model = "kde", the kde-based conformity score (2) is used. By setting model = "mixture" the mixture model (5) is estimated by an EM algorithm, and conformity scores based on (5), (6), (9) are all provided. Setting model = "kmeans" provides a mixture model fit by the elliptical \(k\)-means algorithm and conformity score based on (9).

The arguments J and concentration specify the model fitting hyperparameters. To compute conformity scores based on kernel density estimate (2), one needs to specify the concentration parameter \(\kappa\). Likewise, the number of mixtures, \(J\), needs to be specified in order to fit the mixture model (5) and the variants (6) and (9). The function icp.torus takes either a single value (e.g., J = 4 is the default), or a vector (e.g., J = 4:30 or concentration = c(25,50)) for arguments J and concentration. If J (or concentration) is a scalar, then icp.torus returns an icp.torus object.

On the other hand, if J (or concentration) is a numeric vector containing at least two values, then icp.torus returns a list of icp.torus objects, one for each value in J (or concentration, respectively). Typically, the hyperparameter \(J\) (or \(\kappa\)) is not predetermined, and one needs to choose among a set of candidates. A list of icp.torus objects, evaluated for each candidate in vector-valued J (or concentration) is required for our hyperparameter selection procedure, discussed in a later section.

Let us present an R code example for creating an icp.torus object, fitted with model = "kmeans" (the default value for argument model) and J = 12.

set.seed(2021)
icp.torus.12 <- icp.torus(SARS_CoV_2, J = 12)
plot(icp.torus.12, level = 0.1111)