Dimension reduction is a technique that can compress given data and reduce noise. Recently, a dimension reduction technique on spheres, called spherical principal curves (SPC), has been proposed. SPC fits a curve that passes through the middle of data with a stationary property on spheres. In addition, a study of local principal geodesics (LPG) is considered to identify the complex structure of data. Through the description and implementation of various examples, this paper introduces an R package *spherepc* for dimension reduction of data lying on a sphere, including existing methods, SPC and LPG.

This paper aims to introduce an R package
*spherepc* that considers
several dimension reduction techniques on a sphere, which encompass
recently developed approaches such as SPC and LPG as well as some
existing methods, and discuss how to implement these methods through
*spherepc*.

Dimension reduction methods are widely used in various fields, including statistics and machine learning, by efficiently compressing data and removing noise (Benner et al. 2005). As one of the dimension reduction methods, the principal curves of Hastie and Stuetzle (1989) are suitable for fitting a curve or a surface of data in Euclidean space, which go through the middle of the data. Hauberg (2016) proposed an algorithm to find the principal curves in Riemannian manifolds based on the concept of the original principal curves. However, the principal curves proposed by Hauberg (2016) no longer represent the data continuously because of the approximation of the projection step required to fit the curves.

Recently, Lee et al. (2021a) proposed a new method, termed spherical principal
curves (SPC), that constructs principal curves, ensuring a stationary
property on spheres. SPC is useful for representing circular or waveform
data with smaller reconstruction errors than conventional methods,
including principal geodesic analysis (Fletcher et al. 2004), exact principal
circle (Lee et al. 2021a), and principal curves proposed by Hauberg (2016).
However, SPC has the disadvantage of being sensitive to initialization.
As a result, there are some data structures that SPC does not apply to,
for example, data with spirals, zigzags, or branches like tree-shape. A
localized version of SPC called local principal geodesics (LPG) is being
developed to resolve such a problem. A function for LPG is also provided
in the package
*spherepc*. Research on
the LPG is underway in progress.

To the best of our knowledge, no available R packages offer the methods
of dimension reduction and principal curves on a sphere. The existing R
packages providing principal curves, such as
*princurve* (Hastie and Weingessel 2015)
and *LPCM* (Einbeck et al. 2015), are
available only on Euclidean space, not on a sphere or (Riemannian)
manifold. In addition, most dimension reduction methods on manifolds
(Huckemann et al. 2010; Panaretos et al. 2014; Liu et al. 2017) involve
somewhat complex optimizations. The proposed package
*spherepc* for R provides
the state-of-the-art principal curve technique on the sphere (Lee et al. 2021a)
and comprehensively collects and implements the existing methods
(Fletcher et al. 2004; Hauberg 2016).

The rest of this paper is organized as follows. The following section
introduces the existing methods for dimension reduction on the sphere
and relevant functions covered in the package
*spherepc*, which is
available on CRAN. Furthermore, their usages are discussed with examples
in detail. Then, the spherical principal curves proposed by (Lee et al. 2021a)
and principal curves of (Hauberg 2016) are briefly described. In
addition, implementations of the `SPC()`

and `SPC.Hauberg()`

functions
in the *spherepc* are
presented. The subsequent section discusses the local principal
geodesics (LPG) with the implementation of various simulated data,
demonstrating its promising usability. In the application session, all
the mentioned methods are performed to analyze real seismological data.
Finally, conclusions are given in the last section.

Principal geodesic analysis (PGA) proposed by Fletcher et al. (2004) can be regarded as a generalization of principal component analysis (PCA) to Riemannian manifolds. In particular, Fletcher et al. (2004) performed dimension reduction of data on the Cartesian product space of the manifolds. In detail, the data are projected onto the tangent spaces at the intrinsic means of each component of the manifolds; thus, the given data are approximated as points on Euclidean vector space, and subsequently, PCA is applied to the points. As a result, the dimension reduction can be performed through the inverse of the tangent projections.

The principal geodesic analysis can be implemented by the `PGA()`

function available in the
*spherepc*. The detailed
usage of the `PGA()`

function is described as follows.

`PGA(data, col1 = "blue", col2 = "red") `

Before using the `PGA()`

function, it requires loading the packages
*rgl* (Adler and Murdoch 2020),
*sphereplot*
(Robotham 2013), and
*geosphere*
(Hijmans et al. 2017). The following codes yield an implementation of the
`PGA()`

function.

```
#### for all simulated datasets, longitude and latitude are expressed in degrees
#### example 1: half-great circle data
> circle <- GenerateCircle(c(150, 60), radius = pi/2, T = 1000)
> sigma <- 2 # noise level
> half.circle <- circle[circle[, 1] < 0, , drop = FALSE]
> half.circle <- half.circle + sigma * rnorm(nrow(half.circle))
> PGA(half.circle)
#### example 2: S-shaped data
# the dataset consists of two parts: lon ~ Uniform[0, 20],
# lat = sqrt(20 * lon - lon^2) + N(0, sigma^2),
# lon ~ Uniform[-20, 0], lat = -sqrt(-20 * lon - lon^2) + N(0, sigma^2)
> n <- 500
> sigma <- 1 # noise level
> lon <- 60 * runif(n)
> lat <- (60 * lon - lon^2)^(1/2) + sigma * rnorm(n)
> simul.S1 <- cbind(lon, lat)
> lon2 <- -60 * runif(n)
> lat2 <- -(-60 * lon2 - lon2^2)^(1/2) + sigma * rnorm(n)
> simul.S2 <- cbind(lon2, lat2)
> simul.S <- rbind(simul.S1, simul.S2)
> PGA(simul.S)
```

Because a principal geodesic is always a great circle, the `PGA()`

function is suitable for identifying the global data trend. The
implementations of half-circle and S-shaped data are displayed in
Figure 1, where the principal geodesic properly extracts the
global trends in the half-great circle and S-shaped data, while it
cannot identify the circular variations in the S-shaped case. In
addition, the arguments and outputs of the `PGA()`

function are
described in Tables 1 and 2.