Functional Singular Spectrum Analysis (FSSA) is a non-parametric approach for analyzing Functional Time Series (FTS) and Multivariate FTS (MFTS) data. This paper introduces Rfssa, an R package that implements FSSA for both FTS and MFTS data. Rfssa provides a flexible container, the funts class, for FTS/MFTS data observed on one-dimensional or multi-dimensional domains. It accepts arbitrary basis systems and offers graphical tools for visualizing time-varying features and pattern changes. The package incorporates two forecasting algorithms for FTS data. Rfssa is built with object-oriented programming, and its computationally intensive parts use Rcpp/RcppArmadillo for efficiency. The paper covers the theoretical background, technical details, and usage examples, and highlights potential applications of Rfssa.
Recent advances in data acquisition have made it possible to collect high-resolution data. Because such data exhibit temporal and spatial dependence, they can be treated as functional data.
Functional Data Analysis (FDA) focuses on developing statistical methodologies for analyzing data represented as functions or curves. While FDA methods are particularly well-suited for handling smooth continuum data, they can also be adapted and extended to effectively analyze functional data that may not exhibit perfect smoothness, including high-resolution data and data with inherent variability.
A widely used R package for FDA is fda (Ramsay et al. 2023), which is designed to support the analysis of functional data as described in the textbook by Ramsay and Silverman (2005). Additionally, there are over 40 other R packages available on CRAN that incorporate functional data analysis, such as funFEM (Bouveyron 2021), fda.usc (Febrero-Bande and Fuente 2012), refund (Goldsmith et al. 2023), fdapace (Gajardo et al. 2022), funData (Happ-Kurz 2020), ftsspec (Tavakoli 2015), rainbow (Shang and Hyndman 2022), and ftsa (Hyndman and Shang 2023).
One crucial initial requirement for any of these packages is to
establish a framework for representing and storing infinite-dimensional
functional observations. The
fda package, for instance,
employs the fd
class as a container for functional data defined on a
one-dimensional (1D) domain. An fd
object represents functional data
as a finite linear combination of known basis functions (e.g., Fourier,
B-splines, etc.), storing both the basis functions and their respective
coefficients for each curve. This representation aligns with the
practical implementation found in many papers within the field of FDA.
Conversely, several other R packages store functional
data in a discrete form evaluated on grid points (e.g.,
fda.usc,
refund,
funData,
rainbow, and
fdapace). These
packages also provide the capability to analyze functions beyond the
one-dimensional case, such as image data treated as two-dimensional (2D)
functions (e.g.,
refund,
fdasrvf, and
funData). To the best
of our knowledge, packages that support representation beyond 1D
functions utilize the grid point representation for execution and
storage.
Moreover, recent packages have been developed to handle multivariate functional data, which consist of more than one function per observation unit. Examples of such packages include roahd, fda.usc, and funData.
While some recent FDA packages have focused on analyzing and
implementing techniques for Functional Time Series (FTS), where
sequences of functions are observed over time, none of them handle
Multivariate FTS (MFTS) or multidimensional MFTS. For example, see the
packages ftsspec,
rainbow, and
ftsa. In summary, there
is still a need for a unified and flexible container for FTS/MFTS data,
defined on either one or multidimensional domains. The funts
class in
Rfssa (Haghbin et al. 2023),
the package discussed in this article, aims to address this gap. One of
the primary contributions of the package is its capacity to handle and
visualize 2-dimensional FTS, including image data. Furthermore, the
package accommodates MFTS, especially when observed on distinct domains.
This flexibility empowers users to analyze and visualize FTS with
multiple variables, even when they do not share the same domain.
Notably, the Rfssa
package introduces novel visualization tools (as exemplified in Figure
@ref(fig:call_center)). These tools include heatmaps and 3D plots,
thoughtfully designed to provide a deeper understanding of functional
patterns over time. They enhance the ability to discern trends and
variations that might remain inconspicuous in conventional plots. An
additional feature of the funts
class is its ability to accept any
arbitrary basis system as input for the class constructor, including FDA
basis functions or even empirical basis represented as matrices
evaluated at grid points. The classes in the
Rfssa package are
developed using the S3 object-oriented programming system, and for
computational efficiency, significant portions of the package are
implemented using the
Rcpp/RcppArmadillo
packages. Notably, the package includes a shiny web application that
provides a user-friendly GUI for implementing Functional Singular
Spectrum Analysis (FSSA) on real or simulated FTS/MFTS data.
The Rfssa package was initially developed to implement FSSA for FTS, as discussed in Haghbin et al. (2021). FSSA extends Singular Spectrum Analysis (SSA), a model-free procedure commonly used to analyze time series data. The primary goal of SSA is to decompose the original series into a collection of interpretable components, such as slowly varying trends, oscillatory patterns, and structureless noise. Notably, SSA does not rely on restrictive assumptions like stationarity, linearity, or normality (Golyandina and Zhigljavsky 2013).
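To make the decomposition idea concrete, the following is a minimal sketch of classical (non-functional) SSA in base R, following the usual four steps of embedding, singular value decomposition, grouping, and diagonal averaging. It does not use Rssa or Rfssa; the simulated series, the window length `L`, and the number of retained components `r` are illustrative assumptions.

```r
set.seed(42)

# Simulated series: linear trend + period-12 oscillation + noise (illustrative).
N <- 120
tt <- seq_len(N)
signal <- 0.02 * tt + sin(2 * pi * tt / 12)
x <- signal + rnorm(N, sd = 0.1)

# Step 1 (embedding): build the L x K trajectory (Hankel) matrix.
L <- 36
K <- N - L + 1
X <- sapply(seq_len(K), function(k) x[k:(k + L - 1)])

# Step 2 (decomposition): singular value decomposition of the trajectory matrix.
s <- svd(X)

# Step 3 (grouping): keep the r leading components as "signal".
# A linear trend and a sinusoid each have rank 2, so r = 4 is assumed here.
r <- 4
X_signal <- s$u[, 1:r] %*% diag(s$d[1:r]) %*% t(s$v[, 1:r])

# Step 4 (diagonal averaging): average over anti-diagonals to recover a series.
idx <- row(X_signal) + col(X_signal) - 1
x_signal <- as.numeric(tapply(X_signal, idx, mean))

# Compare the error of the reconstruction with the error of the raw series.
rmse_raw <- sqrt(mean((x - signal)^2))
rmse_ssa <- sqrt(mean((x_signal - signal)^2))
```

In this sketch the grouping step is fixed by hand; in practice it is usually guided by diagnostic plots of the singular values and vectors.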
SSA is also widely used outside the functional framework, for example for smoothing and forecasting (Hassani and Mahmoudvand 2013; Carvalho and Rua 2017). SSA itself, the non-functional counterpart of FSSA, has previously been implemented in the Rssa package (Golyandina et al. 2015) and the ASSA package (Carvalho and Martos 2020).
The Rssa package provides various visualization tools to facilitate the grouping stage, and the Rfssa package includes equivalent functional versions of those tools (Golyandina et al. 2018). While the foundational theory of FSSA was originally designed for univariate FTS, it has since been extended to handle multivariate FTS data, an extension referred to as Multivariate FSSA (MFSSA) (Trinka et al. 2022). Furthermore, in line with the developments in SSA for forecasting, two distinct algorithms known as Recurrent Forecasting (FSSA R-forecasting) and Vector Forecasting (FSSA V-forecasting) were introduced for FSSA by Trinka et al. (2023). Both forecasting algorithms, along with support for MFSSA, are included in the most recent version of the Rfssa package.
The remainder of this manuscript is organized as follows. Section 2
introduces the FTS/MFTS data preparation theory used in the funts
class. Section 3 discusses the FSSA methodology, including the basic
schema of FSSA, FSSA R-forecasting, and FSSA V-forecasting. Technical
details of the Rfssa
package are provided in Section 4, where we describe the available
classes in the package and illustrate their practical usage with
examples of real data. Section 5 focuses on the reconstruction stage and
FSSA/MFSSA forecasting. In Section 6, we provide a summary of the
embedded shiny app. Finally, we conclude the paper in Section 7.
Let \(\textbf{y}_N=(y_1,\ldots,y_N)\) be a collection of observations from an FTS. In the theory of FTS, the \(y_i\)’s are regarded as functions in the space \(\mathbb{H}=L^2(\mathcal{T})\), where \(\mathcal{T}\) is a compact subset of \(\mathbb{R}\). For \(s\in\mathcal{T}\), let \(y_i(s)\in\mathbb{R}^p\); the sequence \(\textbf{y}_N\) is called a (univariate) FTS if \(p=1\) and a multivariate FTS (MFTS) if \(p>1\).
In the realm of functional data analysis, we operate under the assumption that the underlying sample functions, denoted as \(y_{i}(\cdot)\), exhibit smoothness for each sample \(i\), where \(i=1, \ldots, N\). Nevertheless, in practical scenarios, observations are typically acquired discretely at a set of grid points and are susceptible to contamination by random noise. This phenomenon can be represented as follows:
$$
Y_{i,k} = y_{i}(t_k) +\varepsilon_{i,k}, \quad k=1,\ldots, K. (\#eq:discr-data)$$
In this expression, \(t_k\in\mathcal{T}\), and \(K\) denotes the number of discrete grid points, which are common to all samples. The \(\varepsilon_{i,k}\) terms represent i.i.d. random noise.
To preprocess the raw data, it is customary to employ smoothing techniques, converting the discrete observations \(Y_{i,k}\) into a continuous form, \(y_{i}(\cdot)\). This is typically performed individually for each variable and sample. One widely used approach is finite basis function expansion (Ramsay and Silverman 2005). In this method, a set of (not necessarily orthogonal) basis functions \(\left\lbrace \nu_j \right\rbrace_{ j\in\mathbb{N}}\) is chosen for the function space \(\mathbb{H}\). Each sample function \(y_{i}(\cdot)\) in (\@ref(eq:discr-data)) is then represented as a finite linear combination of the first \(d\) basis functions:
$$
y_i(s)= \sum_{j=1}^d c_{ij}\nu_j(s). (\#eq:basis-expan)$$
Subsequently, the coefficients \(c_{ij}\) can be estimated using least
squares techniques. By adopting the linear representation form for the
functional data in (\@ref(eq:basis-expan)), we establish a correspondence
between each function \(y_i(\cdot)\) and its coefficient vector
\({\pmb c}_i=(c_{ij})_{j=1}^d.\) As a result, the coefficient vectors
\({\pmb c}_i\) can serve to store and retrieve the original functions,
\(y_i(\cdot)\)’s. This arises from the inherent isomorphism between two
finite-dimensional vector spaces of the same dimension (in this case, \(d\)).
Consequently, the \({\pmb c}_i\)’s are stored as the primary attribute of
funts objects within the Rfssa package.
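The discretization model and the least squares step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used inside the funts constructor: the simulated curves, the noise level, and the five-term Fourier-type basis (`fourier_basis`, a hypothetical helper) are chosen only for demonstration.

```r
set.seed(1)

# K grid points on T = [0, 1] and N curves (all values are illustrative).
K <- 101
N <- 5
t_grid <- seq(0, 1, length.out = K)

# Hypothetical smooth curves y_i(s) = sin(2*pi*s) + 0.1 * i * cos(2*pi*s).
truth <- sapply(seq_len(N), function(i) {
  sin(2 * pi * t_grid) + 0.1 * i * cos(2 * pi * t_grid)
})

# Discrete noisy observations Y[k, i] = y_i(t_k) + eps_{i,k}.
Y <- truth + matrix(rnorm(K * N, sd = 0.05), nrow = K, ncol = N)

# Basis matrix B[k, j] = nu_j(t_k) for d = 5 Fourier-type basis functions.
fourier_basis <- function(s) {
  cbind(1, sin(2 * pi * s), cos(2 * pi * s), sin(4 * pi * s), cos(4 * pi * s))
}
B <- fourier_basis(t_grid)
d <- ncol(B)

# Least squares estimate: column i of C_hat holds c_i = (c_i1, ..., c_id).
C_hat <- qr.solve(B, Y)

# Smoothed curves evaluated back on the grid from the stored coefficients.
Y_smooth <- B %*% C_hat
```

Only the `d x N` matrix `C_hat` (together with the basis) needs to be stored to recover each smoothed curve, which mirrors the role of the coefficient vectors described above.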
Take two elements \(x, y\in \mathbb{H}\) with corresponding coefficient vectors \({\pmb c}_x\) and \({\pmb c}_y.\) Then, the inner product of \(x, y\) can be computed in matrix form as \(\langle x,y \rangle={\pmb c}_x^\top \mathbf{G} {\pmb c}_y\), where \(\mathbf{G}=[ \langle \nu_i,\nu_j \rangle ]_{i,j=1}^{d}\) is the Gram matrix.
It is important to note that \(\mathbf{G}\) is Hermitian. Furthermore, because the basis functions \(\{\nu_i\}_{i=1}^d\) are linearly independent, \(\mathbf{G}\) is positive definite and therefore invertible (Horn and Johnson 2012, Thm. 7.2.10).
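As a small numerical check of the Gram matrix formula, the sketch below uses the non-orthogonal monomial basis \(1, s, s^2\) on \([0,1]\), an assumption made purely for illustration, and compares \({\pmb c}_x^\top \mathbf{G} {\pmb c}_y\) with direct numerical integration of \(x(s)y(s)\).

```r
# Monomial basis nu_j(s) = s^(j - 1), j = 1, 2, 3, on T = [0, 1] (not orthogonal).
d <- 3
nu <- function(j, s) s^(j - 1)

# Gram matrix G[i, j] = <nu_i, nu_j> = integral over [0, 1] of nu_i(s) * nu_j(s).
G <- matrix(0, d, d)
for (i in seq_len(d)) {
  for (j in seq_len(d)) {
    G[i, j] <- integrate(function(s) nu(i, s) * nu(j, s),
                         lower = 0, upper = 1)$value
  }
}

# x(s) = 1 + 2s and y(s) = s^2, expressed through their coefficient vectors.
c_x <- c(1, 2, 0)
c_y <- c(0, 0, 1)

# Inner product via the Gram matrix.
ip_gram <- drop(t(c_x) %*% G %*% c_y)

# The same inner product by direct numerical integration of x(s) * y(s).
ip_direct <- integrate(function(s) (1 + 2 * s) * s^2,
                       lower = 0, upper = 1)$value

# Positive definiteness: all eigenvalues of the symmetric matrix G are positive.
eig <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
```

Both `ip_gram` and `ip_direct` equal \(5/6\) up to numerical error, since \(\int_0^1 (1+2s)s^2\,ds = 1/3 + 1/2\).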
Moreover, let \(A:\mathbb{H}\rightarrow \mathbb{H}\) be a linear operator and \(y=A(x).\) Then, \({\pmb c}_y= \mathbf{G}^{-1}\mathbf{A}{\pmb c}_x,\) where \(\mathbf{A}=[ \langle A(\nu_j),\nu_i \rangle ]_{i,j=1}^{d}\) is called the corresponding matrix of the operator \(A.\)
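The coefficient formula follows from taking inner products of \(y=A(x)\) with each \(\nu_i\), which gives \(\mathbf{G}{\pmb c}_y=\mathbf{A}{\pmb c}_x\) whenever \(A(x)\) lies in the span of the first \(d\) basis functions. A minimal base R sketch is given below; the monomial basis and the choice of differentiation as the operator \(A\) are assumptions made for illustration, picked because differentiation maps the span of \(1, s, s^2\) into itself.

```r
# Monomial basis nu_j(s) = s^(j - 1), j = 1, 2, 3, on T = [0, 1].
d <- 3
nu <- function(j, s) s^(j - 1)

# Illustrative operator A = d/ds applied to each basis function.
dnu <- function(j, s) if (j == 1) 0 * s else (j - 1) * s^(j - 2)

# L2 inner product on [0, 1] by numerical integration.
inner <- function(f, g) {
  integrate(function(s) f(s) * g(s), lower = 0, upper = 1)$value
}

G <- matrix(0, d, d)  # Gram matrix, G[i, j] = <nu_i, nu_j>
A <- matrix(0, d, d)  # operator matrix, A[i, j] = <A(nu_j), nu_i>
for (i in seq_len(d)) {
  for (j in seq_len(d)) {
    G[i, j] <- inner(function(s) nu(i, s), function(s) nu(j, s))
    A[i, j] <- inner(function(s) dnu(j, s), function(s) nu(i, s))
  }
}

# x(s) = 1 + 2s + 3s^2, so y(s) = A(x)(s) = 2 + 6s.
c_x <- c(1, 2, 3)

# Coefficients of y: solve G c_y = A c_x instead of forming G^{-1} explicitly.
c_y <- drop(solve(G, A %*% c_x))
```

Here `c_y` recovers the coefficients \((2, 6, 0)\) of \(2+6s\) up to numerical error. Solving the linear system with `solve(G, ...)` is generally preferable to inverting \(\mathbf{G}\), which can be ill-conditioned for bases such as monomials.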
It is worth noting that while the FSSA theory extends to arbitrary
dimensions, practical implementation for dimensions greater than \(2\)
introduces considerable computational complexity. Moreover,
high-dimensional FTS data are relatively rare in real-world
applications. Therefore, within the Rfssa package, we have chosen to
confine the funts object to functions observed over one- or
two-dimensional domains. In the Rfssa package, the task of
preprocessing the raw discrete observations and converting them into a
funts object is assigned to the funts(\(\cdot\)) constructor.
FSSA is a nonparametric technique for decomposing FTS and MFTS, and the methodology can also be used to forecast such data (Haghbin et al. 2021; Trinka et al. 2022; Trinka et al. 2023). In addition, it serves as a visualization tool for illustrating seasonality and periodicity in the functional space over time.
Basic FSSA consists of two stages, each comprising two steps. We outline the four steps of the FSSA algorithm here.
The authors would like to express their sincere gratitude to the anonymous reviewers for their valuable feedback and constructive comments, which greatly contributed to the improvement of this work.
Additionally, we would like to acknowledge the significant contributions of Dr. S. Morteza Najibi during the development stages of the first version of the Rfssa package.
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-019.zip
CRAN packages used: fda, funFEM, fda.usc, refund, fdapace, funData, ftsspec, rainbow, ftsa, fdasrvf, roahd, Rfssa, Rcpp, RcppArmadillo, Rssa, ASSA
CRAN Task Views implied by cited packages: Cluster, FunctionalData, HighPerformanceComputing, NumericalMathematics, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Haghbin, et al., "Rfssa: An R Package for Functional Singular Spectrum Analysis", The R Journal, 2025
BibTeX citation
@article{RJ-2024-019,
  author = {Haghbin, Hossein and Trinka, Jordan and Maadooliat, Mehdi},
  title = {Rfssa: An R Package for Functional Singular Spectrum Analysis},
  journal = {The R Journal},
  year = {2025},
  note = {https://doi.org/10.32614/RJ-2024-019},
  doi = {10.32614/RJ-2024-019},
  volume = {16},
  issue = {2},
  issn = {2073-4859},
  pages = {1}
}