Rfssa: An R Package for Functional Singular Spectrum Analysis

Functional Singular Spectrum Analysis (FSSA) is a non-parametric approach for analyzing Functional Time Series (FTS) and Multivariate FTS (MFTS) data. This paper introduces Rfssa, an R package that implements FSSA for both FTS and MFTS data. Rfssa provides a flexible container, the funts class, for FTS/MFTS data observed on one-dimensional or multi-dimensional domains. It accepts arbitrary basis systems and offers graphical tools for visualizing time-varying features and pattern changes. The package incorporates two forecasting algorithms for FTS data. Rfssa is written in an object-oriented style and relies on Rcpp/RcppArmadillo for computational efficiency. The paper covers the theoretical background, technical details, and usage examples, and highlights potential applications of Rfssa.

Hossein Haghbin (Artificial Intelligence and Data Mining Research Group, ICT Research Institute), Jordan Trinka (Department of Mathematical and Statistical Sciences, Marquette University), Mehdi Maadooliat (Department of Mathematical and Statistical Sciences, Marquette University)
2025-03-11

1 Introduction

In recent times, advancements in data acquisition techniques have made it possible to collect data in high-resolution formats. Due to the presence of temporal-spatial dependence, one may consider this type of data as functional data.

Functional Data Analysis (FDA) focuses on developing statistical methodologies for analyzing data represented as functions or curves. While FDA methods are particularly well-suited for handling smooth continuum data, they can also be adapted and extended to effectively analyze functional data that may not exhibit perfect smoothness, including high-resolution data and data with inherent variability.

The most widely used R package for FDA is fda (Ramsay et al. 2023), which supports the analyses described in the textbook by Ramsay and Silverman (2005). More than 40 other R packages on CRAN also incorporate functional data analysis, such as funFEM (Bouveyron 2021), fda.usc (Febrero-Bande and Fuente 2012), refund (Goldsmith et al. 2023), fdapace (Gajardo et al. 2022), funData (Happ-Kurz 2020), ftsspec (Tavakoli 2015), rainbow (Shang and Hyndman 2022), and ftsa (Hyndman and Shang 2023).

One crucial initial requirement for any of these packages is to establish a framework for representing and storing infinite-dimensional functional observations. The fda package, for instance, employs the fd class as a container for functional data defined on a one-dimensional (1D) domain. An fd object represents functional data as a finite linear combination of known basis functions (e.g., Fourier, B-splines, etc.), storing both the basis functions and their respective coefficients for each curve. This representation aligns with the practical implementation found in many papers within the field of FDA. Conversely, several other R packages store functional data in a discrete form evaluated on grid points (e.g., fda.usc, refund, funData, rainbow, and fdapace). These packages also provide the capability to analyze functions beyond the one-dimensional case, such as image data treated as two-dimensional (2D) functions (e.g., refund, fdasrvf, and funData). To the best of our knowledge, packages that support representation beyond 1D functions utilize the grid point representation for execution and storage.

Moreover, recent packages have been developed to handle multivariate functional data, which consist of more than one function per observation unit. Examples of such packages include roahd, fda.usc, and funData.

While some recent FDA packages, such as ftsspec, rainbow, and ftsa, implement techniques for Functional Time Series (FTS), where sequences of functions are observed over time, none of them handle Multivariate FTS (MFTS) or multidimensional MFTS. In summary, there is still a need for a unified and flexible container for FTS/MFTS data defined on either one-dimensional or multidimensional domains. The funts class in Rfssa (Haghbin et al. 2023), the package discussed in this article, aims to address this gap.

One of the primary contributions of the package is its capacity to handle and visualize two-dimensional FTS, including image data. Furthermore, the package accommodates MFTS whose variables are observed on distinct domains, so users can analyze and visualize FTS with multiple variables even when those variables do not share the same domain. The Rfssa package also introduces novel visualization tools (as exemplified by the call center figure). These tools include heatmaps and 3D plots, designed to reveal functional patterns over time and to expose trends and variations that might remain inconspicuous in conventional plots. An additional feature of the funts class is that its constructor accepts an arbitrary basis system as input, including FDA basis functions or an empirical basis represented as a matrix evaluated at grid points. The classes in the Rfssa package are developed using the S3 object-oriented programming system, and for computational efficiency, significant portions of the package are implemented using the Rcpp/RcppArmadillo packages. The package also includes a shiny web application that provides a user-friendly GUI for implementing Functional Singular Spectrum Analysis (FSSA) on real or simulated FTS/MFTS data.

The Rfssa package was initially developed to implement FSSA for FTS, as discussed by Haghbin et al. (2021). FSSA extends Singular Spectrum Analysis (SSA), a model-free procedure commonly used to analyze time series data. The primary goal of SSA is to decompose the original series into a collection of interpretable components, such as slowly varying trends, oscillatory patterns, and structureless noise. Notably, SSA does not rely on restrictive assumptions like stationarity, linearity, or normality (Golyandina and Zhigljavsky 2013).
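To make the decomposition idea concrete, the following is a minimal base R sketch of classical univariate SSA (embedding, singular value decomposition, grouping, and diagonal averaging). It is not the Rfssa or Rssa implementation; the function name `ssa_reconstruct`, the simulated series, the window length, and the grouping are all assumptions chosen for illustration.

```r
# Minimal univariate SSA sketch in base R; all names here are illustrative.
ssa_reconstruct <- function(x, L, groups) {
  N <- length(x)
  K <- N - L + 1
  # Embedding: L x K trajectory (Hankel) matrix with entries x[i + j - 1]
  X <- outer(seq_len(L), seq_len(K), function(i, j) x[i + j - 1])
  # Decomposition: singular value decomposition of the trajectory matrix
  s <- svd(X)
  # Series index i + j - 1 of every matrix cell, used for diagonal averaging
  idx <- as.vector(outer(seq_len(L), seq_len(K), "+")) - 1
  lapply(groups, function(g) {
    # Grouping: sum of the rank-one terms whose indices are in g
    Xg <- s$u[, g, drop = FALSE] %*% (s$d[g] * t(s$v[, g, drop = FALSE]))
    # Diagonal averaging: average each anti-diagonal to recover a series
    as.vector(tapply(as.vector(Xg), idx, mean))
  })
}

# Simulated series: linear trend + period-12 oscillation + noise (assumed data)
set.seed(1)
tt <- 1:120
x <- 0.05 * tt + sin(2 * pi * tt / 12) + rnorm(120, sd = 0.3)

# Assumed grouping; in practice the groups are chosen from diagnostic plots
rec <- ssa_reconstruct(x, L = 36, groups = list(first = 1, next_two = 2:3))

# Using every component reproduces the original series
full <- ssa_reconstruct(x, L = 36, groups = list(all = 1:36))[[1]]
```

Which singular components correspond to trend, oscillation, or noise depends on the data, which is why the grouping above is labeled as an assumption rather than a rule.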

It’s worth noting that SSA finds applications beyond the functional framework, including smoothing and forecasting purposes (Hassani and Mahmoudvand 2013; Carvalho and Rua 2017). The non-functional version of FSSA, known as SSA, has previously been implemented in the Rssa package (Golyandina et al. 2015) and the ASSA package (Carvalho and Martos 2020).

The Rssa package provides various visualization tools to facilitate the grouping stage, and the Rfssa package includes equivalent functional versions of those tools (Golyandina et al. 2018). While the foundational theory of FSSA was originally designed for univariate FTS, it has since been extended to handle multivariate FTS data, an extension referred to as Multivariate FSSA (MFSSA) (Trinka et al. 2022). Furthermore, in line with the developments in SSA for forecasting, two distinct algorithms known as Recurrent Forecasting (FSSA R-forecasting) and Vector Forecasting (FSSA V-forecasting) were introduced for FSSA by Trinka et al. (2023). Both forecasting algorithms, along with the MFSSA capabilities, have been integrated into the most recent version of the Rfssa package.

The remainder of this manuscript is organized as follows. Section 2 introduces the FTS/MFTS data preparation theory used in the funts class. Section 3 discusses the FSSA methodology, including the basic schema of FSSA, FSSA R-forecasting, and FSSA V-forecasting. Technical details of the Rfssa package are provided in Section 4, where we describe the available classes in the package and illustrate their practical usage with examples of real data. Section 5 focuses on the reconstruction stage and FSSA/MFSSA forecasting. In Section 6, we provide a summary of the embedded shiny app. Finally, we conclude the paper in Section 7.

2 Data preparation in FTS

Define \(\textbf{y}_N=(y_1,\ldots,y_N)\) to be a collection of observations from an FTS. In the theory of FTS, the \(y_i\)’s are considered as functions in the space \(\mathbb{H}=L^2(\mathcal{T})\), where \(\mathcal{T}\) is a compact subset of \(\mathbb{R}.\) For \(s\in\mathcal{T}\), let \(y_i(s)\in\mathbb{R}^p\). The sequence \(\textbf{y}_N\) is called a (univariate) FTS if \(p=1\), and a multivariate FTS (or MFTS) if \(p>1.\)

In the realm of functional data analysis, we operate under the assumption that the underlying sample functions, denoted as \(y_{i}(\cdot)\), exhibit smoothness for each sample \(i\), where \(i=1, \ldots, N\). Nevertheless, in practical scenarios, observations are typically acquired discretely at a set of grid points and are susceptible to contamination by random noise. This phenomenon can be represented as follows:

$$
Y_{i,k} = y_{i}(t_k) +\varepsilon_{i,k}, \quad k=1,\ldots, K. \tag{1}
$$

In this expression, \(t_k\in\mathcal{T}\), and \(K\) denotes the count of discrete grid points across all samples. The \(\varepsilon_{i,k}\) terms represent i.i.d. random noise.

To preprocess the raw data, it is customary to employ smoothing techniques that convert the discrete observations \(Y_{i,k}\) into a continuous form, \(y_{i}(\cdot)\). This is typically performed individually for each variable and sample. One widely used approach is finite basis function expansion (Ramsay and Silverman 2005). In this method, a set of basis functions \(\left\lbrace \nu_j \right\rbrace_{ j\in\mathbb{N}}\) (not necessarily orthogonal) is chosen for the function space \(\mathbb{H}\). Each sample function \(y_{i}(\cdot)\) in the discrete observation model above is then represented as a finite linear combination of the first \(d\) basis functions:

$$
y_i(s)= \sum_{j=1}^d c_{ij}\nu_j(s). \tag{2}
$$

Subsequently, the coefficients \(c_{ij}\) can be estimated using least squares techniques. By adopting the linear representation given by the basis expansion above, we establish a correspondence between each function \(y_i(\cdot)\) and its coefficient vector \({\pmb c}_i=(c_{ij})_{j=1}^d.\) As a result, the coefficient vectors \({\pmb c}_i\) can serve to store and retrieve the original functions \(y_i(\cdot)\). This follows from the isomorphism between two finite-dimensional vector spaces of the same dimension (in this case, \(d\)). Consequently, the \({\pmb c}_i\)’s are stored as the primary attribute of funts objects within the Rfssa package.
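As an illustration of this preprocessing step, the sketch below simulates noisy discrete observations from a known basis expansion and recovers the coefficient vectors by ordinary least squares in base R. It does not call the funts constructor; the helper `fourier_basis`, the grid, the noise level, and the sizes `K`, `N`, and `d` are assumptions made for the example.

```r
set.seed(42)
K <- 101  # number of grid points
N <- 5    # number of curves
d <- 7    # number of basis functions
grid <- seq(0, 1, length.out = K)

# Illustrative Fourier-type basis on [0, 1], evaluated on a grid (K x d matrix)
fourier_basis <- function(s, d) {
  B <- matrix(1, nrow = length(s), ncol = d)
  for (j in seq_len(d)[-1]) {
    freq <- j %/% 2
    B[, j] <- if (j %% 2 == 0) {
      sqrt(2) * sin(2 * pi * freq * s)
    } else {
      sqrt(2) * cos(2 * pi * freq * s)
    }
  }
  B
}
B <- fourier_basis(grid, d)

# True coefficients (d x N) and noisy discrete observations Y (K x N)
C_true <- matrix(rnorm(d * N), d, N)
Y <- B %*% C_true + matrix(rnorm(K * N, sd = 0.1), K, N)

# Least squares estimate of the coefficients, one column per curve
C_hat <- qr.solve(B, Y)

# Smoothed curves evaluated back on the grid
Y_smooth <- B %*% C_hat
```

Each column of `C_hat` plays the role of a coefficient vector \({\pmb c}_i\): together with the basis, it is enough to evaluate the smoothed curve at any point of the domain.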

Take two elements \(x, y\in \mathbb{H}\) with corresponding coefficient vectors \({\pmb c}_x\) and \({\pmb c}_y.\) Then, the inner product of \(x, y\) can be computed in matrix form as \(\langle x,y \rangle={\pmb c}_x^\top \mathbf{G} {\pmb c}_y\), where \(\mathbf{G}=[ \langle \nu_i,\nu_j \rangle ]_{i,j=1}^{d}\) is the Gram matrix.

It is important to note that \(\mathbf{G}\) is Hermitian. Furthermore, because the basis functions \(\{\nu_i\}_{i=1}^d\) are linearly independent, \(\mathbf{G}\) is positive definite, making it invertible (Horn and Johnson 2012, Thm. 7.2.10).
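A small numerical check of the Gram matrix formula may help. The sketch below assumes a non-orthogonal monomial basis \(\nu_j(s)=s^{j-1}\) on \([0,1]\), chosen only because its Gram matrix is known in closed form (the Hilbert matrix, with entries \(1/(i+j-1)\)); the two example functions are likewise assumptions.

```r
d <- 4
idx <- seq_len(d)

# Gram matrix of the monomial basis nu_j(s) = s^(j - 1) on [0, 1]:
# <nu_i, nu_j> = integral of s^(i + j - 2) over [0, 1] = 1 / (i + j - 1)
G <- outer(idx, idx, function(i, j) 1 / (i + j - 1))

c_x <- c(1, 2, 0, 0)  # x(s) = 1 + 2 s
c_y <- c(0, 0, 1, 0)  # y(s) = s^2

# Inner product through the coefficient vectors and the Gram matrix
ip_gram <- drop(t(c_x) %*% G %*% c_y)

# The same inner product by numerical integration of x(s) * y(s)
ip_quad <- integrate(function(s) (1 + 2 * s) * s^2, 0, 1)$value

# G is symmetric with strictly positive eigenvalues, hence invertible
G_is_symmetric <- isSymmetric(G)
G_eigenvalues <- eigen(G, symmetric = TRUE)$values
```

Both `ip_gram` and `ip_quad` equal \(5/6\) up to numerical error, since \(\int_0^1 (1+2s)s^2\,ds = 1/3 + 1/2\).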

Moreover, let \(A:\mathbb{H}\rightarrow \mathbb{H}\) be a linear operator and \(y=A(x).\) Then, \({\pmb c}_y= \mathbf{G}^{-1}\mathbf{A}{\pmb c}_x,\) where \(\mathbf{A}=[ \langle A(\nu_j),\nu_i \rangle ]_{i,j=1}^{d}\) is called the corresponding matrix of the operator \(A.\)
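The identity follows by taking the inner product of \(y=A(x)\) with each \(\nu_i\), which gives \(\mathbf{G}{\pmb c}_y=\mathbf{A}{\pmb c}_x\). The sketch below checks this numerically under illustrative assumptions: the monomial basis \(\nu_j(s)=s^{j-1}\) on \([0,1]\) and the integral operator \((Ax)(s)=\int_0^1 (1+su)\,x(u)\,du\), which maps the span of the basis into itself. Neither choice comes from the Rfssa package.

```r
d <- 4
idx <- seq_len(d)

# Gram matrix of the monomial basis nu_j(s) = s^(j - 1) on [0, 1]
G <- outer(idx, idx, function(i, j) 1 / (i + j - 1))

# Operator matrix A[i, j] = <A(nu_j), nu_i> for the kernel k(s, u) = 1 + s * u.
# Here A(nu_j)(s) = 1 / j + s / (j + 1), so
# <A(nu_j), nu_i> = 1 / (i * j) + 1 / ((i + 1) * (j + 1))
A <- outer(idx, idx, function(i, j) 1 / (i * j) + 1 / ((i + 1) * (j + 1)))

c_x <- c(1, 2, 0, 0)  # x(s) = 1 + 2 s

# Coefficients of y = A(x): solve G c_y = A c_x instead of forming G^{-1}
c_y <- drop(solve(G, A %*% c_x))

# Direct calculation for comparison:
# y(s) = int x(u) du + s * int u x(u) du = 2 + (7 / 6) s
c_y_exact <- c(2, 7 / 6, 0, 0)
```

Solving the linear system rather than inverting \(\mathbf{G}\) explicitly is a numerical design choice; it matters here because the Gram matrix of a monomial basis is poorly conditioned as \(d\) grows.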

It is worth noting that while the FSSA theory extends to arbitrary dimensions, practical implementation for dimensions greater than \(2\) introduces considerable computational complexity. Moreover, high-dimensional FTS data are relatively rare in real-world applications. Therefore, within the Rfssa package, we have chosen to confine the funts object to functions observed over one-dimensional or two-dimensional domains. In the Rfssa package, the task of preprocessing the raw discrete observations and converting them into a funts object is assigned to the funts(\(\cdot\)) constructor.

3 An overview of the FSSA methodology

FSSA is a nonparametric technique for decomposing FTS and MFTS data. The methodology can also be used to forecast such data (Haghbin et al. 2021; Trinka et al. 2022; Trinka et al. 2023), and it serves as a visualization tool for illustrating seasonality and periodicity in the functional space over time.

Basic schema of FSSA

Basic FSSA consists of two stages where each stage includes two steps. We outline the four steps of the FSSA algorithm here.

Acknowledgments

The authors would like to express their sincere gratitude to the anonymous reviewers for their valuable feedback and constructive comments, which greatly contributed to the improvement of this work.

Additionally, we would like to acknowledge the significant contributions of Dr. S. Morteza Najibi during the development stages of the first version of the Rfssa package.

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded as RJ-2024-019.zip.

CRAN packages used

fda, funFEM, fda.usc, refund, fdapace, funData, ftsspec, rainbow, ftsa, fdasrvf, roahd, Rfssa, Rcpp, RcppArmadillo, Rssa, ASSA

CRAN Task Views implied by cited packages

Cluster, FunctionalData, HighPerformanceComputing, NumericalMathematics, TimeSeries

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

References

C. Bouveyron. funFEM: Clustering in the discriminative functional subspace. 2021. URL https://CRAN.R-project.org/package=funFEM. R package version 1.2.
M. de Carvalho and G. Martos. ASSA: Applied singular spectrum analysis. 2020. URL https://CRAN.R-project.org/package=ASSA. R package version 2.0.
M. de Carvalho and A. Rua. Real-time nowcasting the US output gap: Singular spectrum analysis at work. International Journal of Forecasting, 33(1): 185–198, 2017.
M. Febrero-Bande and M. O. de la Fuente. Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software, 51(4): 1–28, 2012. DOI 10.18637/jss.v051.i04.
A. Gajardo, S. Bhattacharjee, C. Carroll, Y. Chen, X. Dai, J. Fan, P. Z. Hadjipantelis, K. Han, H. Ji, C. Zhu, et al. fdapace: Functional data analysis and empirical dynamics. 2022. URL https://CRAN.R-project.org/package=fdapace. R package version 0.5.9.
J. Goldsmith, F. Scheipl, L. Huang, J. Wrobel, C. Di, J. Gellar, J. Harezlak, M. W. McLean, B. Swihart, L. Xiao, et al. refund: Regression with functional data. 2023. URL https://CRAN.R-project.org/package=refund. R package version 0.1.32.
N. Golyandina, A. Korobeynikov, A. Shlemov and K. Usevich. Multivariate and 2D extensions of singular spectrum analysis with the Rssa package. Journal of Statistical Software, 67(2): 1–78, 2015. URL http://dx.doi.org/10.18637/jss.v067.i02.
N. Golyandina, A. Korobeynikov and A. Zhigljavsky. Singular spectrum analysis with R. Springer Berlin Heidelberg, 2018.
N. Golyandina and A. Zhigljavsky. Singular spectrum analysis for time series. Springer Science & Business Media, Berlin, Heidelberg, 2013.
H. Haghbin, S. M. Najibi, R. Mahmoudvand, J. Trinka and M. Maadooliat. Functional singular spectrum analysis. Stat, e330, 2021. DOI https://doi.org/10.1002/sta4.330.
H. Haghbin, J. Trinka, S. M. Najibi and M. Maadooliat. Rfssa: Functional singular spectrum analysis. 2023. URL https://CRAN.R-project.org/package=Rfssa. R package version 3.0.2.
C. Happ-Kurz. Object-oriented software for functional data. Journal of Statistical Software, 93(5): 1–38, 2020. DOI 10.18637/jss.v093.i05.
H. Hassani and R. Mahmoudvand. Multivariate singular spectrum analysis: A general view and new vector forecasting approach. International Journal of Energy and Statistics, 1(1): 55–83, 2013. URL https://doi.org/10.1142/S2335680413500051.
R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge University Press, 2012.
R. Hyndman and H. L. Shang. ftsa: Functional time series analysis. 2023. URL https://CRAN.R-project.org/package=ftsa. R package version 6.3.0.
J. O. Ramsay and B. W. Silverman. Functional data analysis. Springer, New York, NY, 2005.
J. O. Ramsay, H. Wickham, S. Graves and G. Hooker. fda: Functional data analysis. 2023. URL https://CRAN.R-project.org/package=fda. R package version 6.1.4.
H. L. Shang and R. Hyndman. rainbow: Rainbow plots, bagplots and boxplots for functional data. 2022. URL https://CRAN.R-project.org/package=rainbow. R package version 3.7.
S. Tavakoli. ftsspec: Spectral density estimation and comparison for functional time series. 2015. URL https://CRAN.R-project.org/package=ftsspec. R package version 1.0.0.
J. Trinka, H. Haghbin and M. Maadooliat. Multivariate functional singular spectrum analysis: A nonparametric approach for analyzing multivariate functional time series. In Innovations in multivariate statistical modeling: Navigating theoretical and multidisciplinary domains, pages 187–221. Springer, 2022.
J. Trinka, H. Haghbin, H. L. Shang and M. Maadooliat. Functional time series forecasting: Functional singular spectrum analysis approaches. Stat, 12(1): e621, 2023. URL https://doi.org/10.1002/sta4.621.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Haghbin, et al., "Rfssa: An R Package for Functional Singular Spectrum Analysis", The R Journal, 2025

BibTeX citation

@article{RJ-2024-019,
  author = {Haghbin, Hossein and Trinka, Jordan and Maadooliat, Mehdi},
  title = {Rfssa: An R Package for Functional Singular Spectrum Analysis},
  journal = {The R Journal},
  year = {2025},
  note = {https://doi.org/10.32614/RJ-2024-019},
  doi = {10.32614/RJ-2024-019},
  volume = {16},
  issue = {2},
  issn = {2073-4859},
  pages = {1}
}