FMM: An R Package for Modeling Rhythmic Patterns in Oscillatory Systems

This paper is dedicated to the R package FMM which implements a novel approach to describe rhythmic patterns in oscillatory signals. The frequency modulated (FMM) model is defined as a parametric signal plus a Gaussian noise, where the signal can be described as a single or a sum of waves. The FMM approach is flexible enough to describe a great variety of rhythmic patterns. The FMM package includes all required functions to fit and explore single and multi-wave FMM models, as well as a restricted version that allows equality constraints between parameters representing a priori knowledge about the shape to be included. Moreover, the FMM package can generate synthetic data and visualize the results of the fitting process. The potential of this methodology is illustrated with examples of such biological oscillations as the circadian rhythm in gene expression, the electrical activity of the heartbeat and the neuronal activity.

Itziar Fernández (Universidad de Valladolid) , Alejandro Rodríguez-Collado (Universidad de Valladolid) , Yolanda Larriba (Universidad de Valladolid) , Adrián Lamela (Universidad de Valladolid) , Christian Canedo (Universidad de Valladolid) , Cristina Rueda (Universidad de Valladolid)
2022-06-21

1 Introduction

Oscillations naturally occur in a multitude of physical, chemical, biological, and even economic and social processes. Periodic signals appear, for example, during the cell-cycle, in biological time-keeping processes, in human heartbeats, in neuronal signals, in light emissions from certain types of stars, or in business cycles in economics, among many others. Three features typically describe the periodic nature of the oscillatory motion: period, amplitude and phase. The period is the time required for one complete oscillation. Within a period, a sum of monocomponent models, characterized by the phase and amplitude parameters, can be used to describe the rhythmic pattern of a signal (Boashash 2016). By varying the number of monocomponents and considering phase and amplitude parameters as fixed or variable, a large number of rhythmic signal representations can be found.

One of the most popular representations of oscillating signals is the Fourier decomposition (FD): a multicomponent representation with a fixed amplitude parameter. Its monocomponent version, the cosinor model (COS) (Cornelissen 2014), is widely used, in particular in chronobiology, with acceptable results when a sinusoidal shape response within a period is expected. Due to its widespread use, many software utilities are available. Particularly in R, the estimation of a COS model can be performed using cosinor (Sachs 2014) and cosinor2 packages (Mutak 2018). In addition, other packages from widely differing areas of knowledge have specific functions for fitting COS models. Such is the case of, for example, the function CATCosinor in the CATkit package (Gierke et al. 2018), which implements tools for periodicity analysis; the function cosinor in the psych package (Revelle 2021), dedicated to personality and psychological research; or the function cosinor contained in a recent package, card (Shah 2020), which is dedicated to the assessment of the regulation of cardiovascular physiology. Recently, it has also been implemented in other languages such as CosinorPy, a cosinor python package (Moskon 2020). The COS model is easy to use and interpret with symmetrical patterns. However, asymmetric shapes are not captured properly by COS. When the waveform is nonsinusoidal, the use of multiple components analysis to fit a model consisting of a sum of several periodical functions is recommended. However, the multicomponent FD models, developed to provide flexibility from COS, often require the use of a large number of components resulting in serious overfitting issues.

In recent years, alternative methods, mostly nonparametric statistical methods, have been developed and used for analyzing rhythmicity, especially in biological data sets. Some very popular ones, such as the JTK_CYCLE (Hughes et al. 2010), wrongly assume that any underlying rhythms have symmetric waveforms. Others, such as RAIN (Thaben and Westermark 2014), designed to detect more diverse wave shapes including asymmetric patterns, are not focused on modeling but on detecting rhythmic behavior in sets of data. Thus, they are not useful to describe the underlying oscillatory phenomena. The proliferation of methodology in this field has been accompanied by software developments. This is the case, for example, of the DiscoRhythm R package (Carlucci et al. 2020), very recently available on Bioconductor with a web interface based on the R Shiny platform (Chang et al. 2021). This tool allows four popular approaches to be used, including the COS model and JTK_CYCLE, to discover biological rhythmicity. Another recent example is the circacompare (Parsons et al. 2020), an R package implemented for modeling cosinusoidal curves by nonlinear regression. Hosted on GitHub, we can also find the LimoRhyde R package (https://github.com/hugheylab/limorhyde) for the differential analysis of rhythmic transcriptome data, based on fitting linear models (Singer and Hughey 2019).

Motivated by the need for a flexible, interpretable and parametric methodology to fit rhythmic patterns, our research group recently proposed the frequency modulated (FMM) model (Rueda et al. 2019). The FMM is an additive nonlinear parametric regression model capable of adapting to nonsinusoidal shapes and whose parameters are easily interpretable. The single component model has been shown to successfully fit data as diverse as circadian clock signals, hormonal levels data or light data from distant stars. In addition, for more complex oscillatory signals, a multicomponent model of order \(m\), denoted as FMM\(_m\), which includes \(m\) single FMM components, can be used. This is, for example, the case for describing electrocardiography (ECG) signals. The FMM\(_{ecg}\) signal, presented in Rueda et al. (2021b), is defined as the combination of five single FMM components. Another interesting area where the FMM approach has already shown its usefulness is in electrophysiological neuroscience. Specifically, we have proposed FMM methodology for modeling neuronal action potential (AP) curves, oscillating signals that measure the difference between the electrical potential inside and outside the cell (Rodrı́guez-Collado and Rueda 2021a; see Rueda et al. 2021c). An FMM\(_2\) model provides an accurate fitting for a single AP curve; whereas series of AP curves with similar repetitive spikes can be efficiently fitted by the FMM\(_{ST}\) model, a restricted version of the multicomponent FMM model.

In this work we introduce the FMM package (Fernández et al. 2021), programmed in R and available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=FMM. The package implements all required functions to fit and explore single and multicomponent FMM models, as well as a restricted multicomponent version. In addition, the FMM package provides functions to generate synthetic data and visualize the results of the fitted model. Furthermore, its use is illustrated in the aforementioned applications. The remainder of this paper is organized as follows: the next section provides a brief overview of both mono and multicomponent FMM models, as well as the FMM\(_m\) model with equality constraints. The section follows is dedicated to the implementation details of the FMM package. After that, through a simulated example, the basic usage of the package is introduced, including the data generation and the fitting, as well as the visualization of the results. Then, the FMM package performance is shown through three application areas governed by oscillatory systems: chronobiology, ECG and neuroscience. Finally, a summary is provided.

2 Frequency modulated (FMM) model

FMM is a new approach to describe a great variety of rhythmic patterns in oscillatory signals as the composition of several additive components. In this section an overview of the FMM approach is provided. All the methodological details that justify the mathematical formulation of the FMM models are given in (Rueda et al. 2019).

At the time point \(t\), a single FMM wave is defined as \(W\left(t; \upsilon\right) = A \cos\left(\phi\left(t; \alpha, \beta, \omega\right)\right)\) where \(\upsilon = \left(A, \alpha, \beta, \omega\right)^{\prime}\), \(A \in \Re^{+}\) represents the wave amplitude and, \[\label{eq:phase} \phi\left(t; \alpha, \beta, \omega\right)= \beta + 2\arctan\left(\omega \tan\left(\frac{t - \alpha}{2}\right)\right) \tag{1}\] the wave phase. The phase angle \(\phi\) of an FMM wave is defined using the link (see Downs and Mardia 2002; Kato et al. 2008) rather than the linear link function as in the COS model. The link provides much more flexibility to describe nonsinusoidal patterns. Without loss of generality, we assume that the time point \(t \in \left[0, 2\pi\right]\). Otherwise it can be transformed into \(t^{\prime} \in \left[t_0, T + t_0\right]\) by \(t = \frac{\left(t^{\prime} - t_0\right)2\pi}{T}\).

Each of the four parameters of an FMM wave characterizes some aspect of a rhythmic pattern. \(A\) describes the amplitude of the signal, while \(\alpha\), \(\beta\) and \(\omega\) describe the wave phase. \(\alpha \in \left[0, 2\pi\right]\) is a translation parameter and a wave location parameter in the real space, whereas \(\beta \in \left[0, 2\pi\right]\) and \(\omega \in \left[0, 1\right]\) describe the wave shape. To be precise, assuming \(\alpha = 0\), the unimodal symmetric waves are characterized by values of \(\beta\) close to \(0\), \(\pi\) or \(2\pi\). When \(\beta = \frac{\pi}{2}\) or \(\beta = \frac{3\pi}{2}\), extreme asymmetric patterns are described. Moreover, a value of \(\omega\) close to zero describes an extreme spiked wave and, as \(\omega\) value increases, the pattern is increasingly smoother. When \(\omega = 1\), a sinusoidal wave is described and the FMM model matches the COS model where \(\varphi = \beta - \alpha\) is the acrophase parameter.

Two important features of a wave are the peak and trough, defined as the highest and lowest points above and below the rest position, respectively. In many applications, the peak and trough times could be very useful tools to extract practical information of a wave, since they capture important aspects of the dynamics. These two interesting parameters can be directly derived from the main parameters of an FMM wave as, \[\begin{aligned} \label{eq:monoFMM:fiducial} t^U & = \alpha + 2\arctan\left(\frac{1}{\omega}\tan\left(-\frac{\beta}{2}\right)\right) \\ t^L & = \alpha + 2\arctan\left(\frac{1}{\omega}\tan\left(\frac{\pi-\beta}{2}\right)\right) \nonumber \end{aligned} \tag{2}\] where \(t^U\) and \(t^L\) denote the peak and trough times, respectively.

Monocomponent FMM model

Let \(X\left(t_i\right)\), \(t_1 < t_2 < \dots < t_n\) be the vector of observations. The monocomponent FMM model is defined as follows: \[\label{eq:monoFMM} X\left(t_i\right) = M + W\left(t_i; \upsilon\right) + e\left(t_i\right), \quad i = 1,\dots,n \tag{3}\] where \(M \in \Re\) is an intercept parameter describing the baseline level of the signal, \(W\left(t_i; \upsilon\right)\) is an FMM wave, and it is assumed that the errors \(e\left(t_i\right)\) are independent and normally distributed with zero mean and a common variance \(\sigma^2\).

Estimation algorithm

A two-step algorithm to estimate monocomponent FMM model parameters is proposed. We now describe the substantial details of each stage of the algorithm.

Step 1: Initial parameter estimation. A two-way grid search over the choice of \(\left(\alpha , \omega\right)\) parameters is performed. For each pair of \(\left(\alpha , \omega\right)\) fixed values, the estimates for \(M\), \(A\) and \(\beta\) are obtained by solving a least square problem as detailed below.

The model for a single FMM component can be written as: \[\label{eq:monoFMM:muiParam} X\left(t_i\right) = M + A\cos\left(t_{i}^{*} + \varphi\right)+ e\left(t_i\right) \tag{4}\] where \(t_{i}^{*} = \alpha + 2\arctan\left(\omega \tan\left(\frac{t_i - \alpha}{2}\right)\right)\), \(\varphi = \beta - \alpha\), and \(e\left(t_i\right) \sim N\left(0, \sigma^2\right)\) for \(i = 1, \dots, n\).

Using trigonometric angle sum identity, the model can be rewritten as: \[\label{eq:monoFMM:reWrite} X\left(t_i\right) = M + \delta z_i + \gamma w_i + e\left(t_i\right) \tag{5}\] where \(\delta = A \cos\left(\varphi \right)\), \(\gamma = -A \sin\left(\varphi \right)\), \(z_i = \cos\left(t_i^*\right)\) and \(w_i = \sin\left(t_i^*\right)\).

Since \(\alpha\) and \(\omega\) are fixed, the estimates for \(M\), \(\delta\) and \(\gamma\) are obtained by minimizing the residual sum of the squares (RSS), \[\label{eq:monoFMM:SSR} RSS = \sum_{i=1}^{n} \left(X\left(t_i\right)-\left(\hat{M} + \hat{\delta} z_i + \hat{\gamma} w_i\right)\right)^2 \tag{6}\]

And the estimates for \(M\), \(A\) and \(\beta\) are straightforward to derive as follows, \[\begin{aligned} \label{eq:monoFMM:estimates} \hat{M} & = \bar{X} - \hat{\delta} \sum_{i=1}^{n} z_i - \hat{\gamma} \sum_{i=1}^{n} w_i\\ \hat{A} & = \sqrt{\hat{\delta}^2 + \hat{\gamma}^2} \\ \hat{\beta} & = \alpha + \varphi \end{aligned} \tag{7}\]

The best combination of \(\left(\alpha, \omega \right)\) values, with the lowest RSS, is retained and the corresponding estimates are the initial parameter estimation values.

Step 2: Optimization. In the second step, the Nelder-Mead optimization method (Nelder and Mead 1965) is used to obtain the final FMM parameter estimates that minimize the RSS.

Multicomponent FMM model

A multicomponent FMM model of order \(m\), denoted by FMM\(_m\), is defined as \[\begin{aligned} \label{eq:multiFMM} X\left(t_i \right) = M + \sum_{J=1}^{m}W\left(t_i; \upsilon_J \right) + e\left(t_i\right) \\ t_1 < t_2 < \dots < t_n; i = 1, \dots, n \nonumber \end{aligned} \tag{8}\] where \(W\left(t_i; \upsilon_J\right)\), hereinafter denoted by \(W_J\left(t_i\right)\), is the Jth FMM wave and,

Model adequacy

The goodness of fit of an FMM model is measured with the \(R^2\) statistic that represents the proportion of the variance explained by the model out of the total variance, that is: \[\label{eq:multiFMM:R2} R^2 = 1 - \frac{\sum_{i=1}^{n} \left(X\left(t_i\right) - \hat{X}\left(t_i\right)\right)^2}{\sum_{i=1}^{n} \left(X\left(t_i\right) - \bar{X}\right)^2} \tag{9}\] where \(\hat{X}\left(t_i\right)\) represents the fitted value at \(t_i, i=1, \dots ,n\).

Estimation algorithm

An iterative backfitting algorithm is proposed to derive estimates for the FMM parameters. Let \(\hat{W}_J^{\left(k\right)}\left(t_i\right)\) denote the fitted values from the Jth FMM wave at \(t_i, i=1,...,n\) in the kth iteration. The algorithm is structured as follows:

  1. Initialize. Set \(\hat{W}_1^{\left(0\right)}\left(t_i\right) = \dots = \hat{W}_m^{\left(0\right)}\left(t_i\right) = 0\).

  2. Backfitting step. For \(J = 1, \dots ,m\), calculate \[\label{eq:multiFMM:partialres} r_J^{\left(k\right)}\left(t_i\right) = X\left(t_i\right) - \sum_{I < J} \hat{W}_I^{\left(k\right)}\left(t_i\right) - \sum_{I > J} \hat{W}_I^{\left(k-1\right)}\left(t_i\right); I = 1, \dots, m \tag{10}\] and fit a monocomponent FMM model to \(r_J^{\left(k\right)}\left(t_i\right)\) obtaining \(\hat{\alpha}_J^{\left(k\right)}\), \(\hat{\beta}_J^{\left(k\right)}\), \(\hat{\omega}_J^{\left(k\right)}\) and \(\hat{W}_J^{\left(k\right)}\left(t_i\right)\).

  3. Repeat the backfitting step until the stopping criterion is reached. The stopping criterion is defined as the difference between the explained variability in two consecutive iterations: \(R_k^2 - R_{k-1}^2 \leq C\), where \(R_k^2\) (defined in Equation (9)) is the proportion of variance explained by the model in the kth iteration and \(C\) a constant.

  4. \(\hat{M}\) and \(\hat{A_J}\) are derived by solving \[\label{eq:multiFMM:backfitting} \min_{M \in \Re; A_J \in \Re^+} \sum_{i=1}^n \left(X\left(t_i\right) - M - \sum_{J=1}^{m} A_{J}\cos\left(\hat{\phi}_J\left(t_i\right)\right)\right)^2 \tag{11}\] where \(\hat{\phi}_J\left(t_i\right) = \phi\left(t_i; \hat{\alpha}_J, \hat{\beta}_J, \hat{\omega}_J\right)\) defined in Equation (1).

Restricted multicomponent FMM model

Modeling signals with repetitive shape-similar waves can be very useful in some applications (see Rodrı́guez-Collado and Rueda 2021a). In order to obtain more efficient estimators, equality constraints are imposed on the \(\beta\) and \(\omega\) parameters of an FMM\(_m\) model. In particular, we add \(d\) blocks of restrictions: \[\begin{aligned} \label{eq:multiFMM:betaRestr} \beta_1 & = \dots = \beta_{m_1} & \omega_1 = \dots = \omega_{m_1}\\ \beta_{m_1+1} & = \dots = \beta_{m_2} & \omega_{m_1+1} = \dots = \omega_{m_2} \nonumber \\ &\dots \nonumber \\ \beta_{m_{d-1}+1} & = \dots = \beta_{m_d} & \omega_{m_{d-1}+1} = \dots = \omega_{m_d}\nonumber \end{aligned} \tag{12}\]

The parameter estimation problem is solved by an adaptation of the standard procedure.

FMM\(_m\) estimation algorithm with restrictions on the \(\beta\) parameters

Given the unrestricted estimates obtained in step \(3\), the estimates for \(\beta_1, \beta_{m_1+1}, \dots, \beta_{m_{d-1}+1}\) under equality restrictions (Equation (12)) are computed as follows:

\[\begin{aligned} &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_1, \dots ,\hat{\beta}_{m_1}\right)& J & = 1, \dots ,m_1 \\ &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_{m_1+1}, \dots ,\hat{\beta}_{m_2}\right)& J & = m_1 + 1, \dots ,m_2 \\ &\dots \\ &\hat{\beta}_J^* = \mathrm{angularMean}\left(\hat{\beta}_{m_{d-1}+1}, \dots ,\hat{\beta}_{m_d}\right)& J & = m_{d-1} + 1, \dots ,m_d \end{aligned}\]

Then, the algorithm continues to the next step.

FMM\(_m\) estimation algorithm with restrictions on the \(\omega\) parameters

When constraints for the \(\omega\) parameters are incorporated, the grid search for the different \(\omega\) values is outside the backfitting loops. When the number of blocks is large, the estimation procedure can be computationally unaffordable. In order to reduce the execution time, a two-nested backfitting algorithm is proposed. In the outer backfitting loop, a block is fitted. In the inner loop, the FMM waves belonging to the same block are estimated. This procedure generates a close to optimal solution and is a less computationally expensive alternative.

3 FMM package: Implementation details

The FMM code makes use of the doParallel package (Corporation and Weston 2020) to embed parallelization for the fitting process. Several utilities from the ggplot2 (Wickham 2016) and RColorBrewer (Neuwirth 2014) packages are occasionally necessary for the visualization of the fitted models.

The implementation of FMM is divided into four main functionalities described in the next four sections: the fitting of the FMM models, the new S4 object of class "FMM", the graphical visualization of the fittings and the simulation of synthetic data.

Some general details about the functions contained in the FMM package are shown in Table 1.

Table 1: Summary of the fitting, utility functions and standard methods implemented in FMM package.
Function Description
Fitting function
fitFMM(vData, timePoints, nback, ...) Estimates an FMM\(_{nback}\) model to vData observed at timePoints.
Utility functions
plotFMM(objFMM, ...) Graphically displays an object of class "FMM".
generateFMM(M, A, alpha, beta, omega, ...) Simulates values from an FMM model with parameters \((M =\) M, \(A =\) A, \(\alpha =\) alpha\(, \beta =\) beta\(,\omega =\) omega\()\).
getFMMPeaks(objFMM, ...) Estimates peak and trough times, together with signal values at those times, for each FMM wave.
extractWaves(objFMM) Extracts individual contribution to the fitted values of each FMM wave.
Standard methods for objects of class "FMM"
summary(), show(), coef(), fitted()

Fitting an FMM model

An FMM model can be fitted using the main function fitFMM(). The description and default values of its inputs arguments are shown in Table 2.

The fitting function fitFMM() requires the vData input argument, which contains the data to be fitted. Two other arguments can be used to control a basic fitting: timePoints, which contains the specific time points of the single period; and nback, with the number of FMM components to be fitted. For some applications, such as the study of circadian rhythms, data are collected over multiple periods. This information is received by the fitFMM() function through the input argument nPeriods. When nPeriods>1, the FMM fitting is carried out by averaging the data collected at each time point across all considered periods.

Table 2: Description of the input arguments of the fitFMM() function and their default values.
Argument Default value Description
vData no default value A "numeric" vector containing the data to be fitted by an FMM model.
nPeriods \(1\) A "numeric" value specifying the number of periods at which vData is observed.
timePoints NULL A "numeric" vector containing the time points per period at which data is observed. When timePoints = NULL an equally spaced sequence from \(0\) to \(2\pi\) will be assigned.
nback \(1\) A "numeric" value specifying the number of FMM components to be fitted.
betaOmegaRestrictions \(1:\) nback An "integer" vector of length nback indicating which FMM waves are constrained to have equal \(\beta\) and \(\omega\) parameters.
maxiter nback A "numeric" value specifying the maximum number of iterations for the backfitting algorithm.
stopFunction alwaysFalse Function to check the stopping criterion for the backfitting algorithm.
lengthAlphaGrid \(48\) A "numeric" value specifying the grid resolution of the parameter \(\alpha\).
lengthOmegaGrid \(24\) A "numeric" value specifying the grid resolution of the parameter \(\omega\).
numReps \(3\) A "numeric" value specifying the number of times \(\left(\alpha,\omega\right)\) parameters are refined.
showProgress TRUE TRUE to display a progress indicator on the console.
showTime FALSE TRUE to display execution time on the console.
parallelize FALSE TRUE to use parallelized procedure to fit a FMM model.
restrExactSolution FALSE TRUE to obtain the optimal solution for the restricted fitting.

There are three key issues in the fitting process: the grid search of the pair \(\left(\alpha, \omega\right)\) to solve the estimation problem of a single FMM wave, the backfitting algorithm used for the estimation of the multicomponent models, and the incorporation of restrictions on \(\beta\) and \(\omega\) parameters. Each of these issues is controlled by several arguments described below.

Object of class "FMM"

The fitFMM() function outputs an S4 object of class "FMM" which contains the slots presented in Table 3.

Table 3: Summary of the slots of the S4 object of class "FMM" resulting from fitting an FMM model with \(m\) components.
Slot Description
timePoints A "numeric" vector containing the time points for each data point if one single period is observed.
data A "numeric" vector containing the data to be fitted to an FMM model. Data could be collected over multiple periods.
summarizedData A "numeric" vector containing the summarized data at each time point across all considered periods.
nPeriods A "numeric" value containing the number of periods in data.
fittedValues A "numeric" vector of the fitted values by the FMM model.
M A "numeric" value of the estimated intercept parameter \(M\).
A An \(m\)-element "numeric" vector of the estimated FMM wave amplitude parameter(s) \(A\).
alpha An \(m\)-element "numeric" vector of the estimated FMM wave phase translation parameter(s) \(\alpha\).
beta An \(m\)-element "numeric" vector of the estimated FMM wave skewness parameter(s) \(\beta\).
omega An \(m\)-element "numeric" vector of the estimated FMM wave kurtosis parameter(s) \(\omega\).
SSE A "numeric" value of the residual sum of squares values.
R2 An \(m\)-element "numeric" vector specifying the explained variance by each of the fitted FMM components.
nIter A "numeric" value containing the number of iterations of the backfitting algorithm.

The standard methods implemented for the class "FMM" include the functions summary(), show(), coef() and fitted(). These methods display relevant information of the FMM fitting, and provide the estimated parameters and fitted values. In addition, two more specific functions have been implemented. Through the extractWaves() function, the individual contribution of each FMM wave to the fitted values can be extracted. Finally, the location of the peak and trough of each FMM wave, as well as the value of the signal at these time points, can be estimated using the getFMMPeaks() function. The required argument of all these methods and functions is an object of the class "FMM". Particularly, getFMMPeaks() has an optional argument: timePointsIn2pi, that forces the peak and trough locations to be returned into the interval from \(0\) to \(2\pi\) when it is TRUE.

Plotting FMM models

The FMM package includes the function plotFMM() to visualize the results of an FMM fit. The arguments of this function are summarized in Table  4.

Table 4: Description of the input arguments of the plotFMM() function and their default values.
Argument Default value Description
objFMM no default value The object of class "FMM" to be plotted.
components FALSE TRUE to display a plot of components.
plotAlongPeriods FALSE TRUE to plot more than \(1\) period.
use_ggplot2 FALSE TRUE to plot with ggplot2 package.
legendInComponentsPlot TRUE TRUE to indicate if a legend should be plotted in the component plot.
textExtra empty string Extra text to be added to the title of the plot.

An object of class "FMM" can be plotted in two ways (see Figure 1). The default graphical representation will be a plot on which original data (as points) and the fitted signal (as a line) are plotted together (left panel in Figure 1). The other possible representation is a component plot for displaying each centered FMM wave separately (right panel in Figure 1). Set the boolean argument components = TRUE to show a component plot. When legendInComponentsPlot = TRUE, a legend appears at the bottom of the component plot to indicate the represented waves. The argument textExtra allows an extra text to be added to the title of both graphical representations.

As mentioned above, in some cases, data are collected from different periods. All periods can be displayed simultaneously on the default plot using plotAlongPeriods = TRUE. For the component plot, this argument is ignored.

The argument use_ggplot2 provides a choice between building the plot using base R graphics or ggplot2 packages. By default, the graphics package is used. When use_ggplot2 = TRUE, a more aesthetic and customizable plot is created using the ggplot2 package.

Simulating data from an FMM model

Data from an FMM model can be easily simulated using the function generateFMM() of the package FMM. All input arguments of this function are shown in Table 5, along with a short description and their default values.

Table 5: Description of the input arguments of the generateFMM() function and their default values.
Argument Default value Description
M no default value Value of the intercept parameter \(M\).
A no default value Vector of the FMM wave amplitude parameter \(A\).
alpha no default value Vector of the FMM wave phase translation parameter \(\alpha\).
beta no default value Vector of the FMM wave skewness parameter \(\beta\).
omega no default value Vector of the FMM wave kurtosis parameter \(\omega\).
from \(0\) Initial time point of the simulated data.
to \(2\pi\) Final time point of the simulated data.
length.out \(100\) Desired length of the simulation.
timePoints seq(from, to, length = length.out) Time points at which the data will be simulated.
plot TRUE TRUE when the simulated data should be drawn on a plot.
outvalues TRUE TRUE when the numerical simulation should be returned.
sigmaNoise \(0\) Standard deviation of the Gaussian noise to be added.

The main arguments of this function are M, A, alpha, beta and omega, whereby the values of the FMM model parameters are passed to the function. All these arguments are "numeric" vectors of length \(m\), except M, which has length \(1\). Longer and smaller vectors will be truncated or replicated as appropriate.

By default, the data will be simulated at a sequence of \(100\) equally spaced time points from \(0\) to \(2\pi\). The arguments from, to and length.out control such sequences. The sequence can also be manually set using the argument timePoints, in which case from, to and length.out will be ignored.

The user can add a Gaussian noise by argument sigmaNoise. A positive "numeric" value sets the corresponding standard deviation of the Gaussian noise to be added. To create the normally distributed noise, the rnorm() function is used.

The arguments plot and outvalues, both boolean values, determine the output of the generateFMM() function. When outvalues = TRUE, a "list" with input parameters, time points and simulated data is returned. These elements are named input, t and y, respectively. In addition, a scatter plot of y against t can be drawn by setting plot = TRUE.

4 Basic usage of the FMM package

The example below, based on FMM synthetic data, illustrates the basic uses and capabilities of the functions implemented in the FMM package. A set of \(100\) observations is simulated from an FMM\(_4\) model with intercept parameter \(M = 3\), amplitude parameters: \(A_1 = 4\), \(A_2 = 3\), \(A_3 = 1.5\) and \(A_4 = 1\), and phase translation parameters: \(\alpha_1 = 3.8\), \(\alpha_2 = 1.2\), \(\alpha_3 = 4.5\) and \(\alpha_4 = 2\). With regard to the shape parameters, pairs of waves are equal. Specifically, the shape parameters satisfy: \[\begin{aligned} & \beta_1 = \beta_2 = 3 & \omega_1 & =\omega_2 = 0.1\\ & \beta_3 = \beta_4 = 1 & \omega_3 & =\omega_4 = 0.05 \end{aligned}\] The standard deviation of the error term is set at \(\sigma = 0.3\). We use the function generateFMM() to simulate this data set. A set.seed() statement is used to guarantee the reproducibility of the results.

> library("FMM")
> set.seed(1115)
> rfmm.data <-generateFMM(M = 3, A = c(4,3,1.5,1), alpha = c(3.8,1.2,4.5,2),
+                         beta = c(rep(3,2),rep(1,2)), 
+                         omega = c(rep(0.1,2),rep(0.05,2)),
+                         plot = FALSE, outvalues = TRUE,
+                         sigmaNoise = 0.3)

The estimation of an FMM\(_4\) can be performed by setting nback = 4 in the fitting function fitFMM(). The betaOmegaRestrictions parameter allows a wide variety of shape restrictions to be incorporated into the fitting procedure. In this example, to impose the shape restrictiction on the fitting process, we use betaOmegaRestrictions = c(1, 1, 2, 2).

> fit.rfmm <- fitFMM(vData = rfmm.data$y, timePoints = rfmm.data$t, nback = 4,
+                    betaOmegaRestrictions = c(1, 1, 2, 2))
|--------------------------------------------------|
|==================================================|
 Stopped by reaching maximum iterations (4 iteration(s)) 

The results are displayed by the function summary():

> summary(fit.rfmm)

Title:
FMM model with 4 components

Coefficients:
M (Intercept): 3.1661
                  A  alpha   beta  omega
FMM wave 1:  4.0447 3.8048 3.0238 0.0930
FMM wave 2:  3.1006 1.1956 3.0238 0.0930
FMM wave 3:  1.6069 4.5228 1.0145 0.0427
FMM wave 4:  1.1194 1.9788 1.0145 0.0427

Peak and trough times and signals:
             t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1:   0.6741  5.3198  4.9354 -2.7565
FMM wave 2:   4.3482  3.4702  2.3263 -2.1742
FMM wave 3:   1.5345 -1.2330  1.3338 -4.1527
FMM wave 4:   5.2737 -1.7005  5.0730 -3.7565

Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.719769 -0.162649  0.007025  0.000000  0.160127  0.904218 

R-squared:
Wave 1 Wave 2 Wave 3 Wave 4  Total 
0.5049 0.3906 0.0531 0.0276 0.9761 

The FMM wave parameter estimates, as well as the peak and trough times, together with the signal values at those times, are presented in tabular form, where each row corresponds to a component and each column to an FMM wave parameter. As part of the summary, a brief description of the residuals, the proportion of variance explained by each FMM component and by the global model are also shown. The summary() output can be assigned to an object to get a "list" of all the displayed results.

Other options to return the results are the functions coef(), getFMMPeaks() and resid(). The first two return a "list" similar to those obtained with summary(). The resid() method can be used to obtain the complete residuals vector. In addition, the fitted values can be extracted by the function fitted(), which returns a "data.frame" with two columns: time points and fitted values.

The FMM plots can be generated in the R graphics or ggplot2 packages. In the code example given below, we use use_ggplot2 = TRUE to build Figure 1 based on ggplot2. The use of ggplot2 makes it easier to customize our plots and modify features, such as scales, margins, axes, etc. In Figure 1, the two possible FMM plots are arranged via the grid.arrange() function of the gridExtra package (Auguie 2017).

> library("RColorBrewer")
> library("ggplot2")
> library("gridExtra")
> # Plot the fitted FMM model
> titleText <- "Simulation of four restricted FMM waves"
> defaultrFMM2 <- plotFMM(fit.rfmm, use_ggplot2 = TRUE, textExtra = titleText) +
+                 theme(plot.margin=unit(c(1,0.25,1.3,1), "cm")) +
+                 ylim(-5, 6)
> comprFMM2 <- plotFMM(fit.rfmm, components=TRUE, use_ggplot2 = TRUE, 
+                      textExtra = titleText) +
+              theme(plot.margin=unit(c(1,0.25,0,1), "cm")) +
+              ylim(-5, 6) +
+              scale_color_manual(values = brewer.pal("Set1",n = 8)[3:6])
> grid.arrange(defaultrFMM2, comprFMM2, nrow = 1)
graphic without alt text
Figure 1: Graphical representation of the estimated restricted FMM\(_4\) signal with \(\beta_1 = \beta_2, \omega_1 = \omega_2\) and \(\beta_3 = \beta_4, \omega_3 = \omega_4\) constraints. A scatter plot of the simulated data along with the fitted signal is displayed on the left (default plot). The component plot is shown on the right.

5 Real data analysis using the FMM package

This section illustrates the use of the FMM package on the analysis of real signals from chronobiology, electrocardiography and neuroscience. To do this, the package includes four real-world data sets in RData format which are described in the following sections.

Example 1: Chronobiology

Chronobiology studies ubiquitous daily variations found in nature and in many aspects of the physiology of human beings, such as blood pressure or hormone levels (Mermet et al. 2017). These phenomena commonly display signals with oscillatory patterns that repeat every \(24\) hours, usually known as circadian rhythms. In particular, circadian gene expression data have been deeply analyzed in the literature as they regulate the vast majority of molecular rhythms involved in diverse biochemical and cellular functions, see among others (Zhang et al. 2014), (Cornelissen 2014) and (Larriba et al. 2020).

The FMM package includes a data set called mouseGeneExp that contains expression data of the Iqgap2 gene from mouse liver. The liver circadian database is widely extended in chronobiology since the liver is a highly rhythmic organ with moderate levels of noise (Anafi et al. 2017; Larriba et al. 2018, 2020). The complete database is freely available at NCBI GEO (http://www.ncbi.nlm.nih.gov/geo/), with GEO accession number GSE11923. Gene expression values are given along \(48\) hours with a sampling frequency of \(1\) hour/\(2\) days. Hence, data are collected along two periods, and an FMM\(_1\) model is fitted to the Iqgap2 average expressed values as follows:

> data("mouseGeneExp", package = "FMM")
> fitGene <- fitFMM(vData = mouseGeneExp, nPeriods = 2, nback = 1, showProgress = FALSE)
> summary(fitGene)

Title:
FMM model with 1 components

Coefficients:
M (Intercept): 10.1508
                  A  alpha   beta  omega
FMM wave 1:  0.4683 3.0839 1.5329 0.0816

Peak and trough times and signals:
             t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1:   0.1115 10.6191  6.0686  9.6825

Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-9.751e-02 -3.490e-02  2.269e-03 -1.530e-06  2.670e-02  1.890e-01 

R-squared:
[1] 0.8752

The behavior of the FMM versus COS model to describe this asymmetric pattern has been compared in terms of \(R^2\). The FMM model clearly outperforms the COS one with an \(R^2\) of \(0.8752\) and \(0.2835\), respectively. In addition, a difference of \(4.73\) hours in peak time estimation between both models is observed, the FMM peak estimate being much more reliable, as is shown in Figure 2.

graphic without alt text
Figure 2: Iqgap2 gene expression data along two periods (grey dots); FMM (red line) and COS (blue line) fitted signals.

Example 2: Electrocardiography

ECG records the periodic electrical activity of the heart. This activity represents the contraction and relaxation of the atria and ventricle, processes related to the crests and troughs of the ECG waveform. Heartbeats are decomposed into five fundamental waves, labelled as \(P\), \(Q\), \(R\), \(S\) and \(T\), corresponding to the different phases of the heart’s electric activity. The main features used in medical practice for cardiovascular pathology diagnosis are related to the location and amplitudes of these waves, and, of them, those labeled as \(P\), \(R\) and \(T\) are of particular interest (Bayes de Luna 2007). Standard ECG signals are registered using twelve leads, calculated from different electrode locations. Lead II is the reference signal, as it usually provides a good view of the main ECG waves (Meek and Morris 2002).

The FMM package includes the analysis of a typical ECG heartbeat from the QT database (Laguna et al. 1997). This recording, from the subject sel100, belongs to the Normal category, regarding Physionet’s pathology classification (Goldberger et al. 2000). The data illustrate the voltage of the heart’s electric activity, measured in \(mV\), along the heartbeat with a sampling frequency of \(250Hz\). Specifically, the ECG signal from lead II in the fifth of the thirty annotated heartbeats is analysed. Recordings are publicly available on (http://www.physionet.org). Data are saved as ecgData in the package. For an ECG heartbeat, an FMM\(_{ecg}\), a fifth order multicomponent FMM model can be fitted with the instruction:

> data("ecgData", package = "FMM")
> fitEcg <- fitFMM(ecgData, nback = 5, showProgress = FALSE)
> summary(fitEcg)

Title:
FMM model with 5 components

Coefficients:
M (Intercept): 5.2717
                  A  alpha   beta  omega
FMM wave 1:  0.6454 5.5151 3.2926 0.0325
FMM wave 2:  0.0994 4.4203 3.7702 0.1356
FMM wave 3:  0.2443 5.3511 0.6636 0.0323
FMM wave 4:  0.3157 5.5919 4.8651 0.0126
FMM wave 5:  0.0666 1.7988 2.1277 0.1632

Peak and trough times and signals:
             t.Upper Z.Upper t.Lower Z.Lower
FMM wave 1:   2.3686  6.2370  3.1841  4.7241
FMM wave 2:   1.1905  4.9487  2.0693  4.6897
FMM wave 3:   2.3965  6.0828  2.1872  4.5551
FMM wave 4:   2.4210  5.7933  2.4719  4.7175
FMM wave 5:   5.1212  4.8646  4.3689  4.7228

Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.0690885 -0.0095597 -0.0001127  0.0000000  0.0098533  0.0623569 

R-squared:
Wave 1 Wave 2 Wave 3 Wave 4 Wave 5  Total 
0.7645 0.0920 0.0581 0.0493 0.0278 0.9918 

It is worth noting that the FMM package not only provides ECG signal-fitting (the left hand panel in Figure 3), but it also does wave decomposition and fiducial mark annotations on the desired waves (the right hand panel in Figure 3). It is clearly visible how the specific shapes of the five main waves contribute to drawing and explaining the lead II ECG waveform from the Normal morphology. See (Rueda et al. 2021b) for a complete review of FMM\(_{ecg}\).

graphic without alt text
Figure 3: FMM\(_{ecg}\) performance on a single beat from patient sel100 from the QT database. Left: Data (grey dots) and FMM fitting (red line). Black dots locate the \(P\), \(R\) and \(T\) fiducial marks. Right: ECG decomposition on \(P\)(orange), \(Q\) (purple), \(R\) (green), \(S\) (yellow) and \(T\) (blue) waves. Dash lines indicate \(P\), \(R\) and \(T\) peak times.

Example 3: Neuroscience

Single AP curve

The study of the electrophysiological activity of neurons is one of the main research branches in neuroscience. The AP curves are oscillatory signals that serve as basic information units between neurons. They measure the electrical potential difference between inside and outside the cell due to an external stimulus. (Gerstner et al. 2014) can serve as a basic reference for electrophysiological neuroscience. Recently, the shape and other features of the AP have been used in problems such as spike sorting (Caro-Martı́n et al. 2018; Souza et al. 2019; Rácz et al. 2020) or neuronal cell type classification (Teeter et al. 2018; Gouwens et al. 2019; Mosher et al. 2020; Rodrı́guez-Collado and Rueda 2021b).

The package includes an example of a neuronal AP. The data were simulated with the renowned Hodgkin-Huxley model, first presented in (Hodgkin and Huxley 1952), which is defined as a system of ordinary differential equations and has been used in a wide array of applications, as it successfully describes the neuronal activity in various organisms. The simulation has been done using a modified version of the python package NeuroDynex available at (Gerstner et al. 2014). More concretely, a short square stimulus of \(12 \mu A\) has been applied to the neuron. The data can be accurately fitted by an FMM\(_2\) model as follows:

> data("neuronalSpike", package = "FMM")
> fitSingleAP <- fitFMM(neuronalSpike, nback = 2, showProgress = FALSE)
> summary(fitSingleAP)

Title:
FMM model with 2 components

Coefficients:
M (Intercept): 44.9474
                   A  alpha   beta  omega
FMM wave 1:  52.9014 4.4160 3.0606 0.0413
FMM wave 2:  18.5046 4.6564 4.9621 0.0322

Peak and trough times and signals:
             t.Upper  Z.Upper t.Lower  Z.Lower
FMM wave 1:   1.2777 110.8361  5.9669  -2.5002
FMM wave 2:   1.4319  36.9084  1.5649 -16.2572

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-14.3012  -1.0038   0.7472   0.0000   1.3230  24.8618 

R-squared:
Wave 1 Wave 2  Total 
0.9064 0.0604 0.9669