Group Method of Data Handling (GMDH)-type neural network algorithms are heuristic self-organization methods for the modelling of complex systems. GMDH algorithms are utilized for a variety of purposes, including the identification of physical laws, the extrapolation of physical fields, pattern recognition, clustering, the approximation of multidimensional processes, and forecasting without models. In this study, the R package *GMDH* is presented for short-term forecasting through GMDH-type neural network algorithms. The *GMDH* package has options to use different transfer functions (sigmoid, radial basis, polynomial, and tangent functions) simultaneously or separately. Data on the cancer death rate of Pennsylvania from 1930 to 2000 are used to illustrate the features of the *GMDH* package. The results based on ARIMA models and exponential smoothing methods are included for comparison.

Time series data are ordered successive observations measured at equally or unequally spaced points in time. Time series data may include dependency among successive observations; hence, the order of the data is important. Time series data appear in various areas and disciplines such as medical studies, economics, the energy industry, agriculture, and meteorology. Modelling time series data uses the history of the data to make forecasts. At times, statistical models are not sufficient to solve some problems, such as pattern recognition, forecasting, and identification. Extracting information directly from the measurements is advantageous when modelling complex systems for which there is not enough prior information and/or no theory to define a model. Automatic model selection is also a powerful tool for researchers who are interested in the result but do not have sufficient statistical knowledge or time for an analysis (Mueller et al. 1998).

The objective of this study is to develop an R package for forecasting
time series data. Some recent software packages developed for time
series analysis are *glarma*, *ftsa*, *MARSS*, *ensembleBMA*,
*ProbForecastGOP*, and *forecast*
(Hyndman and Khandakar 2008; Fraley et al. 2011; Holmes, Ward, and Wills 2012; Shang 2013; Dunsmuir and Scott 2015).
In this study, we focus on the development of an R package for short-term
forecasting via Group Method of Data Handling (GMDH) algorithms.
The history of GMDH-type neural networks is based on works from the end
of the 1960s and the beginning of the 1970s. First,
Ivakhnenko (1966) introduced a polynomial, which is the basic
building block of GMDH, to construct higher-order polynomials.
Ivakhnenko (1970) then introduced heuristic self-organization
methods, which constitute the main working system of the GMDH algorithm.
The heuristic self-organization method defines the way the algorithm
evolves, following rules such as external criteria. The GMDH method,
convenient for complex and unstructured systems, has benefits over
high-order regression (Farlow 1981).

Kondo (1998) proposed a GMDH-type neural network in which the algorithm works according to the heuristic self-organization method. Kondo and Ueno (2006a,b) proposed a GMDH algorithm which has a feedback loop. In this algorithm, the output obtained from the last layer is set as a new input variable, provided a threshold is not satisfied in the previous layer. The system of the algorithm is organized by a heuristic self-organization method in which a sigmoid transfer function is integrated. Kondo and Ueno (2007) proposed a logistic GMDH-type neural network; the difference from the conventional GMDH algorithm is that it takes linear functions of all inputs at the last layer. Kondo and Ueno (2012) included three transfer functions (sigmoid, radial basis, and polynomial functions) in the feedback GMDH algorithm. Srinivasan (2008) used a GMDH-type neural network and traditional time series models to forecast energy demand, and showed that the GMDH-type neural network was superior to the traditional time series models with respect to mean absolute percentage error (MAPE). In another study, Xu et al. (2012) applied a GMDH algorithm and ARIMA models to forecast the daily power load; according to their results, the GMDH-based forecasts were superior to those of the ARIMA models in terms of MAPE.

There are some difficulties in applying a GMDH-type neural network; for
example, there is no freely available software implementing the GMDH
algorithms in the literature. We present the R package *GMDH* for
short-term forecasting through GMDH-type neural network algorithms. The
package includes two types of GMDH structures; namely, the GMDH
structure and the revised GMDH (RGMDH) structure. It also includes
options to use different transfer functions (sigmoid, radial basis,
polynomial, and tangent functions) simultaneously or separately. Data on
the cancer death rate of Pennsylvania from 1930 to 2000 are used to
illustrate the implementation of the *GMDH* package. We compare the
results to those based on ARIMA models and exponential smoothing (ES)
methods.

In this section, data preparation, the two types of GMDH-type neural network structures, and the estimation of the regularization parameter in regularized least squares estimation (RLSE) are described.

Data preparation plays an important role in GMDH-type neural network algorithms. To avoid very large numbers in the calculations and to be able to use all transfer functions in the algorithm, it is necessary for the range of the data to lie in the interval (0, 1). If \(\alpha_t\) is the actual time series at hand, this is guaranteed by the following transformation,

\[\label{eq:w_t}
w_t=\frac{\alpha_t+\delta_1}{\delta_2} \tag{1}\]

with

\[\label{eq:delta_1} \delta_1 = \begin{cases} |\text{min}(\alpha_t)|+1 & \text{if } \text{min}(\alpha_t) \leq 0 \\ \hfil 0 & \text{if } \text{min}(\alpha_t) > 0 \end{cases} \tag{2}\]

and

\[\label{eq:delta_2}
\delta_2 = \text{max}(\alpha_t+\delta_1) +1. \tag{3}\]

During the estimation and forecasting process in GMDH-type neural
network algorithms, all calculations are done using the scaled data set,
\(w_t\). After all processes are completed, i.e., all predictions and
forecasts are obtained, we apply the inverse transformation as follows,

\[\label{eq:alpha_t} \widehat{\alpha}_t = \widehat{w}_t\times\delta_2-\delta_1. \tag{4}\]
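The transformation in equations (1) to (3) and its inverse in equation (4) can be sketched in R as follows; the helper names are ours for illustration and are not exported by the *GMDH* package.

```r
# Sketch of the scaling in equations (1)-(3) and the inverse
# transformation in equation (4). Helper names are illustrative,
# not part of the GMDH package.
scale_series <- function(alpha) {
  delta1 <- if (min(alpha) <= 0) abs(min(alpha)) + 1 else 0
  delta2 <- max(alpha + delta1) + 1
  list(w = (alpha + delta1) / delta2, delta1 = delta1, delta2 = delta2)
}
unscale_series <- function(w, delta1, delta2) {
  w * delta2 - delta1
}

x <- c(-3, 0, 5, 12)   # a toy series containing non-positive values
s <- scale_series(x)
all(s$w > 0 & s$w < 1)                                  # scaled into (0, 1)
all.equal(unscale_series(s$w, s$delta1, s$delta2), x)   # recovers the data
```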

Assume a time series dataset observed at \(t\) time points, with \(p\) inputs. An illustration of the time series data structure in GMDH algorithms is presented in Table 1. Since we construct the model for the data with time lags, the number of observations, presented under the subject column of the table, is \(t-p\), and the number of inputs, the lagged time series, is \(p\). In this table, the variable called \(z\) enters the models as the response variable, and the remaining variables enter the models as the lagged time series \(x_i\), where \(i = 1, 2, \ldots, p\). The notation in Table 1 is followed throughout this paper.

Subject | \(z\) | \(x_1\) | \(x_2\) | \(\ldots\) | \(x_p\) |
---|---|---|---|---|---|
1 | \(w_t\) | \(w_{t-1}\) | \(w_{t-2}\) | \(\ldots\) | \(w_{t-p}\) |
2 | \(w_{t-1}\) | \(w_{t-2}\) | \(w_{t-3}\) | \(\ldots\) | \(w_{t-p-1}\) |
3 | \(w_{t-2}\) | \(w_{t-3}\) | \(w_{t-4}\) | \(\ldots\) | \(w_{t-p-2}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
\(t-p\) | \(w_{p+1}\) | \(w_{p}\) | \(w_{p-1}\) | \(\ldots\) | \(w_{1}\) |
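The layout of Table 1 amounts to embedding the scaled series into a matrix of lags. A minimal sketch in R (our own helper, not package code):

```r
# Build the data structure of Table 1 from a scaled series w and lag p:
# each row holds the response z and the p lagged inputs x1, ..., xp.
lag_matrix <- function(w, p) {
  t <- length(w)
  rows <- t(sapply(1:(t - p), function(i) w[(t - i + 1):(t - i - p + 1)]))
  colnames(rows) <- c("z", paste0("x", 1:p))
  rows
}

m <- lag_matrix(1:10, p = 3)   # pretend w_1, ..., w_10 = 1, ..., 10
dim(m)                         # (t - p) = 7 subjects, p + 1 = 4 columns
m[1, ]                         # subject 1: z = w_10, x1 = w_9, ..., x3 = w_7
```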

A model that better explains the relation between the response and the lagged time series is captured via transfer functions. The sigmoid, radial basis, polynomial, and tangent functions, presented in Table 2, are mainly used to explain the relation between inputs and output in GMDH-type neural network algorithms (Kondo and Ueno 2012). We use all transfer functions, stated in Table 2, simultaneously in each neuron. In other words, we construct four models at each neuron, and then the model which gives the smallest prediction mean square error (PMSE) is selected as the current transfer function at the corresponding neuron.

Function | Model |
---|---|
Sigmoid Function | \(z=1/(1+e^{-y})\) |
Radial Basis Function | \(z=e^{-y^2}\) |
Polynomial Function | \(z=y\) |
Tangent Function | \(z=\tan(y)\) |
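The four candidate models of Table 2 can be written down directly; the R definitions below are a plain transcription for illustration.

```r
# Transfer functions of Table 2; y is the model output before the
# transfer function is applied, z the transformed output.
sigmoid <- function(y) 1 / (1 + exp(-y))
radial  <- function(y) exp(-y^2)
polynom <- function(y) y
tangent <- function(y) tan(y)

# e.g., sigmoid(0) is 0.5 and radial(0) is 1
```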

GMDH-type neural network algorithms are modelling techniques which learn the relations among the variables. From the time series perspective, the algorithm learns the relationship among the lags. After learning the relations, it automatically selects the path to follow in the algorithm. GMDH was first used by Ivakhnenko (1966) to construct a high-order polynomial. The following equation, known as the Ivakhnenko polynomial, is given by

\[\label{eq:y} y=a+\sum\limits_{i=1}^m b_i\cdot x_i + \sum\limits_{i=1}^m\sum\limits_{j=1}^m c_{ij}\cdot x_i\cdot x_j + \sum\limits_{i=1}^m\sum\limits_{j=1}^m\sum\limits_{k=1}^m d_{ijk}\cdot x_i\cdot x_j\cdot x_k + \dots \tag{5}\]

where \(m\) is the number of variables and \(a,b,c,d, \dots\) are the coefficients of the variables in the polynomial, also called weights. Here, \(y\) is the response variable, and \(x_i\) and \(x_j\) are the lagged time series to be regressed. In general, the terms are used in the calculation up to square terms, as presented below,

\[\label{eq:y2} y=a+\sum\limits_{i=1}^m b_i\cdot x_i + \sum\limits_{i=1}^m\sum\limits_{j=1}^m c_{ij}\cdot x_i\cdot x_j \tag{6}\]

The GMDH algorithm considers all pairwise combinations of the \(p\) lagged time series, and each combination enters one neuron. Using these two inputs, a model is constructed to estimate the desired output; in other words, two input variables go into a neuron and one result comes out as the output. The structure of the model is specified by the Ivakhnenko polynomial in equation (6) with \(m=2\). This specification requires six coefficients to be estimated in each model.
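With \(m=2\), equation (6) reduces to \(y = a + b_1 x_1 + b_2 x_2 + c_{11} x_1^2 + c_{22} x_2^2 + c_{12} x_1 x_2\), i.e., six coefficients per neuron. As a sketch, such a model can be fitted with ordinary least squares via `lm()`; note that the *GMDH* package itself estimates the coefficients via RLSE.

```r
# One neuron of the GMDH structure: the six-coefficient model of
# equation (6) with m = 2. We use lm() as a stand-in estimator here;
# the GMDH package estimates the coefficients via RLSE.
fit_neuron <- function(z, x1, x2) {
  lm(z ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
}

set.seed(1)
x1 <- runif(50); x2 <- runif(50)
z  <- 0.3 + 0.5 * x1 - 0.2 * x2 + 0.1 * x1 * x2 + rnorm(50, sd = 0.01)
fit <- fit_neuron(z, x1, x2)
length(coef(fit))   # six estimated coefficients, matching equation (6)
```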

The GMDH algorithm is a system of layers in which there exist neurons. The number of neurons in a layer is defined by the number of input variables. To illustrate, assume that the number of input variables is \(p\). Since we include all pairwise combinations of the input variables, the number of neurons is \(h=\binom{p}{2}\). The architecture of the GMDH algorithm is illustrated in Figure 1 for the case of three layers and four inputs.

In the GMDH architecture shown in Figure 1, since the number of inputs is four, the number of nodes in a layer is determined to be six. This is just the starting layer of the algorithm. The coefficients of equation (6) are estimated in each neuron, and, by using the estimated coefficients and the input variables in each neuron, the desired output is predicted. According to a chosen external criterion, \(p\) neurons are selected and \(h-p\) neurons are eliminated from the network. In this study, the prediction mean square error (PMSE) is used as the external criterion. In Figure 1, four neurons are selected while two neurons are eliminated from the network. The outputs obtained from the selected neurons become the inputs for the next layer. This process continues until the last layer, at which only one neuron is selected. The output obtained from the last layer is the predicted value for the time series at hand. The flowchart of the algorithm is depicted in Figure 2.
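One layer of this scheme can be sketched as follows: fit the six-coefficient model on every pair of inputs, score each neuron, and keep the best \(p\) outputs as inputs for the next layer. For brevity, the sketch scores neurons by in-sample mean square error and fits with `lm()`, whereas the package uses PMSE as the external criterion and RLSE for estimation.

```r
# One GMDH layer (illustrative): h = choose(p, 2) candidate neurons,
# of which the best p survive according to the external criterion.
gmdh_layer <- function(z, X) {
  p     <- ncol(X)
  pairs <- combn(p, 2)                    # all pairwise combinations
  preds <- apply(pairs, 2, function(ij) {
    x1 <- X[, ij[1]]; x2 <- X[, ij[2]]
    fitted(lm(z ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2)))
  })
  mse <- colMeans((z - preds)^2)          # external criterion (simplified)
  preds[, order(mse)[1:p], drop = FALSE]  # outputs of surviving neurons
}

set.seed(1)
X <- matrix(runif(40 * 4), nrow = 40)     # p = 4 inputs, h = 6 neurons
out <- gmdh_layer(rowSums(X) + rnorm(40, sd = 0.1), X)
dim(out)                                  # 40 observations, 4 surviving outputs
```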

In a GMDH algorithm, there are six coefficients to be estimated in each model. The coefficients are estimated via RLSE.
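For a ridge-type RLSE, the closed form is \(\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y\), where \(X\) is the design matrix of the six model terms and \(\lambda\) is the regularization parameter. A minimal sketch; the package's RLSE and its choice of \(\lambda\) may differ in detail:

```r
# Ridge-type regularized least squares:
#   beta = (X'X + lambda I)^{-1} X'y
# Illustrative closed form, not the GMDH package's internal code.
rlse <- function(X, y, lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

# With lambda = 0 this reduces to ordinary least squares:
set.seed(1)
X <- cbind(1, matrix(runif(60), ncol = 3))
y <- runif(20)
max(abs(rlse(X, y, 0) - qr.solve(X, y)))   # numerically zero
```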

A GMDH-type neural network constructs the algorithm by investigating the relation between two inputs and the desired output. The architecture of a revised GMDH (RGMDH)-type neural network does not only consider this relation, but also the individual effects of the inputs on the desired output (Kondo and Ueno 2006b). There are two different types of neurons in an RGMDH-type neural network. The first type of neuron is the same as in a GMDH-type neural network, given in equation (6): two inputs enter the neuron and one output goes out. In the second type of neuron, \(r\) inputs enter the neuron and one output goes out. This second type of neuron is given by

\[\label{eq:y_second}
y=a+\sum\limits_{i=1}^r b_i\cdot x_i \quad , \quad r \leq p, \tag{7}\]

where \(r\) is the number of inputs in the corresponding second type
neuron.

As mentioned above, there are \(h=\binom{p}{2}\) neurons in one layer of a GMDH-type neural network. With the additional \(p\) neurons of the second type, the number of neurons in one layer becomes \(\eta=\binom{p}{2}+p\) in an RGMDH-type algorithm. The architecture of an RGMDH-type algorithm is shown in Figure 3 for the case of three layers and three inputs. In this architecture, since the number of inputs is three, the number of nodes in a layer is determined to be six. Three of the six nodes are neurons of the first type, in which all pairwise combinations of the lagged time series are used as in the GMDH algorithm. The remaining three nodes are neurons of the second type, where the individual effects of the lagged time series are sequentially added to the layer, starting from lag 1. In each neuron, the coefficients of the models are calculated by using the corresponding models in equations (6) and (7). For instance, in Figure 3, there are six coefficients to be estimated, as given by equation (6), for the first type of neuron, and two, three, and four coefficients are estimated, as given in equation (7), for the second type of neuron when \(r\) equals 1, 2, and 3, respectively. The desired output is predicted by utilizing the estimated coefficients and input variables in each neuron. Here, \(p\) neurons are selected as living cells and \(\eta-p\) dead cells are eliminated from the network according to the external criterion. The rest of the algorithm is the same as GMDH.
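The second type of neuron in equation (7) is a linear model on the first \(r\) lags, so it has \(r+1\) coefficients (the intercept \(a\) plus \(b_1, \ldots, b_r\)). A sketch, again with `lm()` standing in for RLSE:

```r
# Second-type neuron of an RGMDH network, equation (7): a linear model
# on the first r lagged inputs. Illustrative code, not the package's.
fit_neuron2 <- function(z, X, r) {
  lm(z ~ X[, 1:r, drop = FALSE])
}

# In an RGMDH layer with p inputs there are choose(p, 2) + p neurons:
p <- 3
choose(p, 2) + p   # eta = 6, as in the Figure 3 example
```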