This article proposes a new generalization of the Multivariate Markov Chain (MMC) model. The future values of a Markov chain commonly depend only on the past values of the chain, in an autoregressive fashion. The generalization proposed in this work also considers exogenous variables, which can be deterministic or stochastic. Furthermore, the effects of the MMC’s past values and of pre-determined or exogenous covariates are accommodated by considering a non-homogeneous Markov chain. The findings of a Monte Carlo simulation study show that our model consistently detects a non-homogeneous Markov chain. In addition, an empirical illustration demonstrates the relevance of this new model by estimating probability transition matrices over the state space of the exogenous variable. An additional and practical contribution of this work is the development of a novel R package implementing this generalization.
Multivariate Markov chains (MMC) have a wide range of applications in various fields. Hence, several studies and generalizations of MMC models have been made. However, the availability of packages that allow the estimation and application of these models is scarce, and most of these methods use algorithms and software that are not broadly available or can only be applied in particular situations. In the last few years, the R software has been gaining importance in the field of statistical computing. This phenomenon might be because it is free and open-source software, which compiles and runs on a wide variety of operating systems. Specifically, in R, there are some available packages related to Markov chains (MC) and MMC. For example, the march package (Berchtold et al. 2020; Maitre and Emery 2020) allows the computation of various Markovian models for categorical data, including homogeneous Markov chains of any order, MTD models, Hidden Markov models, and Double Chain Markov Models. Ogier Maitre developed this package with contributions from Andre Berchtold, Kevin Emery, and Oliver Buschor, and Andre Berchtold maintains it. All the models computed by this package are for univariate categorical data. The markovchain package (Spedicato 2017) contains functions and methods to create and manage discrete-time Markov chains. In addition, it includes functions to perform statistical and probabilistic analysis (analysis of their structural properties). Finally, the DTMCPack package (Nicholson 2013) contains a series of functions that aid in both simulating and determining the properties of finite, discrete-time, discrete-state Markov chains. There are two main functions: DTMC
and MultDTMC, which produce \(n\) iterations of a Markov chain(s) based on transition probabilities and an initial distribution given by the user, for the univariate and multivariate cases, respectively. This last package is the only one available in R for MMC. In general, the work on MMC models is mostly based on improving the estimation methods and/or making the model more parsimonious. In this work, we aim to develop a new generalization that considers exogenous variables. Specifically, the effects of the MMC’s past values and of pre-determined or exogenous covariates are accommodated by considering a non-homogeneous Markov chain. Additionally, we address statistical inference and implement these methods in an R package. The R package includes three functions: multi.mtd, multi.mtd_probit, and mmcx. The first two functions estimate the MTD model for multivariate categorical data, with Ching’s specification (Ching et al. 2002) and with the probit specification (Nicolau 2014), respectively. The last function allows the estimation of our proposed model, the Generalized Multivariate Markov Chain (GMMC) model. The R package, GenMarkov, with these three functions, is available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=GenMarkov.
Markov chains can be appropriate for representing dependencies between successive observations of a random variable. However, when the order of the chain or the number of possible values increases, Markov chains lack parsimony. In this context, Jacobs and Lewis (1978), Pegram (1980), and Logan (1981) proposed several models for high-order Markov chains (HOMC). Notwithstanding these developments, the Mixture Transition Distribution (MTD) model (Raftery 1985) proved to be more suitable for modeling HOMC, which overshadowed the previously proposed models. Several relevant extensions of the MTD model emerged: the Multimatrix MTD (Berchtold 1995, 1996), which allows modeling the MTD with a different \(m \times m\) transition matrix for each lag, and the Infinite-Lag MTD model, which assumes an infinite lag order (\(l = \infty\)) and was first considered by Mehran (1989) and later developed by Le et al. (1996) in a more general context. Finally, the MTD with General State Spaces allows modeling more general processes with an arbitrary state space (Martin and Raftery 1987; Adke and Deshmukh 1988; Wong and Li 2001). Although the MTD model presents a more parsimonious approach to modeling Markov chains of order higher than one, it has weaknesses. Namely, when considering more than one data sequence, one represents the MMC as a HOMC by expanding the state space. This approach can result in a more complex probability transition matrix and, consequently, can make the estimation unfeasible as the order, the number of states, and the number of data sequences increase. Additionally, the model assumes the same transition matrix for each lag. In this setting, Ching et al. (2002) proposed an alternative to handle the unfeasibility of the conventional multivariate Markov chain (MMC) by formulating a model with fewer parameters. The model developed is essentially the same as the MTD; however, it considers a different \(m \times m\) transition matrix for each lag and considers more than one data sequence. In the proposed multivariate Markov chain model, Ching et al. (2002) proceed as follows.
Let \(x_t^{(j)}\) be the state probability vector of the \(j\)th sequence at time \(t\); if the \(j\)th sequence is in state \(l\) at time \(t\), then \(x_t^{(j)} = e_l\), the unit vector with one in the \(l\)th entry and zeros elsewhere. The model assumes the relationship
\[\begin{equation} x_{t+1}^{(j)} = \sum_{k=1}^s \lambda_{jk}P^{(jk)}x_{t}^{(k)}, \text{for } j =1, 2, \dots, s \tag{1} \end{equation}\] where \(0 \leq \lambda_{jk} \leq 1\) for \(j \leq s, k \leq s\) and \(\sum_{k=1}^s \lambda_{jk} =1\) for \(j=1, 2, \dots, s\). The \(\lambda_{jk}\) can be interpreted as mixing weights measuring the contribution of the \(k\)th sequence to the \(j\)th sequence.
The state probability distribution of the \(j\)th sequence at time \((t + 1)\) depends on the weighted average of \(P^{(jk)}x_{t}^{(k)}\), \(k = 1, \dots, s\). Here \(P^{(jk)}\) is a transition probability matrix from the states in the \(k\)th sequence to the states in the \(j\)th sequence, and \(x_t^{(k)}\) is the state probability distribution of the \(k\)th sequence at time \(t\). In matrix form:
\[\begin{equation} \underline{x}_{t+1} \equiv \left[ \begin{array}{c} x_{t+1}^{(1)} \\ \vdots \\ x_{t+1}^{(s)} \end{array} \right ] = \left[ \begin{array}{ccc} \lambda_{11}P^{(11)} & \dots & \lambda_{1s}P^{(1s)}\\ \vdots & \ddots & \vdots\\ \lambda_{s1}P^{(s1)}& \dots & \lambda_{ss}P^{(ss)} \end{array} \right ] \left[ \begin{array}{c} x_{t}^{(1)} \\ \vdots \\ x_{t}^{(s)} \end{array} \right ] \equiv Q \underline{x}_{t} \tag{2} \end{equation}\] where \(Q\) is an \(ms \times ms\) block matrix (\(s \times s\) blocks of \(m \times m\) matrices) and \(\underline{x}_t\) is a stacked column vector of length \(ms\) (\(s\) vectors, each with \(m\) rows).
The matrices \(P^{(jk)}\) can be estimated for each pair of data sequences by counting the transition frequencies from the states in the \(k\)th sequence to those in the \(j\)th sequence, which yields the transition frequency matrix for the pair. After normalization, the estimates of the transition probability matrices, \(\widehat{P}^{(jk)}\), are obtained.
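As a small illustration of this counting step (a sketch, not the package's internal code), the following R function builds \(\widehat{P}^{(jk)}\) from two observed sequences sj and sk, coded as integers from 1 to m:

```r
# sj, sk: integer-coded sequences of equal length; m: number of states
estimate_P <- function(sj, sk, m) {
  counts <- matrix(0, m, m)
  for (t in 2:length(sj)) {
    # transition from the state of the kth sequence at t-1 to the state of the jth sequence at t
    counts[sk[t - 1], sj[t]] <- counts[sk[t - 1], sj[t]] + 1
  }
  # normalize each row; the pmax() guards against division by zero for unvisited states
  sweep(counts, 1, pmax(rowSums(counts), 1), "/")
}
```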
Regarding the \(\lambda_{jk}\) coefficients, the estimation method proposed by Ching et al. (2002) involves the following optimization problem:
\[\begin{equation} \min_{\lambda} \max_{i} \left\vert \left[ \sum_{k=1}^s \lambda_{jk} \widehat{P}^{(jk)} \widehat{\boldsymbol{x}}^{(k)} - \widehat{\boldsymbol{x}}^{(j)} \right]_i \right\vert \tag{3} \end{equation}\]
\[ \text{s.t. } \sum_{k=1}^s \lambda_{jk} = 1 \text{ and } \lambda_{jk} \geq 0 \]

Besides this, different models have been proposed for multiple categorical data sequences. Kijima et al. (2002) proposed a parsimonious MMC model to simulate correlated credit risks. Siu et al. (2005) proposed an easy-to-implement model; however, its applicability was limited by the number of parameters involved. Ching et al. (2008) proposed a simplified model based on an assumption proposed in Zhang et al. (2006). Zhu and Ching (2010) proposed an estimation method based on minimizing the prediction error with equality and inequality restrictions, and Nicolau and Riedlinger (2014) proposed a new approach to estimate MMC based on non-linear least squares, which avoids imposing restrictions on the parameters and facilitates the model estimation and the statistical inference. Berchtold (2003) proposed an MTD model for heteroscedastic time series. Lastly, Wang et al. (2014) proposed a new multivariate Markov chain model to reduce the number of parameters. Thus, the models used in the published papers were generally developed by Ching et al. (2002), or were generalizations of it, and addressed the MMC as an end in itself. In Damásio (2013) and Damásio and Nicolau (2014), a different and innovative concept was proposed: the usage of MMC as regressors in a certain model. Hence, given that the MMC Granger causes a specific dependent variable, and taking advantage of the information about the past state interactions between the MMC categories, it was possible to forecast the current dependent variable more accurately. Other relevant contributions are related to the optimization algorithm, as in Lèbre and Bourguignon (2008) and Chen and Lio (2009), and to empirical applications (Ching et al. 2003; Ching and Ng 2006; Damásio 2018; Damásio and Mendonça 2019, 2020). Also, Damásio and Nicolau (2020) proposed a new methodology for detecting and testing the presence of multiple structural breaks in a Markov chain occurring at unknown dates.

In the vast majority of studies of MMC models, a positive correlation between the different data sequences is assumed due to the restrictions imposed: it is always assumed that, at moment \(t\), an increase in a state probability for one data sequence has an increasing impact on another data sequence at time \(t+1\). Thereupon, if one has a negative correlation between series, the parameter estimates are forced to be zero. The solution to this problem is straightforward: one can relax the assumptions and drop the constraints. However, that means the results produced by the model will no longer be probabilities. Raftery and Tavaré (1994) presented an alternative by dropping the positivity condition and imposing another set of restrictions. Ching et al. (2008) also tackled this issue and proposed a method where the \(Q\) matrix is split into the sum of two matrices, one representing the positive correlations and the other the negative correlations. Also, in Nicolau (2014), a specification completely free from constraints, inspired by the MTD model, was proposed, facilitating the estimation procedure and, at the same time, providing a more accurate specification for \(P_j(i_0 | i_1, \dots, i_s)\). The model is:
\[\begin{equation} P_j(i_0 | i_1, \dots, i_s) = P_j^{\Phi}(i_0 | i_1, \dots, i_s) := \\ \frac{\Phi(\eta_{j0} + \eta_{j1}P(i_0|i_1) + \dots + \eta_{js}P(i_0|i_s))}{\sum_{k=1}^m \Phi(\eta_{j0} + \eta_{j1}P(k|i_1) + \dots + \eta_{js}P(k|i_s))} \tag{4} \end{equation}\] where \(\eta_{ji} \in \mathbb{R}\) (\(j = 1, \dots, s\); \(i = 0, \dots, s\)) and \(\Phi\) is the (cumulative) standard normal distribution function.
This specification is denoted as the MTD-Probit model. The log-likelihood is given by: \[\begin{equation} LL = \sum_{i_1, i_2, \dots, i_s, i_0} n_{i_1, i_2, \dots, i_s, i_0} \log(P_j^{\Phi}(i_0 | i_1, \dots, i_s) ) \tag{5} \end{equation}\] and the maximum likelihood estimator is defined, as usual, as \(\widehat{\eta} = \text{arg max}_{\eta_{j0}, \dots, \eta_{js}} LL\). The probabilities \(P_{jk}(i_0|i_1)\), \(k = 1, \dots, s\), can be estimated in advance through the consistent and unbiased estimators proposed by Ching et al. (2002):
\[\begin{equation} \widehat{P}_{jk}(i_0|i_1) = \frac{n_{i_1i_0}}{\sum_{i_0=1}^m n_{i_1 i_0}} \tag{6} \end{equation}\] This specification can be superior to the MTD because the estimation procedure is easier and standard numerical optimization routines can be readily applied in the absence of constraints. However, similarly to the standard MTD, the likelihood is not a strictly concave function over the entire parameter space, so the choice of starting values is still important. Additionally, the model describes a broader range of possible dependencies since the parameters are not constrained. Moreover, this specification is more accurate than the MTD model. For more details, see Nicolau (2014).
Overall, the published work on MMC models has mostly focused on improving the estimation methods and/or making the model more parsimonious. In Damásio (2013) and Damásio and Nicolau (2014), a different approach was used, focusing on the usage of MMC as regressors in a certain model. Notably, it showed that an MMC can improve the forecast of a dependent variable; in other words, an MMC need not be an end in itself, but can also be an instrument to reach an end or a purpose. In this work, the opposite is developed: instead of considering an MMC as regressors, we propose a model in which a vector of pre-determined or exogenous variables is part of \(\mathcal{F}_{t-1}\).
Regarding the inclusion of covariates in Markov chain models, Regier (1968) proposed a two-state Markov chain model, where the transition probabilities were a function of a parameter, \(q\), that described the tendency of the subject to move from state to state. Kalbfleisch and Lawless (1985) proposed a panel data analysis method under a continuous-time Markov model that could be generalized to handle covariate analysis and the fitting of certain non-homogeneous models. This work overcame the limitations of the methodologies of Bartholomew (1968), Spilerman and Singer (1976), and Wasserman (1980) by developing a new algorithm that provided a very efficient way of obtaining maximum likelihood estimates. Also, Muenz and Rubinstein (1985) developed a Markov model for covariate dependence of binary sequences, where the transition probabilities were estimated through two logistic regressions that depended on a set of covariates. Essentially, Muenz and Rubinstein (1985) modeled a non-homogeneous Markov chain through logistic regression, considering only two states. Islam et al. (2004) developed an extension of this model considering three states, and Islam and Chowdhury (2006) generalized this approach for HOMC. Additionally, Azzalini (1994) proposed a model to study the influence of time-dependent covariates on the marginal distribution of a binary response in serially correlated binary data, where Markov chains are expressed in terms of transition probabilities. Jackson (2011) proposed a Markov model for panel data that allows the transition intensities to vary between individuals or with time-dependent covariates. Specifically, this work accounts for different intensities across state transitions and includes individual-specific covariates; the time-inhomogeneous model proposed is restricted to piecewise-constant intensities. The implementation of this work is available in the msm package. More recently, Bolano (2020) proposed an MTD-based approach to handle categorical covariates, which considers each covariate separately and combines the effects of the lags of the MTD and the covariates through a mixture model. Specifically, the model is given by:
\[\begin{equation} P(X_t = k \mid X_{t-1} = i, C_1 = c_1, \dots, C_l = c_l) \approx \theta_0 a_{ik} + \sum_{h=1}^l \theta_h d_{c_{h}k} \tag{7} \end{equation}\]
where \(a_{ik}\) is the transition probability from state \(i\) to state \(k\), as in a conventional Markov chain, and \(d_{c_{h}k}\) is the probability of observing state \(k\) given the modality \(c_h\) of covariate \(h\). Lastly, \(\theta_0, \dots, \theta_l\) are the weights of the explanatory elements of the model.
According to the literature presented, several researchers have proposed methodologies or generalizations to include covariates in Markov chain models, primarily for social sciences and health applications, where the transition probabilities were generally modeled through logistic regression. However, the focus has been mostly on categorical covariates, as opposed to continuous covariates, and there is a lack of approaches for multivariate Markov chain models. Thus, with this work, we aim to address this research gap.
In this work, a new generalization of the Ching et al. (2002) MMC model is presented: the GMMC model. That is, we consider exogenous or pre-determined covariates in the \(\sigma\)-algebra generated by the available information up to \(t-1\) (\(\mathcal{F}_{t-1}\)). These variables can be deterministic or stochastic and do not necessarily need to be observed at time \(t\). Broadly, the model is given by:
\[\begin{equation} P(S_{jt} = k | \mathcal{ F}_{t-1} ) = P(S_{jt} = k | S_{1t-1} = i_1, S_{2t-1} = i_2, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) \tag{8} \end{equation}\] We can specify this model as proposed by Ching et al. (2002) with Raftery’s notation:
\[\begin{multline} P(S_{jt} = i_0 | S_{1t-1} = i_1,\dots, S_{st-1} = i_s, \boldsymbol{x}_t) \equiv \\ \lambda_{j1}P(S_{jt} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_t) + \dots + \lambda_{js}P(S_{jt} = i_0 | S_{st-1} = i_s, \boldsymbol{x}_t) \tag{9} \end{multline}\] subject to the usual constraints (\(\lambda_{jk} \geq 0\) and \(\sum_{k=1}^s \lambda_{jk} = 1\)).
This proposed model is estimated through MLE, similarly to the standard MTD model. The log-likelihood is given by:
\[\begin{equation} LL = \sum_{t = 1}^n log P(S_{jt} = i_0 | S_{1t-1} = i_1, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) \tag{10} \end{equation}\]
Additionally, the probabilities can be estimated through a multinomial logit model. The proofs of consistency and of the asymptotic distribution are available in the Supplementary Material section.
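As an illustration of this estimation step (a sketch, not the package's internal code), the conditional probabilities entering equation (9) can be obtained by fitting one multinomial logit per lagged sequence with nnet::multinom; the object names s1, s2, and x are illustrative:

```r
library(nnet)

# s1, s2: categorical sequences; x: exogenous covariate (illustrative names)
n  <- length(s1)
df <- data.frame(y      = factor(s1[-1]),   # S_{1t}
                 s1_lag = factor(s1[-n]),   # S_{1,t-1}
                 s2_lag = factor(s2[-n]),   # S_{2,t-1}
                 x      = x[-1])            # x_t

# P(S_1t | S_{1,t-1}, x_t) and P(S_1t | S_{2,t-1}, x_t), each from its own multinomial logit
fit_11 <- multinom(y ~ s1_lag + x, data = df, trace = FALSE)
fit_12 <- multinom(y ~ s2_lag + x, data = df, trace = FALSE)

p_11 <- predict(fit_11, type = "probs")  # enters the first term of equation (9)
p_12 <- predict(fit_12, type = "probs")  # enters the second term of equation (9)
```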
A Monte Carlo simulation study was designed to evaluate the dimension (size) and power of tests on the parameters of the proposed model. The R statistical environment was used for all computations. This simulation study comprised two parts.
First, we considered two sequences, with two and with three states. The main goal was to assess whether the model correctly detected the presence of a non-homogeneous Markov chain and whether the parameter estimates corresponded to the expected values. Given two sequences, one generated through a non-homogeneous Markov chain and the other generated through a homogeneous Markov chain, it would be expected that the parameter associated with the transition probabilities of the first sequence would be one and the parameter associated with the transition probabilities of the second sequence would be zero. With this in mind, the transition probabilities of the first sequence were generated through a logistic regression specification, whose parameters were randomly drawn in R, and the second sequence was generated through a homogeneous first-order Markov chain. Hence, for both numbers of states considered, the expected estimated regression would be:
\[\begin{multline} P(S_{1t} = i_0 | S_{1t-1} = i_1, S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) = \\ 1 \times P(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}) + 0 \times P(S_{1t} = i_0 | S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) \tag{11} \end{multline}\]
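A minimal sketch of this data-generating process is shown below; the seed, coefficients, and sample size are illustrative choices, not the ones used in the study:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)                    # exogenous covariate
b <- runif(3, -1, 1)             # randomly drawn logistic coefficients

# Sequence 1: non-homogeneous two-state chain; the transition probability is a logit in x
s1 <- numeric(n); s1[1] <- 1
for (t in 2:n) {
  p21   <- plogis(b[1] + b[2] * (s1[t - 1] == 2) + b[3] * x[t - 1])  # P(state 2 at t)
  s1[t] <- ifelse(runif(1) < p21, 2, 1)
}

# Sequence 2: homogeneous first-order chain with a fixed transition matrix
P2 <- matrix(c(0.7, 0.3,
               0.4, 0.6), nrow = 2, byrow = TRUE)
s2 <- numeric(n); s2[1] <- 1
for (t in 2:n) s2[t] <- sample(1:2, 1, prob = P2[s2[t - 1], ])
```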
To assess the test power and dimension, we used the Wald test with the following hypotheses:
| | Hypothesis | Test |
|---|---|---|
| Power | \(H_0: \lambda_{11} = 0\) | \(\frac{\widehat{\lambda}_{11}^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\) |
| | \(H_0: \lambda_{12} = 1\) | \(\frac{(\widehat{\lambda}_{12}-1)^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\) |
| Dimension | \(H_0: \lambda_{11} = 1\) | \(\frac{(\widehat{\lambda}_{11}-1)^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\) |
| | \(H_0: \lambda_{12} = 0\) | \(\frac{\widehat{\lambda}_{12}^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\) |
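For reference, each of these statistics can be computed directly from an estimate and its standard error; a small helper of this kind (illustrative, not part of the package) is:

```r
# Wald statistic for H0: lambda = lambda0, compared against a chi-squared(1) distribution
wald_test <- function(lambda_hat, se, lambda0 = 0) {
  w <- (lambda_hat - lambda0)^2 / se^2
  c(statistic = w, p.value = pchisq(w, df = 1, lower.tail = FALSE))
}
```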
The simulation procedure consisted of generating the two sequences as described, estimating the model with the mmcx function, and computing the Wald statistics above for each sample size.

Considering two states, the test dimension was 5.7% with a sample size of 100 observations, slightly increased with 500 observations, and returned to the expected values with 1000 and 5000 observations. For sample sizes of 100, 500, and 1000 observations, the test power was low. So, when considering two states, the sample should have at least 5000 observations or, if that is not possible, a higher significance level should be used when testing for individual significance.
Considering three states, the test dimension was 9.7% for a sample size of 100 observations, 0.2% for 500 observations, and 0.3% for 1000 observations. Regarding the test power, we see a similar behavior: for a sample of 100 observations, the test power was 90.5%, and from 500 observations onwards, it reached 100%. Thus, when considering three states, one may use a sample of 500 observations without compromising the test power and dimension.
Secondly, we performed a simulation study where we considered two non-homogeneous Markov chains with two states. Here, the main goal was to assess whether the model correctly recovered the parameters assigned. In this case, we started by generating the terms of the proposed model, whose probabilities were specified through logistic regressions with parameters randomly generated in R. Similarly to Part I, we used a Wald test to assess the power and dimension of the test, and the simulation procedure again consisted of generating the sequences, estimating the model with the mmcx function, and computing the Wald statistics.

The probabilities \(P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)\) and \(P\left(S_{1t}|S_{2t-1}, x_{t-1}\right)\) presented some differences regarding the distributions of their values. Specifically, \(P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)\) had more extreme values, with the minimum being close to 0 and the maximum close to 1, whereas \(P\left(S_{1t}|S_{2t-1}, x_{t-1} \right)\) had more moderate values, with the minimum being, on average, 0.3 and the maximum 0.7. When the probabilities have values close to 1, one says that the states/regimes are persistent. We calculated the power and dimension of the test for each value of \(\lambda\), both when the estimated probabilities are moderate and when they are extreme. Hence, considering the equation:
\[\begin{multline} P\left(S_{1t} = i_0 | S_{1t-1} = i_1, S_{2t-1} = i_2, \boldsymbol{x}_{t-1} \right) = \\ \lambda_{11}P\left(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}\right) + \lambda_{12}P\left(S_{1t} = i_0 | S_{2t-1} = i_2, \boldsymbol{x}_{t-1} \right) \tag{12} \end{multline}\]
The parameter \(\lambda_{11}\) will be associated with more extreme probabilities and \(\lambda_{12}\) will be associated with more moderate probabilities.
When the states are persistent and the parameter’s value is low (i.e., 0.2 or 0.4), the test power is low. As this value increases, the power of the test increases as well. When the states are not persistent, there is no clear pattern in the power of the test: for a parameter value of 0.2 the power is still low (although not as low as in the first scenario), it increases for 0.4, decreases for 0.6, and increases again for 0.8. Overall, the estimated standard errors seem high, leading to low test power. Regarding the test dimension, when the higher weight is associated with the non-persistent states, the test dimension converges to 0. However, when this weight is associated with the persistent states, the test dimension increases with the sample size, reaching 10% in some cases. Hence, in this situation one should use a 10% significance level to perform statistical inference on the parameters.
Regarding the software implementation of each function: for the multi.mtd function, the estimation method presented in Berchtold (2001) was used, applied to the multivariate case. For multi.mtd_probit, a package for numerical maximization of the log-likelihood, maxLik (Henningsen and Toomet 2011), was used. This package performs maximum likelihood estimation through different optimization methods that the user can choose. The optimization methods available are Newton-Raphson, Broyden-Fletcher-Goldfarb-Shanno (BFGS), the BFGS algorithm implemented in R (BFGSR), Berndt-Hall-Hall-Hausman, simulated annealing, conjugate gradients, and Nelder-Mead. Finally, for the mmcx function, a different approach was used. Unlike the MTD-Probit, the proposed model has equality and inequality restrictions on the parameters. The maxLik (Henningsen and Toomet 2011) package only allows one type of restriction per maximum likelihood estimation, so it was not possible to use this package to estimate the proposed model with exogenous variables. Hence, the algorithm used was the Augmented Lagrangian method, available in the alabama (Varadhan 2015) package through the function auglag. This estimation method is not very common for MMC models; however, it has been applied to Markov chain models (Rajarshi 2013). The GMMC model’s probabilities were estimated through a multinomial logit using the multinom function of the nnet package (Venables and Ripley 2002).
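The package’s internal code is not reproduced here, but a minimal sketch of how such a constrained maximization could be set up with alabama::auglag for a single equation is shown below; p1 and p2 denote pre-estimated probability series evaluated at the realized states, and all object names are illustrative:

```r
library(alabama)

# p1, p2: fitted probabilities P(S_jt | S_{1,t-1}, x_t) and P(S_jt | S_{2,t-1}, x_t),
# evaluated at the realized state of S_jt (illustrative names)
negLL <- function(lambda) -sum(log(lambda[1] * p1 + lambda[2] * p2))

fit <- auglag(par = c(0.5, 0.5),                         # starting values on the simplex
              fn  = negLL,                               # auglag minimizes, so use -LL
              heq = function(lambda) sum(lambda) - 1,    # equality: weights sum to one
              hin = function(lambda) lambda)             # inequality: weights non-negative

fit$par      # estimated weights
fit$hessian  # Hessian returned at the solution, usable for standard errors
```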
Additionally, the Hessian matrices were computed, which allows performing statistical inference. The maxLik and auglag functions return the Hessian matrices together with the estimates. For the multi.mtd function, since the optimization procedure of Berchtold (2001) was used, the Hessian was computed through the second partial derivatives. The function multi.mtd requires the following elements:
- y, a matrix of the categorical data sequences.
- deltaStop, the delta below which the optimization phases of the parameters stop.
- is_constrained, a flag indicating whether the function will consider the usual set of constraints or the relaxed set of constraints (see Berchtold 2001).
- delta, the amount of change to increase/decrease in the parameters for each iteration of the optimization algorithm.
The last three arguments concern the optimization procedure. For more details, see Berchtold (2001). Considering two vectors of categorical data sequences, s1 and s2, to estimate the model and obtain the results:
multi.mtd(y=cbind(s1,s2), deltaStop=0.0001, is_constrained=TRUE, delta=0.1)
The function multi.mtd_probit requires the following arguments:

- y, a matrix of the categorical data sequences.
- initial, a vector of the initial values of the parameters.
- nummethod, the numerical maximization method, currently either "NR" (for Newton-Raphson), "BFGS" (for Broyden-Fletcher-Goldfarb-Shanno), "BFGSR" (for the BFGS algorithm implemented in R), "BHHH" (for Berndt-Hall-Hall-Hausman), "SANN" (for Simulated ANNealing), "CG" (for Conjugate Gradients), or "NM" (for Nelder-Mead). Lower-case letters (such as "nr" for Newton-Raphson) are allowed. The default method is "BFGS". For more details, see the maxLik (Henningsen and Toomet 2011) package.

Considering two vectors of categorical data sequences, s1 and s2 again, to estimate the model and obtain the results with the BFGS maximization method:
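A call along the following lines would do this; the initial values are merely illustrative starting points:

```r
multi.mtd_probit(y = cbind(s1, s2), initial = c(1, 1), nummethod = "bfgs")  # illustrative initial values
```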
Finally, the function mmcx requires the following elements:

- y, a matrix of categorical data sequences.
- x, a matrix of covariates (exogenous variables).
- initial, a vector of the initial values of the parameters.

Considering two vectors of categorical data sequences, s1 and s2, and a vector with an exogenous variable, x, to estimate the model and obtain the results:
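In the same spirit, a call such as the following could be used (the initial values are again illustrative):

```r
res <- mmcx(y = cbind(s1, s2), x = cbind(x), initial = c(1, 1))  # illustrative initial values
```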
These functions return a list with the parameter estimates, standard errors, z-statistics, p-values, and the value of the log-likelihood function for each equation.
The package offers an additional function that allows obtaining the transition probability matrices of mmcx conditioned on a specific value of x defined by the user. The function is MMC_tpm and requires the following elements:

- s, a matrix of categorical data sequences.
- x, a matrix of covariates (exogenous variables).
- value, a single value of x on which to condition the probability transition matrices.
- result, a list returned by the function mmcx containing the model’s estimates.

Considering two vectors of categorical data sequences, s1 and s2, a vector with an exogenous variable, x, and res, the list returned by the function mmcx, to obtain the transition probability matrices:
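The exact call is not shown here; a sketch consistent with the arguments above, conditioning for instance on the sample maximum of x, would be:

```r
MMC_tpm(s = cbind(s1, s2), x = cbind(x), value = max(x), result = res)
```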
The function returns an array containing the probability transition matrices, conditioned on a specific value of x, for each equation.
Markov chain models are used in interdisciplinary areas, such as economics, business, biology, and engineering, with applications ranging from predicting traffic flow to stock market movements, among others. Modeling and predicting stock market returns is particularly relevant for investors and policy makers. Since the stock market is a volatile environment and returns are difficult to predict, estimating the set of probabilities that describe these movements might provide relevant input. Additionally, incorporating the effect of key macroeconomic variables could provide a more accurate picture of this specific environment.
The following empirical illustration aims to model stock returns of two indexes as a function of the interest rate spread, specifically the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity.
The interest rate spread is a key macroeconomic variable and provides valuable information regarding the state of the economy. Specifically, it has been used to forecast recessions, as in Estrella and Mishkin (1996), Dombrosky and Haubrich (1996), Chauvet and Senyuz (2016), Tian and Shen (2019), and McMillan (2021). Generally, short-term yields are lower than long-term yields when the economy is in expansion, and short-term yields are higher than long-term yields when the economy is in recession. The difference between these yields (or, more specifically, the slope of the yield curve) can be used to forecast the state of the economy. Hence, this indicator might provide relevant input for investors.
We considered the 5-week-day daily stock returns (\(r_t=100 \times \log(P_t/P_{t-1})\), where \(P_t\) is the adjusted close price) of two indexes, S&P500 and DJIA, from November \(11^{th}\) 2011 to September \(1^{st}\) 2021 (2581 observations). Additionally, we considered the interest rate spread (\(spread_{t}\)), the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity. The data was retrieved from FRED. Below, we have the descriptive statistics of these variables.
Variable | Minimum | 1\(^{st}\) Quartile | Median | Mean | 3\(^{rd}\) Quartile | Maximum |
---|---|---|---|---|---|---|
\(spread_{t}\) | -0.52 | 0.92 | 1.54 | 1.454 | 2.03 | 2.97 |
\(r_{t;SP500}\) | -12.765 | -0.32 | 0.07 | 0.054 | 0.518 | 8.968 |
\(r_{t;DJIA}\) | -13.842 | -0.327 | 0.071 | 0.046 | 0.508 | 10.764 |
Moreover, to apply the proposed model, it is necessary to have categorical time series; thus, we applied the following procedure:
\[ S_{st}= \begin{cases} 1, r_t \leq \widehat{q}_{s;0.25}\\ 2, \widehat{q}_{s;0.25} < r_t < \widehat{q}_{s;0.75} \\ 3, r_t \geq \widehat{q}_{s;0.75}\\ \end{cases} \]
where \(\widehat{q}_{s;\alpha}\) is the estimated quantile of order \(\alpha\) of the marginal distribution of \(r_t\). Considering this illustration and the model proposed, we will have two equations:
\[\begin{multline} P(S_{sp500,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{11} P(S_{sp500,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{12} P(S_{sp500,t} | S_{djia, t-1}, spread_{t-1}) \tag{13} \end{multline}\]
\[\begin{multline} P(S_{djia,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{21} P(S_{djia,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{22} P(S_{djia,t} | S_{djia, t-1}, spread_{t-1}) \tag{14} \end{multline}\]
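For reference, the state construction described above can be reproduced in a few lines of R; the object name prices is illustrative:

```r
# prices: vector of adjusted close prices of one index (illustrative name)
returns <- 100 * diff(log(prices))

# classify each return into the three states defined above
q <- quantile(returns, probs = c(0.25, 0.75))
S <- ifelse(returns <= q[1], 1, ifelse(returns < q[2], 2, 3))
```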
In Figures 5 to 8, generated with ggplot2 (Wickham 2016) and gridExtra (Auguie 2017), we show the smoothed conditional probabilities of both series, depending on \(spread_{t-1}\). The number of observations is high, and the probabilities varied abruptly in a small time frame, making the plots hard to read. To simplify, a moving average (from pracma (Borchers 2022)) of order 5, chosen according to the frequency of the data, was applied to these probabilities to illustrate how they evolve over time. These plots represent the probabilities associated with the parameters of the proposed general model, showcasing how they vary over time, which is the main advantage of this generalization: instead of having fixed matrices of transition probabilities, we allow these to vary over time, depending on the values of \(spread_{t-1}\). Specifically, Figures 5 and 6 correspond to the non-homogeneous Markov chain used to build the S&P500’s equation, and Figures 7 and 8 correspond to the non-homogeneous Markov chain used to build the DJIA’s equation. We see a similar behavior within each series regardless of whether it depends on the previous states of \(S_{1t}\) or \(S_{2t}\). Additionally, the scales of the graphs are small, indicating that these probabilities vary around the same set of values.
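A sketch of this smoothing and plotting step, with p_hat denoting one of the estimated probability series (an illustrative name), is:

```r
library(pracma)
library(ggplot2)

p_smooth <- movavg(p_hat, n = 5, type = "s")   # simple moving average of order 5
ggplot(data.frame(t = seq_along(p_smooth), p = p_smooth), aes(t, p)) + geom_line()
```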
The model can be estimated through the mmcx function:
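The call that produces the output below is not reproduced in the text; a sketch consistent with the mmcx arguments, using illustrative object names for the categorized return series and the spread, is:

```r
# s_sp500, s_djia: categorized return series; spread: interest rate spread (illustrative names)
res <- mmcx(y = cbind(s_sp500, s_djia), x = cbind(spread), initial = c(1, 1))
```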
--------------------------------------------
Equation 1
Estimate Std. Error t value Pr(>|t|)
1 0.685660 0.171346 4.002 0.000 ***
2 0.314340 0.171346 1.835 0.067 *
Log-Likelihood: -2636.355
--------------------------------------------
--------------------------------------------
Equation 2
Estimate Std. Error t value Pr(>|t|)
1 0.629993 0.176383 3.572 0.000 ***
2 0.370007 0.176383 2.098 0.036 **
Log-Likelihood: -2636.622
--------------------------------------------
Considering the first equation, the effect of the probabilities depending on the S&P500’s previous state and the interest rate spread has a higher weight on the overall probability. Also, this estimate is highly significant, presenting a \(p\)-value close to zero. The effect of the DJIA’s previous state on the S&P500 is lower, but it is also significant at the 10% significance level. In the second equation, the effect of the S&P500’s previous state is higher than the DJIA’s, and both estimates are highly significant.
One of the advantages of this approach is the possibility of assessing the transition probabilities for specific values of \(x_t\), in this case, the interest rate spread. For both series, we calculated the transition probabilities for this variable’s minimum and maximum values in the sample, which are -0.52 and 2.97, respectively. To obtain and plot the probability transition matrices for these two cases, the code is the following:
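The calls creating tpm_max and tpm_min are not reproduced in the text; a sketch consistent with the MMC_tpm arguments, using the same illustrative object names as above, is:

```r
# condition on the sample maximum (2.97) and minimum (-0.52) of the spread
tpm_max <- MMC_tpm(s = cbind(s_sp500, s_djia), x = cbind(spread), value = 2.97,  result = res)
tpm_min <- MMC_tpm(s = cbind(s_sp500, s_djia), x = cbind(spread), value = -0.52, result = res)
```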
library(markovchain)
plot(new('markovchain', transitionMatrix = tpm_max[,,1])) # Generate figure 9
plot(new('markovchain', transitionMatrix = tpm_min[,,1])) # Generate figure 10
plot(new('markovchain', transitionMatrix = tpm_max[,,2])) # Generate figure 11
plot(new('markovchain', transitionMatrix = tpm_min[,,2])) # Generate figure 12
In Figures 9 and 10, we have the transition probability networks for the S&P500, corresponding to the maximum and minimum values of the spread, respectively. The most noticeable difference between these two networks concerns the transition probability from the second state to the third state. For the maximum value of \(spread_{t-1}\), the transition probability from the second state to the third state is 0.6. So, when the economy is strong, one might expect higher returns when \(t-1\) was in the second state. However, this scenario shifts when considering the minimum value of \(spread_{t-1}\): the probability of obtaining higher returns, that is, of being in state three, becomes almost evenly distributed, regardless of the state in \(t-1\). This indicates the instability of the stock market when the economy is weaker. Another difference between these networks concerns the transition probability from the third state to the first state. For the maximum value of \(spread_{t-1}\), this probability is 0.27, and for the minimum value it increases to 0.44. This is also expected, since when the economy is weaker, the probability of having lower returns is greater.
Considering the second equation (Figures 11 and 12), corresponding to the DJIA’s returns, we see a similar behaviour to that of the S&P500’s networks. The transition probability from the second state to the third state is higher for the maximum value of \(spread_{t-1}\), and the transition probability from the third state to the first state is higher when we consider the minimum value of \(spread_{t-1}\). However, the difference in this last probability between the minimum and maximum values of \(spread_{t-1}\) is not as large as for the S&P500. Overall, the rest of the probability structure remains the same.
Several proposals for including exogenous variables in Markov chain models have been presented. The main limitations were associated with the high complexity of the models to be developed and estimated. Additionally, most models considered only categorical exogenous variables, and there was a lack of focus on continuous exogenous variables. This work proposes a new approach to include continuous exogenous variables in the Ching et al. (2002) model for multivariate Markov chains. This is relevant because it allows studying the effect of the previous states of the series and of exogenous variables on the transition probabilities. The model is based on the Ching et al. (2002) MMC model but considers non-homogeneous Markov chains; thus, the probabilities that compose the model depend on exogenous variables. These probabilities are estimated, as in a usual non-homogeneous Markov chain, through a multinomial logit model. The model parameters, as well as their standard errors, are then estimated through MLE.

We developed a package with the estimation function of the proposed model, in which we considered the Augmented Lagrangian optimization method for estimating the parameters through MLE. Additionally, we designed a Monte Carlo simulation study to assess the test power and dimension for this model. The results showed that the model detected a non-homogeneous Markov chain. Moreover, an empirical illustration demonstrated the relevance of this new model by estimating the probability transition matrices for different values of the exogenous variable. Ignoring the effect of exogenous variables in MMC means that we would not detect the changes in the probabilities according to the covariates’ values; in this setting, one would have a limited view of the studied process. Hence, this approach allows us to understand how a specific variable influences a specific process.

The main contributions of this work are the development of a package with functions for multivariate Markov chains, addressing statistical inference in these models, and the inclusion of covariates. The limitations are related to the implementation in R; specifically, the optimization algorithm applied is not common for MMC models, so it would be beneficial to study new approaches to optimizing the maximum likelihood function as further research. Additionally, extending this generalization to the MTD-Probit model proposed by Nicolau (2014), which removes the constraints on the model’s parameters and allows the model to detect negative effects, would also be relevant.
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-006.zip
march, markovchain, DTMCPack, GenMarkov, msm, maxLik, alabama, nnet, ggplot2, gridExtra, pracma
ChemPhys, DifferentialEquations, Distributions, Econometrics, Finance, MachineLearning, NumericalMathematics, Optimization, Phylogenetics, Spatial, Survival, TeachingStatistics
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vasconcelos & Damásio, "GenMarkov: Modeling Generalized Multivariate Markov Chains in R", The R Journal, 2025
BibTeX citation
@article{RJ-2024-006, author = {Vasconcelos, Carolina and Damásio, Bruno}, title = {GenMarkov: Modeling Generalized Multivariate Markov Chains in R}, journal = {The R Journal}, year = {2025}, note = {https://doi.org/10.32614/RJ-2024-006}, doi = {10.32614/RJ-2024-006}, volume = {16}, issue = {1}, issn = {2073-4859}, pages = {96-113} }