This article proposes a new generalization of the Multivariate Markov Chain (MMC) model. The future values of a Markov chain commonly depend only on the past values of the chain, in an autoregressive fashion. The generalization proposed in this work also considers exogenous variables, which can be deterministic or stochastic. Furthermore, the effects of the MMC’s past values and of pre-determined or exogenous covariates are accommodated by considering a non-homogeneous Markov chain. The findings of a Monte Carlo simulation study show that our model consistently detects a non-homogeneous Markov chain. In addition, an empirical illustration demonstrates the relevance of this new model by estimating probability transition matrices over the state space of the exogenous variable. An additional and practical contribution of this work is the development of a novel R package implementing this generalization.
Multivariate Markov chains (MMC) have a wide range of applications in various fields. Hence, several studies and generalizations of MMC models have been made. However, the availability of packages that allow the estimation and application of these models is scarce, and most of these methods use algorithms and software that are not broadly available or can only be applied in particular situations. In the last few years, the R software has been gaining importance in the field of statistical computing. This phenomenon might be because it is free and open-source software, which compiles and runs on a wide variety of operating systems. Specifically, in R, there are some available packages related to Markov chains (MC) and MMC. For example, the march package (Berchtold et al. 2020; Maitre and Emery 2020) allows the computation of various Markovian models for categorical data, including homogeneous Markov chains of any order, MTD models, Hidden Markov models, and Double Chain Markov Models. Ogier Maitre developed this package with contributions from Andre Berchtold, Kevin Emery, and Oliver Buschor, and Andre Berchtold maintains it. All the models computed by this package are for univariate categorical data. The markovchain package (Spedicato 2017) contains functions and methods to create and manage discrete-time Markov chains. In addition, it includes functions to perform statistical and probabilistic analysis (analysis of their structural properties). Finally, the DTMCPack package (Nicholson 2013) contains a series of functions that aid in both simulating and determining the properties of finite, discrete-time, discrete-state Markov chains. There are two main functions: DTMC
and MultDTMC, which produce \(n\) iterations of a Markov chain(s) based on transition probabilities and an initial distribution given by the user, for the univariate and multivariate cases, respectively. This last package is the only one available in R for MMC. In general, the work on MMC models is mostly based on improving the estimation methods and/or making the model more parsimonious. In this work, we aim to develop a new generalization that considers exogenous variables. Specifically, the effects of the MMC’s past values and of pre-determined or exogenous covariates are accommodated by considering a non-homogeneous Markov chain. Additionally, we address statistical inference and implement these methods in an R package. The R package includes three functions: multi.mtd, multi.mtd_probit, and mmcx. The first two functions estimate the MTD model for multivariate categorical data, with Ching’s specification (Ching et al. 2002) and with the probit specification (Nicolau 2014), respectively. The last function allows the estimation of our proposed model, the Generalized Multivariate Markov Chain (GMMC) model. The R package, GenMarkov, with these three functions, is available on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=GenMarkov.
Markov chains can be appropriate for representing dependencies between successive observations of a random variable. However, when the order of the chain or the number of possible values increases, Markov chains lack parsimony. In this context, Jacobs and Lewis (1978), Pegram (1980), and Logan (1981) proposed several models for high-order Markov chains (HOMC). Notwithstanding these developments, the Mixture Transition Distribution (MTD) model (Raftery 1985) proved to be more suitable for modeling HOMC, which overshadowed the previously proposed models. Several relevant extensions of the MTD model emerged: the Multimatrix MTD (Berchtold 1995, 1996), which allows modeling the MTD with a different \(m \times m\) transition matrix for each lag, and the Infinite-Lag MTD model, which assumes an infinite lag order (\(l = \infty\)) and was first considered by Mehran (1989) and later developed by Le et al. (1996) in a more general context. Finally, the MTD with General State Spaces allows modeling more general processes with an arbitrary state space (Martin and Raftery 1987; Adke and Deshmukh 1988; Wong and Li 2001). Although the MTD model presents a more parsimonious approach to modeling Markov chains of order higher than one, it has weaknesses. Namely, when considering more than one data sequence, one represents the MMC as a HOMC by expanding the state space. This approach can result in a more complex probability transition matrix and, consequently, can make the estimation unfeasible as the order, the number of states, and the number of data sequences increase. Additionally, the model assumes the same transition matrix for each lag. In this setting, Ching et al. (2002) proposed an alternative to handle the unfeasibility of the conventional multivariate Markov chain (MMC) by formulating a model with fewer parameters. The model developed is essentially the same as the MTD; however, it considers a different \(m \times m\) transition matrix for each lag and considers more than one data sequence. In the proposed multivariate Markov chain model, Ching et al. (2002) proceed as follows.
Let \(x_t^{(j)}\) be the state probability vector of the \(j\)th sequence at time \(t\); if the \(j\)th sequence is in state \(l\) at time \(t\), then \(x_t^{(j)} = e_l\), the unit vector with one in the \(l\)th entry and zeros elsewhere. The model assumes the relationship
\[\begin{equation} x_{t+1}^{(j)} = \sum_{k=1}^s \lambda_{jk}P^{(jk)}x_{t}^{(k)}, \text{for } j =1, 2, \dots, s \tag{1} \end{equation}\] where \(0 \leq \lambda_{jk} \leq 1\) for \(j \leq s, k \leq s\) and \(\sum_{k=1}^s \lambda_{jk} =1\) for \(j=1, 2, \dots, s\). The \(\lambda_{jk}\) can be interpreted as mixing weights measuring the contribution of the \(k\)th sequence to the \(j\)th sequence.
The state probability distribution of the \(j\)th sequence at time \((t + 1)\) depends on the weighted average of \(P^{(jk)}x_{t}^{(k)}\), \(k = 1, \dots, s\). Here \(P^{(jk)}\) is a transition probability matrix from the states in the \(k\)th sequence to the states in the \(j\)th sequence, and \(x_t^{(k)}\) is the state probability distribution of the \(k\)th sequence at time \(t\). In matrix form:
\[\begin{equation} \underline{x}_{t+1} \equiv \left[ \begin{array}{c} x_{t+1}^{(1)} \\ \vdots \\ x_{t+1}^{(s)} \end{array} \right ] = \left[ \begin{array}{ccc} \lambda_{11}P^{(11)} & \dots & \lambda_{1s}P^{(1s)}\\ \vdots & \ddots & \vdots\\ \lambda_{s1}P^{(s1)}& \dots & \lambda_{ss}P^{(ss)} \end{array} \right ] \left[ \begin{array}{c} x_{t}^{(1)} \\ \vdots \\ x_{t}^{(s)} \end{array} \right ] \equiv Q \underline{x}_{t} \tag{2} \end{equation}\] where \(Q\) is an \(ms \times ms\) block matrix (\(s \times s\) blocks of \(m \times m\) matrices) and \(\underline{x}_t\) is a stacked column vector of length \(ms\) (\(s\) vectors, each with \(m\) rows).
The matrices \(P^{(jk)}\) can be estimated for each pair of data sequences by counting the transition frequencies from the states in the \(k\)th sequence to those in the \(j\)th sequence, which yields the transition frequency matrix for the pair. After normalization, the estimates of the transition probability matrices, \(\widehat{P}^{(jk)}\), are obtained.
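As a small illustration of this counting step (a sketch, not the package's internal code), the following R function builds \(\widehat{P}^{(jk)}\) from two observed sequences sj and sk, coded as integers from 1 to m:

```r
# sj, sk: integer-coded sequences of equal length; m: number of states
estimate_P <- function(sj, sk, m) {
  counts <- matrix(0, m, m)
  for (t in 2:length(sj)) {
    # transition from the state of the kth sequence at t-1 to the state of the jth sequence at t
    counts[sk[t - 1], sj[t]] <- counts[sk[t - 1], sj[t]] + 1
  }
  # normalize each row; the pmax() guards against division by zero for unvisited states
  sweep(counts, 1, pmax(rowSums(counts), 1), "/")
}
```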
Regarding the \(\lambda_{jk}\) coefficients, the estimation method proposed by Ching et al. (2002) involves the following optimization problem:
\[\begin{equation} \min_{\lambda} \max_{i} \left\vert \left[ \sum_{k=1}^s \lambda_{jk} \widehat{P}^{(jk)} \widehat{\boldsymbol{x}}^{(k)} - \widehat{\boldsymbol{x}}^{(j)} \right]_i \right\vert \tag{3} \end{equation}\]
\[ \text{s.t. } \sum_{k=1}^s \lambda_{jk} = 1 \text{ and } \lambda_{jk} \geq 0 \]

Besides this, different models have been proposed for multiple categorical data sequences. Kijima et al. (2002) proposed a parsimonious MMC model to simulate correlated credit risks. Siu et al. (2005) proposed an easy-to-implement model; however, its applicability was limited by the number of parameters involved. Ching et al. (2008) proposed a simplified model based on an assumption proposed in Zhang et al. (2006). Zhu and Ching (2010) proposed an estimation method based on minimizing the prediction error with equality and inequality restrictions, and Nicolau and Riedlinger (2014) proposed a new approach to estimate MMC based on non-linear least squares, which avoids imposing restrictions on the parameters and facilitates the model estimation and the statistical inference. Berchtold (2003) proposed an MTD model for heteroscedastic time series. Lastly, Wang et al. (2014) proposed a new multivariate Markov chain model to reduce the number of parameters. Thus, the models used in the published papers were generally developed by Ching et al. (2002), or were generalizations of it, and addressed the MMC as an end in itself. In Damásio (2013) and Damásio and Nicolau (2014), a different and innovative concept was proposed: the usage of MMC as regressors in a certain model. Hence, given that the MMC Granger causes a specific dependent variable, and taking advantage of the information about the past state interactions between the MMC categories, it was possible to forecast the current dependent variable more accurately. Other relevant contributions are related to the optimization algorithm, as in Lèbre and Bourguignon (2008) and Chen and Lio (2009), and to empirical applications (Ching et al. 2003; Ching and Ng 2006; Damásio 2018; Damásio and Mendonça 2019, 2020). Also, Damásio and Nicolau (2020) proposed a new methodology for detecting and testing the presence of multiple structural breaks in a Markov chain occurring at unknown dates.

In the vast majority of studies of MMC models, a positive correlation between the different data sequences is assumed due to the restrictions imposed: it is always assumed that, at moment \(t\), an increase in a state probability for one data sequence has an increasing impact on another data sequence at time \(t+1\). Thereupon, if one has a negative correlation between series, the parameter estimates are forced to be zero. The solution to this problem is straightforward: one can relax the assumptions and drop the constraints. However, that means the results produced by the model will no longer be probabilities. Raftery and Tavaré (1994) presented an alternative by dropping the positivity condition and imposing another set of restrictions. Ching et al. (2008) also tackled this issue and proposed a method where the \(Q\) matrix is split into the sum of two matrices, one representing the positive correlations and the other the negative correlations. Also, in Nicolau (2014), a specification completely free from constraints, inspired by the MTD model, was proposed, facilitating the estimation procedure and, at the same time, providing a more accurate specification for \(P_j(i_0 | i_1, \dots, i_s)\). The model is:
\[\begin{equation} P_j(i_0 | i_1, \dots, i_s) = P_j^{\Phi}(i_0 | i_1, \dots, i_s) := \\ \frac{\Phi(\eta_{j0} + \eta_{j1}P(i_0|i_1) + \dots + \eta_{js}P(i_0|i_s))}{\sum_{k=1}^m \Phi(\eta_{j0} + \eta_{j1}P(k|i_1) + \dots + \eta_{js}P(k|i_s))} \tag{4} \end{equation}\] where \(\eta_{ji} \in \mathbb{R}\) (\(j = 1, \dots, s\); \(i = 0, \dots, s\)) and \(\Phi\) is the (cumulative) standard normal distribution function.
This specification is denoted as the MTD-Probit model. The log-likelihood is given by: \[\begin{equation} LL = \sum_{i_1, i_2, \dots, i_s, i_0} n_{i_1, i_2, \dots, i_s, i_0} \log(P_j^{\Phi}(i_0 | i_1, \dots, i_s) ) \tag{5} \end{equation}\] and the maximum likelihood estimator is defined, as usual, as \(\widehat{\eta} = \text{arg max}_{\eta_{j0}, \dots, \eta_{js}} LL\). The probabilities \(P_{jk}(i_0|i_1)\), \(k = 1, \dots, s\), can be estimated in advance through the consistent and unbiased estimators proposed by Ching et al. (2002):
\[\begin{equation} \widehat{P}_{jk}(i_0|i_1) = \frac{n_{i_1i_0}}{\sum_{i_0=1}^m n_{i_1 i_0}} \tag{6} \end{equation}\] This specification can be superior to the MTD because the estimation procedure is easier and standard numerical optimization routines can be readily applied in the absence of constraints. However, similarly to the standard MTD, the likelihood is not a strictly concave function over the entire parameter space, so the choice of starting values is still important. Additionally, the model describes a broader range of possible dependencies since the parameters are not constrained. Moreover, this specification is more accurate than the MTD model. For more details, see Nicolau (2014).
Overall, the published work on MMC models has mostly focused on improving the estimation methods and/or making the model more parsimonious. In Damásio (2013) and Damásio and Nicolau (2014), a different approach was used, focusing on the usage of MMC as regressors in a certain model. Notably, it showed that an MMC can improve the forecast of a dependent variable; in other words, an MMC need not be an end in itself, but can also be an instrument to reach an end or a purpose. In this work, the opposite is developed: instead of considering an MMC as regressors, we propose a model in which a vector of pre-determined or exogenous variables is part of \(\mathcal{F}_{t-1}\).
Regarding the inclusion of covariates in Markov chain models, Regier (1968) proposed a two-state Markov chain model, where the transition probabilities were a function of a parameter, \(q\), that described the tendency of the subject to move from state to state. Kalbfleisch and Lawless (1985) proposed a panel data analysis method under a continuous-time Markov model that could be generalized to handle covariate analysis and the fitting of certain non-homogeneous models. This work overcame the limitations of the methodologies of Bartholomew (1968), Spilerman and Singer (1976), and Wasserman (1980) by developing a new algorithm that provided a very efficient way of obtaining maximum likelihood estimates. Also, Muenz and Rubinstein (1985) developed a Markov model for covariate dependence of binary sequences, where the transition probabilities were estimated through two logistic regressions that depended on a set of covariates. Essentially, Muenz and Rubinstein (1985) modeled a non-homogeneous Markov chain through logistic regression, considering only two states. Islam et al. (2004) developed an extension of this model considering three states, and Islam and Chowdhury (2006) generalized this approach for HOMC. Additionally, Azzalini (1994) proposed a model to study the influence of time-dependent covariates on the marginal distribution of a binary response in serially correlated binary data, where Markov chains are expressed in terms of transition probabilities. Jackson (2011) proposed a Markov model for panel data that allows the transition intensities to vary between individuals or with time-dependent covariates. Specifically, this work accounts for different intensities across state transitions and includes individual-specific covariates; the time-inhomogeneous model proposed is restricted to piecewise-constant intensities. The implementation of this work is available in the msm package. More recently, Bolano (2020) proposed an MTD-based approach to handle categorical covariates, which considers each covariate separately and combines the effects of the lags of the MTD and the covariates through a mixture model. Specifically, the model is given by:
\[\begin{equation} P(X_t = k \mid X_{t-1} = i, C_1 = c_1, \dots, C_l = c_l) \approx \theta_0 a_{ik} + \sum_{h=1}^l \theta_h d_{c_{h}k} \tag{7} \end{equation}\]
where \(a_{ik}\) is the transition probability from state \(i\) to state \(k\), as in a conventional Markov chain, and \(d_{c_{h}k}\) is the probability of observing state \(k\) given the modality \(c_h\) of covariate \(h\). Lastly, \(\theta_0, \dots, \theta_l\) are the weights of the explanatory elements of the model.
According to the literature presented, several researchers have proposed methodologies or generalizations to include covariates in Markov chain models, primarily for social sciences and health applications, where the transition probabilities were generally modeled through logistic regression. However, the focus has been mostly on categorical covariates, as opposed to continuous covariates, and there is a lack of approaches for multivariate Markov chain models. Thus, with this work, we aim to address this research gap.
In this work, a new generalization of the Ching et al. (2002) MMC model is presented: the GMMC model. That is, we consider exogenous or pre-determined covariates in the \(\sigma\)-algebra generated by the available information up to \(t-1\) (\(\mathcal{F}_{t-1}\)). These variables can be deterministic or stochastic and do not necessarily need to be observed at time \(t\). Broadly, the model is given by:
\[\begin{equation} P(S_{jt} = k | \mathcal{ F}_{t-1} ) = P(S_{jt} = k | S_{1t-1} = i_1, S_{2t-1} = i_2, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) \tag{8} \end{equation}\] We can specify this model as proposed by Ching et al. (2002) with Raftery’s notation:
\[\begin{multline} P(S_{jt} = i_0 | S_{1t-1} = i_1,\dots, S_{st-1} = i_s, \boldsymbol{x}_t) \equiv \\ \lambda_{j1}P(S_{jt} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_t) + \dots + \lambda_{js}P(S_{jt} = i_0 | S_{st-1} = i_s, \boldsymbol{x}_t) \tag{9} \end{multline}\] subject to the usual constraints (\(\lambda_{jk} \geq 0\) and \(\sum_{k=1}^s \lambda_{jk} = 1\)).
This proposed model is estimated through MLE, similarly to the standard MTD model. The log-likelihood is given by:
\[\begin{equation} LL = \sum_{t = 1}^n log P(S_{jt} = i_0 | S_{1t-1} = i_1, \dots, S_{st-1} = i_s, \boldsymbol{x}_t) \tag{10} \end{equation}\]
Additionally, the probabilities can be estimated through a multinomial logit model. The proofs of consistency and of the asymptotic distribution are available in the Supplementary Material section.
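As an illustration of this estimation step (a sketch, not the package's internal code), the conditional probabilities entering equation (9) can be obtained by fitting one multinomial logit per lagged sequence with nnet::multinom; the object names s1, s2, and x are illustrative:

```r
library(nnet)

# s1, s2: categorical sequences; x: exogenous covariate (illustrative names)
n  <- length(s1)
df <- data.frame(y      = factor(s1[-1]),   # S_{1t}
                 s1_lag = factor(s1[-n]),   # S_{1,t-1}
                 s2_lag = factor(s2[-n]),   # S_{2,t-1}
                 x      = x[-1])            # x_t

# P(S_1t | S_{1,t-1}, x_t) and P(S_1t | S_{2,t-1}, x_t), each from its own multinomial logit
fit_11 <- multinom(y ~ s1_lag + x, data = df, trace = FALSE)
fit_12 <- multinom(y ~ s2_lag + x, data = df, trace = FALSE)

p_11 <- predict(fit_11, type = "probs")  # enters the first term of equation (9)
p_12 <- predict(fit_12, type = "probs")  # enters the second term of equation (9)
```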
A Monte Carlo simulation study was designed to evaluate the dimension (size) and power of tests on the parameters of the proposed model. The R statistical environment was used for all computations. This simulation study comprised two parts.
First, we considered two sequences, with two and with three states. The main goal was to assess whether the model correctly detected the presence of a non-homogeneous Markov chain and whether the parameter estimates corresponded to the expected values. Given two sequences, one generated through a non-homogeneous Markov chain and the other generated through a homogeneous Markov chain, it would be expected that the parameter associated with the transition probabilities of the first sequence would be one and the parameter associated with the transition probabilities of the second sequence would be zero. With this in mind, the transition probabilities of the first sequence were generated through a logistic regression specification, whose parameters were randomly drawn in R, and the second sequence was generated through a homogeneous first-order Markov chain. Hence, for both numbers of states considered, the expected estimated regression would be:
\[\begin{multline} P(S_{1t} = i_0 | S_{1t-1} = i_1, S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) = \\ 1 \times P(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}) + 0 \times P(S_{1t} = i_0 | S_{2t-1} = i_2, \boldsymbol{x}_{t-1}) \tag{11} \end{multline}\]
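A minimal sketch of this data-generating process is shown below; the seed, coefficients, and sample size are illustrative choices, not the ones used in the study:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)                    # exogenous covariate
b <- runif(3, -1, 1)             # randomly drawn logistic coefficients

# Sequence 1: non-homogeneous two-state chain; the transition probability is a logit in x
s1 <- numeric(n); s1[1] <- 1
for (t in 2:n) {
  p21   <- plogis(b[1] + b[2] * (s1[t - 1] == 2) + b[3] * x[t - 1])  # P(state 2 at t)
  s1[t] <- ifelse(runif(1) < p21, 2, 1)
}

# Sequence 2: homogeneous first-order chain with a fixed transition matrix
P2 <- matrix(c(0.7, 0.3,
               0.4, 0.6), nrow = 2, byrow = TRUE)
s2 <- numeric(n); s2[1] <- 1
for (t in 2:n) s2[t] <- sample(1:2, 1, prob = P2[s2[t - 1], ])
```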
To assess the test power and dimension, we used the Wald test with the following hypotheses:
| | Hypothesis | Test |
|---|---|---|
| Power | \(H_0: \lambda_{11} = 0\) | \(\frac{\widehat{\lambda}_{11}^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\) |
| | \(H_0: \lambda_{12} = 1\) | \(\frac{(\widehat{\lambda}_{12}-1)^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\) |
| Dimension | \(H_0: \lambda_{11} = 1\) | \(\frac{(\widehat{\lambda}_{11}-1)^2}{se(\widehat{\lambda}_{11})^2} \sim \chi^2_{(1)}\) |
| | \(H_0: \lambda_{12} = 0\) | \(\frac{\widehat{\lambda}_{12}^2}{se(\widehat{\lambda}_{12})^2} \sim \chi^2_{(1)}\) |
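For reference, each of these statistics can be computed directly from an estimate and its standard error; a small helper of this kind (illustrative, not part of the package) is:

```r
# Wald statistic for H0: lambda = lambda0, compared against a chi-squared(1) distribution
wald_test <- function(lambda_hat, se, lambda0 = 0) {
  w <- (lambda_hat - lambda0)^2 / se^2
  c(statistic = w, p.value = pchisq(w, df = 1, lower.tail = FALSE))
}
```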
The simulation procedure consisted of generating the two sequences as described, estimating the model with the mmcx function, and computing the Wald statistics above for each sample size.

Considering two states, the test dimension was 5.7% with a sample size of 100 observations, slightly increased with 500 observations, and returned to the expected values with 1000 and 5000 observations. For sample sizes of 100, 500, and 1000 observations, the test power was low. So, when considering two states, the sample should have at least 5000 observations or, if that is not possible, a higher significance level should be used when testing for individual significance.
Considering three states, the test dimension was 9.7% for a sample size of 100 observations, 0.2% for 500 observations, and 0.3% for 1000 observations. Regarding the test power, we see a similar behavior: for a sample of 100 observations, the test power was 90.5%, and from 500 observations onwards, it reached 100%. Thus, when considering three states, one may use a sample of 500 observations without compromising the test power and dimension.
Secondly, we performed a simulation study where we considered two non-homogeneous Markov chains with two states. Here, the main goal was to assess whether the model correctly recovered the parameters assigned. In this case, we started by generating the terms of the proposed model, whose probabilities were specified through logistic regressions with parameters randomly generated in R. Similarly to Part I, we used a Wald test to assess the power and dimension of the test, and the simulation procedure again consisted of generating the sequences, estimating the model with the mmcx function, and computing the Wald statistics.

The probabilities \(P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)\) and \(P\left(S_{1t}|S_{2t-1}, x_{t-1}\right)\) presented some differences regarding the distributions of their values. Specifically, \(P\left(S_{1t}|S_{1t-1}, x_{t-1} \right)\) had more extreme values, with the minimum being close to 0 and the maximum close to 1, whereas \(P\left(S_{1t}|S_{2t-1}, x_{t-1} \right)\) had more moderate values, with the minimum being, on average, 0.3 and the maximum 0.7. When the probabilities have values close to 1, one says that the states/regimes are persistent. We calculated the power and dimension of the test for each value of \(\lambda\), both when the estimated probabilities are moderate and when they are extreme. Hence, considering the equation:
\[\begin{multline} P\left(S_{1t} = i_0 | S_{1t-1} = i_1, S_{2t-1} = i_2, \boldsymbol{x}_{t-1} \right) = \\ \lambda_{11}P\left(S_{1t} = i_0 | S_{1t-1} = i_1,\boldsymbol{x}_{t-1}\right) + \lambda_{12}P\left(S_{1t} = i_0 | S_{2t-1} = i_2, \boldsymbol{x}_{t-1} \right) \tag{12} \end{multline}\]
The parameter \(\lambda_{11}\) will be associated with more extreme probabilities and \(\lambda_{12}\) will be associated with more moderate probabilities.
When the states are persistent and the parameter’s value is low (i.e., 0.2 or 0.4), the test power is low. As this value increases, the power of the test increases as well. When the states are not persistent, there is no clear pattern in the power of the test: for a parameter value of 0.2 the power is still low (although not as low as in the first scenario), it increases for 0.4, decreases for 0.6, and increases again for 0.8. Overall, the estimated standard errors seem high, leading to low test power. Regarding the test dimension, when the higher weight is associated with the non-persistent states, the test dimension converges to 0. However, when this weight is associated with the persistent states, the test dimension increases with the sample size, reaching 10% in some cases. Hence, in this situation one should use a 10% significance level to perform statistical inference on the parameters.
Regarding the software implementation of each function: for the multi.mtd function, the estimation method presented in Berchtold (2001) was used, applied to the multivariate case. For multi.mtd_probit, a package for numerical maximization of the log-likelihood, maxLik (Henningsen and Toomet 2011), was used. This package performs maximum likelihood estimation through different optimization methods that the user can choose. The optimization methods available are Newton-Raphson, Broyden-Fletcher-Goldfarb-Shanno (BFGS), the BFGS algorithm implemented in R (BFGSR), Berndt-Hall-Hall-Hausman, simulated annealing, conjugate gradients, and Nelder-Mead. Finally, for the mmcx function, a different approach was used. Unlike the MTD-Probit, the proposed model has equality and inequality restrictions on the parameters. The maxLik (Henningsen and Toomet 2011) package only allows one type of restriction per maximum likelihood estimation, so it was not possible to use this package to estimate the proposed model with exogenous variables. Hence, the algorithm used was the Augmented Lagrangian method, available in the alabama (Varadhan 2015) package through the function auglag. This estimation method is not very common for MMC models; however, it has been applied to Markov chain models (Rajarshi 2013). The GMMC model’s probabilities were estimated through a multinomial logit using the multinom function of the nnet package (Venables and Ripley 2002).
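The package’s internal code is not reproduced here, but a minimal sketch of how such a constrained maximization could be set up with alabama::auglag for a single equation is shown below; p1 and p2 denote pre-estimated probability series evaluated at the realized states, and all object names are illustrative:

```r
library(alabama)

# p1, p2: fitted probabilities P(S_jt | S_{1,t-1}, x_t) and P(S_jt | S_{2,t-1}, x_t),
# evaluated at the realized state of S_jt (illustrative names)
negLL <- function(lambda) -sum(log(lambda[1] * p1 + lambda[2] * p2))

fit <- auglag(par = c(0.5, 0.5),                         # starting values on the simplex
              fn  = negLL,                               # auglag minimizes, so use -LL
              heq = function(lambda) sum(lambda) - 1,    # equality: weights sum to one
              hin = function(lambda) lambda)             # inequality: weights non-negative

fit$par      # estimated weights
fit$hessian  # Hessian returned at the solution, usable for standard errors
```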
Additionally, the Hessian matrices were computed, which allows performing statistical inference. The maxLik and auglag functions return the Hessian matrices together with the estimates. For the multi.mtd function, since the optimization procedure of Berchtold (2001) was used, the Hessian was computed through the second partial derivatives. The function multi.mtd requires the following elements:
- y, a matrix of the categorical data sequences.
- deltaStop, the delta below which the optimization phases of the parameters stop.
- is_constrained, a flag indicating whether the function will consider the usual set of constraints or the relaxed set of constraints (see Berchtold 2001).
- delta, the amount of change to increase/decrease in the parameters for each iteration of the optimization algorithm.
The last three arguments concern the optimization procedure. For more details, see Berchtold (2001). Considering two vectors of categorical data sequences, s1 and s2, to estimate the model and obtain the results:
multi.mtd(y=cbind(s1,s2), deltaStop=0.0001, is_constrained=TRUE, delta=0.1)
The function multi.mtd_probit requires the following arguments:

- y, a matrix of the categorical data sequences.
- initial, a vector of the initial values of the parameters.
- nummethod, the numerical maximization method, currently either "NR" (for Newton-Raphson), "BFGS" (for Broyden-Fletcher-Goldfarb-Shanno), "BFGSR" (for the BFGS algorithm implemented in R), "BHHH" (for Berndt-Hall-Hall-Hausman), "SANN" (for Simulated ANNealing), "CG" (for Conjugate Gradients), or "NM" (for Nelder-Mead). Lower-case letters (such as "nr" for Newton-Raphson) are allowed. The default method is "BFGS". For more details, see the maxLik (Henningsen and Toomet 2011) package.

Considering two vectors of categorical data sequences, s1 and s2 again, to estimate the model and obtain the results with the BFGS maximization method:
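A call along the following lines would do this; the initial values are merely illustrative starting points:

```r
multi.mtd_probit(y = cbind(s1, s2), initial = c(1, 1), nummethod = "bfgs")  # illustrative initial values
```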
Finally, the function mmcx requires the following elements:

- y, a matrix of categorical data sequences.
- x, a matrix of covariates (exogenous variables).
- initial, a vector of the initial values of the parameters.

Considering two vectors of categorical data sequences, s1 and s2, and a vector with an exogenous variable, x, to estimate the model and obtain the results:
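In the same spirit, a call such as the following could be used (the initial values are again illustrative):

```r
res <- mmcx(y = cbind(s1, s2), x = cbind(x), initial = c(1, 1))  # illustrative initial values
```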
These functions return a list with the parameter estimates, standard errors, z-statistics, p-values, and the value of the log-likelihood function for each equation.
The package offers an additional function that allows obtaining the transition probability matrices of mmcx conditioned on a specific value of x defined by the user. The function is MMC_tpm and requires the following elements:

- s, a matrix of categorical data sequences.
- x, a matrix of covariates (exogenous variables).
- value, a single value of x on which to condition the probability transition matrices.
- result, a list returned by the function mmcx containing the model’s estimates.

Considering two vectors of categorical data sequences, s1 and s2, a vector with an exogenous variable, x, and res, the list returned by the function mmcx, to obtain the transition probability matrices:
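The exact call is not shown here; a sketch consistent with the arguments above, conditioning for instance on the sample maximum of x, would be:

```r
MMC_tpm(s = cbind(s1, s2), x = cbind(x), value = max(x), result = res)
```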
The function returns an array containing the probability transition matrices, conditioned on a specific value of x, for each equation.
Markov chain models are used in interdisciplinary areas, such as economics, business, biology, and engineering, with applications ranging from predicting traffic flow to stock market movements, among others. Modeling and predicting stock market returns is particularly relevant for investors and policy makers. Since the stock market is a volatile environment and returns are difficult to predict, estimating the set of probabilities that describe these movements might provide relevant input. Additionally, incorporating the effect of key macroeconomic variables could provide a more accurate picture of this specific environment.
The following empirical illustration aims to model stock returns of two indexes as a function of the interest rate spread, specifically the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity.
The interest rate spread is a key macroeconomic variable and provides valuable information regarding the state of the economy. Specifically, it has been used to forecast recessions, as in Estrella and Mishkin (1996), Dombrosky and Haubrich (1996), Chauvet and Senyuz (2016), Tian and Shen (2019), and McMillan (2021). Generally, short-term yields are lower than long-term yields when the economy is in expansion, and short-term yields are higher than long-term yields when the economy is in recession. The difference between these yields (or, more specifically, the slope of the yield curve) can be used to forecast the state of the economy. Hence, this indicator might provide relevant input for investors.
We considered the 5-week-day daily stock returns (\(r_t=100 \times \log(P_t/P_{t-1})\), where \(P_t\) is the adjusted close price) of two indexes, S&P500 and DJIA, from November \(11^{th}\) 2011 to September \(1^{st}\) 2021 (2581 observations). Additionally, we considered the interest rate spread (\(spread_{t}\)), the 10-Year Treasury Constant Maturity Minus 3-Month Treasury Constant Maturity. The data was retrieved from FRED. Below, we have the descriptive statistics of these variables.
Variable | Minimum | 1\(^{st}\) Quartile | Median | Mean | 3\(^{rd}\) Quartile | Maximum |
---|---|---|---|---|---|---|
\(spread_{t}\) | -0.52 | 0.92 | 1.54 | 1.454 | 2.03 | 2.97 |
\(r_{t;SP500}\) | -12.765 | -0.32 | 0.07 | 0.054 | 0.518 | 8.968 |
\(r_{t;DJIA}\) | -13.842 | -0.327 | 0.071 | 0.046 | 0.508 | 10.764 |
Moreover, to apply the proposed model, it is necessary to have categorical time series; thus, we applied the following procedure:
\[ S_{st}= \begin{cases} 1, r_t \leq \widehat{q}_{s;0.25}\\ 2, \widehat{q}_{s;0.25} < r_t < \widehat{q}_{s;0.75} \\ 3, r_t \geq \widehat{q}_{s;0.75}\\ \end{cases} \]
where \(\widehat{q}_{s;\alpha}\) is the estimated quantile of order \(\alpha\) of the marginal distribution of \(r_t\). Considering this illustration and the model proposed, we will have two equations:
\[\begin{multline} P(S_{sp500,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{11} P(S_{sp500,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{12} P(S_{sp500,t} | S_{djia, t-1}, spread_{t-1}) \tag{13} \end{multline}\]
\[\begin{multline} P(S_{djia,t} | S_{sp500, t-1}, S_{djia, t-1}, spread_{t-1}) = \\ \lambda_{21} P(S_{djia,t} | S_{sp500, t-1}, spread_{t-1}) + \lambda_{22} P(S_{djia,t} | S_{djia, t-1}, spread_{t-1}) \tag{14} \end{multline}\]
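For reference, the state construction described above can be reproduced in a few lines of R; the object name prices is illustrative:

```r
# prices: vector of adjusted close prices of one index (illustrative name)
returns <- 100 * diff(log(prices))

# classify each return into the three states defined above
q <- quantile(returns, probs = c(0.25, 0.75))
S <- ifelse(returns <= q[1], 1, ifelse(returns < q[2], 2, 3))
```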
In Figures 5 to 8, generated with ggplot2 (Wickham 2016) and gridExtra (Auguie 2017), we show the smoothed conditional probabilities of both series, depending on \(spread_{t-1}\). The number of observations is high, and the probabilities varied abruptly in a small time frame, making the plots hard to read. To simplify, a moving average (from pracma (Borchers 2022)) of order 5, chosen according to the frequency of the data, was applied to these probabilities to illustrate how they evolve over time. These plots represent the probabilities associated with the parameters of the proposed general model, showcasing how they vary over time, which is the main advantage of this generalization: instead of having fixed matrices of transition probabilities, we allow these to vary over time, depending on the values of \(spread_{t-1}\). Specifically, Figures 5 and 6 correspond to the non-homogeneous Markov chain used to build the S&P500’s equation, and Figures 7 and 8 correspond to the non-homogeneous Markov chain used to build the DJIA’s equation. We see a similar behavior within each series regardless of whether it depends on the previous states of \(S_{1t}\) or \(S_{2t}\). Additionally, the scales of the graphs are small, indicating that these probabilities vary around the same set of values.
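A sketch of this smoothing and plotting step, with p_hat denoting one of the estimated probability series (an illustrative name), is:

```r
library(pracma)
library(ggplot2)

p_smooth <- movavg(p_hat, n = 5, type = "s")   # simple moving average of order 5
ggplot(data.frame(t = seq_along(p_smooth), p = p_smooth), aes(t, p)) + geom_line()
```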
The model can be estimated through the mmcx function:
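The call that produces the output below is not reproduced in the text; a sketch consistent with the mmcx arguments, using illustrative object names for the categorized return series and the spread, is:

```r
# s_sp500, s_djia: categorized return series; spread: interest rate spread (illustrative names)
res <- mmcx(y = cbind(s_sp500, s_djia), x = cbind(spread), initial = c(1, 1))
```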
--------------------------------------------
Equation 1
Estimate Std. Error t value Pr(>|t|)
1 0.685660 0.171346 4.002 0.000 ***
2 0.314340 0.171346 1.835 0.067 *
Log-Likelihood: -2636.355
--------------------------------------------
--------------------------------------------
Equation 2
Estimate Std. Error t value Pr(>|t|)
1 0.629993 0.176383 3.572 0.000 ***
2 0.370007 0.176383 2.098 0.036 **
Log-Likelihood: -2636.622
--------------------------------------------
Considering the first equation, the effect of the probabilities depending on the S&P500’s previous state and the interest rate spread has a higher weight on the overall probability. Also, this estimate is highly significant, presenting a \(p\)-value close to zero. The effect of the DJIA’s previous state on the S&P500 is lower, but it is also significant at the 10% significance level. In the second equation, the effect of the S&P500’s previous state is higher than the DJIA’s, and both estimates are highly significant.
One of the advantages of this approach is the possibility of assessing the transition probabilities for specific values of \(x_t\), in this case, the interest rate spread. For both series, we calculated the transition probabilities for this variable’s minimum and maximum values in the sample, which are -0.52 and 2.97, respectively. To obtain and plot the probability transition matrices for these two cases, the code is the following:
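The calls creating tpm_max and tpm_min are not reproduced in the text; a sketch consistent with the MMC_tpm arguments, using the same illustrative object names as above, is:

```r
# condition on the sample maximum (2.97) and minimum (-0.52) of the spread
tpm_max <- MMC_tpm(s = cbind(s_sp500, s_djia), x = cbind(spread), value = 2.97,  result = res)
tpm_min <- MMC_tpm(s = cbind(s_sp500, s_djia), x = cbind(spread), value = -0.52, result = res)
```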
library(markovchain)
plot(new('markovchain', transitionMatrix = tpm_max[,,1])) # Generate figure 9
plot(new('markovchain', transitionMatrix = tpm_min[,,1])) # Generate figure 10
plot(new('markovchain', transitionMatrix = tpm_max[,,2])) # Generate figure 11
plot(new('markovchain', transitionMatrix = tpm_min[,,2])) # Generate figure 12
In Figures 9 and 10, we have the transition probability networks for the S&P500, corresponding to the maximum and minimum values of the spread, respectively. The most noticeable difference between these two networks concerns the transition probability from the second state to the third state. For the maximum value of \(spread_{t-1}\), the transition probability from the second state to the third state is 0.6. So, when the economy is strong, one might expect higher returns when \(t-1\) was in the second state. However, this scenario shifts when considering the minimum value of \(spread_{t-1}\): the probability of obtaining higher returns, that is, of being in state three, becomes almost evenly distributed, regardless of the state in \(t-1\). This indicates the instability of the stock market when the economy is weaker. Another difference between these networks concerns the transition probability from the third state to the first state. For the maximum value of \(spread_{t-1}\), this probability is 0.27, and for the minimum value it increases to 0.44. This is also expected, since when the economy is weaker, the probability of having lower returns is greater.
Considering the second equation (Figures 11 and 12), corresponding to the DJIA’s returns, we see a similar behaviour to that of the S&P500’s networks. The transition probability from the second state to the third state is higher for the maximum value of \(spread_{t-1}\), and the transition probability from the third state to the first state is higher when we consider the minimum value of \(spread_{t-1}\). However, the difference in this last probability between the minimum and maximum values of \(spread_{t-1}\) is not as large as for the S&P500. Overall, the rest of the probability structure remains the same.
Several proposals for including exogenous variables in Markov chain models have been presented. The main limitations were associated with the high complexity of the models to be developed and estimated. Additionally, most models considered only categorical exogenous variables, and there was a lack of focus on continuous exogenous variables. This work proposes a new approach to include continuous exogenous variables in the Ching et al. (2002) model for multivariate Markov chains. This is relevant because it allows studying the effect of the previous states of the series and of exogenous variables on the transition probabilities. The model is based on the Ching et al. (2002) MMC model but considers non-homogeneous Markov chains; thus, the probabilities that compose the model depend on exogenous variables. These probabilities are estimated, as in a usual non-homogeneous Markov chain, through a multinomial logit model. The model parameters, as well as their standard errors, are then estimated through MLE.

We developed a package with the estimation function of the proposed model, in which we considered the Augmented Lagrangian optimization method for estimating the parameters through MLE. Additionally, we designed a Monte Carlo simulation study to assess the test power and dimension for this model. The results showed that the model detected a non-homogeneous Markov chain. Moreover, an empirical illustration demonstrated the relevance of this new model by estimating the probability transition matrices for different values of the exogenous variable. Ignoring the effect of exogenous variables in MMC means that we would not detect the changes in the probabilities according to the covariates’ values; in this setting, one would have a limited view of the studied process. Hence, this approach allows us to understand how a specific variable influences a specific process.

The main contributions of this work are the development of a package with functions for multivariate Markov chains, addressing statistical inference in these models, and the inclusion of covariates. The limitations are related to the implementation in R; specifically, the optimization algorithm applied is not common for MMC models, so it would be beneficial to study new approaches to optimizing the maximum likelihood function as further research. Additionally, extending this generalization to the MTD-Probit model proposed by Nicolau (2014), which removes the constraints on the model’s parameters and allows the model to detect negative effects, would also be relevant.
Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-006.zip
march, markovchain, DTMCPack, GenMarkov, msm, maxLik, alabama, nnet, ggplot2, gridExtra, pracma
ChemPhys, DifferentialEquations, Distributions, Econometrics, Finance, MachineLearning, NumericalMathematics, Optimization, Phylogenetics, Spatial, Survival, TeachingStatistics
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vasconcelos & Damásio, "GenMarkov: Modeling Generalized Multivariate Markov Chains in R", The R Journal, 2025
BibTeX citation
@article{RJ-2024-006, author = {Vasconcelos, Carolina and Damásio, Bruno}, title = {GenMarkov: Modeling Generalized Multivariate Markov Chains in R}, journal = {The R Journal}, year = {2025}, note = {https://doi.org/10.32614/RJ-2024-006}, doi = {10.32614/RJ-2024-006}, volume = {16}, issue = {1}, issn = {2073-4859}, pages = {96-113} }