The Autoregressive Distributed Lag approach to cointegration or bound testing, proposed by Pesaran in 2001, has become prominent in empirical research. Although this approach has many advantages over the classical cointegration tests, it is not exempt from drawbacks, such as possible inconclusive inference and distortion in size. Recently, Bertelli and coauthors developed a bootstrap approach to the bound tests to overcome these drawbacks. This paper introduces the R package bootCT, which implements this method by deriving the bootstrap versions of the bound tests and of the asymptotic F-test on the independent variables proposed by Sam and coauthors in 2019. As a spinoff, a general method for generating random multivariate time series following a given VECM/ARDL structure is provided in the package. Empirical applications showcase the main functionality of the package.
Cointegration and error correction are fundamental concepts in the
analysis of economic data, insofar as they provide an appropriate
framework for testing economic hypotheses about growth and fluctuation.
Several approaches have been proposed in the literature to determine
whether two or more non-stationary time series are cointegrated, meaning
they share a common long-run relationship.
There are two basic types of tests for cointegration: single equation
tests and VAR-based tests. The former check the presence of unit roots
in cointegration residuals (see, e.g., Engle and Granger 1987; Engle and Yoo 1987; Mackinnon 1991; Gabriel et al. 2002; Cook 2006)
or test the significance of the error-correction (EC) term coefficient
(Kremers et al. 1992; Maddala and Kim 1998; Arranz and Escribano 2000; Ericsson and MacKinnon 2002). The
latter, such as the Johansen (1991) approach, tackle the problem of
detecting cointegrating relationships in a VAR model. This latter
approach, albeit having the advantage of avoiding the issue of
normalization, as well as allowing the detection of multiple
cointegrating vectors, is far from being perfect. In the VAR system all
variables are treated symmetrically, as opposed to the standard
univariate models that usually have a clear interpretation in terms of
exogenous and endogenous variables. Furthermore, in a VAR system all the
variables are estimated at the same time, which is problematic if the
relation between some variables is flawed, that is affected by some
source of error. In this case a simultaneous estimation process tends to
propagate the error affecting one equation to the others. Furthermore, a
multidimensional VAR models employs plenty of degrees of freedom.
The recent cointegration approach, known as Autoregressive Distributed
Lag (ARDL) approach to cointegration or bound testing, proposed by
Pesaran et al. (2001) (PSS), falls in the former strand of literature. It has
become prominent in empirical research because it shows several
advantages with respect to traditional methods for testing
cointegration. First, it is applicable also in cases of mixed order
integrated variables, albeit with integration not exceeding the first
order. Thus, it evades the necessity of pre-testing the variables and,
accordingly, avoids some common practices that may prevent finding
cointegrating relationships, such as dropping variables or transforming
them into stationary form (see McNown et al. 2018). Second,
cointegration bound tests are performed in an ARDL model that allows
different lag orders for each variable, thus providing a more flexible
framework than other commonly employed approaches. Finally, unlike other
cointegration techniques, which are sensitive to the sample size, the
ARDL approach provides robust and consistent results for small sample
sizes.
Notably, the ARDL bound testing methodology has quickly spread in
economics and econometrics to study the cointegrating relationships
between macroeconomic and financial variables, to evaluate the long-run
impact of energy variables, or to assess recent environmental policies
and their impact on the economy. Among the many applications, see for
instance Haseeb et al. (2019; Hussain et al. 2019; Menegaki 2019; Reda and Nourhan 2020; Yilanci et al. 2020; Abbasi et al. 2021).
The original bound tests proposed by Pesaran et al. (2001) are an \(F\)-test for
the significance of the coefficients of all lagged level variables
entering the error correction term (\(F_{ov}\)), and a \(t\)-test for the
coefficient of the lagged dependent variable. When either the dependent
or the independent variables do not appear in the long-run relationship,
a degenerate case arises. The bound \(t\)-test provides answers on the
occurrence of a degenerate case of second type, while the occurrence of
a degeneracy case of first type can be assessed by testing whether the
dependent variable is of integration order I(1). This type of check
violates the spirit and motivation of the bound tests, which are
supposed to be applicable in situations of unknown order of integration
for the variables.
Recently, McNown et al. (2018) pointed out how, due to the low power
problem of unit root tests, investigating the presence of a first type
degeneracy by testing the integration order of the dependent variable
may lead to incorrect conclusions. Therefore, they suggested checking
for its occurrence by testing the significance of the lagged levels of
the independent variables via an extra \(F\)-test (\(F_{ind}\)), which was
also worked out in its asymptotic version [SMK; Sam et al. (2019)].
Besides problems in testing the occurrence of degenerate cases, in
general, the main drawback of the bound tests is the occurrence of
potentially inconclusive results, if the test statistic lies between the
bounds of the test distribution under the null. Furthermore, the
asymptotic distributions of the statistics may provide a poor
approximation of the true distributions in small samples. Finite sample
critical values, even if only for a subset of all possible model
specifications, have been worked out in the literature (see Mills and Pentecost 2001; Narayan and Smyth 2004; Kanioura and Turner 2005; Narayan 2005),
while (Kripfganz and Schneider 2020) provided the quantiles of the asymptotic
distributions of the tests as functions of the sample size, the lag
order and the number of long-run forcing variables. However, this
relevant improvement does not eliminate the uncertainty related to the
inconclusive regions, or the existence of other critical issues related
to the underlying assumptions of the bound test framework, such as the
(weak) exogeneity of the independent variables or the non-stationarity
of the dependent variable.
To overcome the mentioned bound test drawbacks, (Bertelli et al. 2022)
proposed bootstrapping the ARDL cointegration test. Inference can always
be pursued with ARDL bootstrap tests, unlike what happens with both the
PSS tests and the SMK test on the independent variables. Bootstrap ARDL
tests were first put forward by (McNown et al. 2018) in an
unconditional ARDL model, which omits the instantaneous differences of
the exogenous variables in the ARDL equation, rather than a conditional
one, as originally proposed by (Pesaran et al. 2001). The unconditional model
is often used, for reason of practical convenience, in empirical
research. Simulation results in (Bertelli et al. 2022) have
highlighted the importance of employing the appropriate specification,
especially under degenerate cases. In fact, it has been pointed out that
a correct detection of these cases requires the comparison of the test
outcomes in both the conditional and unconditional settings. Erroneous
conclusions, based exclusively on one model specification, can thus be
avoided.
In this paper, bootstrap bound tests, thereby including the bootstrap
versions of the \(F_{ov}\), \(t\) and \(F_{ind}\) bound tests, are carried out
in a conditional ARDL model setting. This approach allows to overcome
the problem of inconclusive regions of the standard bound tests. A
comparison with the outcomes engendered by the unconditional ARDL
bootstrap tests is nevertheless provided for the \(F_{ind}\) test, to
avoid erroneous inference in presence of degenerate cases.
The paper is organized as follows. Section 2
introduces the theoretical results of the ARDL cointegration bound
tests. Section 3 details the steps carried out by the
bootstrap procedure, which allows the construction of the (bootstrap)
distribution - under the null - for the \(F_{ov}\), \(t\), conditional
\(F_{ind}\) and unconditional \(F_{ind}\) tests. Section 4
introduces the R
package
bootCT (Vacca and Bertelli 2023) and
its functionalities: a method for the generation of random multivariate
time series that follow a user-specified VECM/ARDL structure, with some
examples, and the main function that carries out the aforementioned
bootstrap tests, while also computing the PSS and SMK bound tests. The
trade-off between accuracy and computational time of the bootstrap
procedure is also investigated, under several scenarios in terms of
sample size and number of replications. Notably, a function that
performs the PSS bound tests is already available in the
dynamac package
(Jordan and Philips 2020), while no R
routine has so far been implemented for the
SMK test, to the best of our knowledge. Section 5 gives some
empirical applications that employ the core function of the package and
its possible outputs. Section 6 concludes. Appendix
7 briefly delves into technical details of the
conditional ARDL model and its possible specifications 1.
The starting point of the approach proposed by (Pesaran et al. 2001) is a
\((K+1)\) VAR(\(p\)) model
\[
\mathbf{A}(L)(\mathbf{z}_t-\boldsymbol{\mu}-\boldsymbol{\eta}t)=\boldsymbol{\varepsilon}_t \enspace \enspace \enspace \boldsymbol{\varepsilon}_t\sim N(\mathbf{0}, \boldsymbol{\Sigma}),\qquad\mathbf{A}(L)=\left(\mathbf{I}_{K+1}- \sum_{j=1}^{p}\mathbf{A}_j\mathbf{L}^j\right)
\enspace \enspace \enspace t=1,2,\dots,T. \tag{1}\]
Here, \(\mathbf{A}_j\) are square \((K+1)\) matrices, \(\mathbf{z}_t\) a
vector of \((K+1)\) variables, \(\boldsymbol{\mu}\) and \(\boldsymbol{\eta}\)
are \((K+1)\) vectors representing the drift and the trend respectively,
and \(\det(\mathbf{A}(z))=0\) for \(|z| \geq 1\). If the matrix
\(\mathbf{A}(1)=\mathbf{I}_{K+1}-\sum_{j=1}^{p}\mathbf{A}_{j}\) is
singular, the components of \(\mathbf{z}_t\) turn out to be integrated and
possibly cointegrated.
The VECM representation of (1) is given by (see Appendix
7.1 for details)
\[
\Delta\mathbf{z}_t=\boldsymbol{\alpha}_{0}+\boldsymbol{\alpha}_{1}t-\mathbf{A}(1)\mathbf{z}_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\Gamma}_{j}\Delta \mathbf{z}_{t-j}+\boldsymbol{\varepsilon}_t. \tag{2}\]
Now, to study the adjustment to the equilibrium of a single variable
\(y_t\), given the other \(\mathbf{x}_t\) variables, the vectors
\(\mathbf{z}_t\) and \(\boldsymbol{\varepsilon}_t\) are partitioned
\[
\mathbf{z}_t=\begin{bmatrix}
\underset{(1,1)}{y_{t}} \\ \underset{(K,1)}{\mathbf{x}_{t}}
\end{bmatrix}, \enspace \enspace \enspace \boldsymbol{\varepsilon}_t=\begin{bmatrix}
\underset{(1,1)}{\varepsilon_{yt}} \\ \underset{(K,1)}{\boldsymbol{\varepsilon}_{xt}}
\end{bmatrix}. \tag{3}\]
The matrix \(\mathbf{A}(1)\), which is assumed to be singular to allow
cointegration, is partitioned conformably to \(\mathbf{z}_{t}\) as 2
\[\mathbf{A}(1)=\begin{bmatrix}
\underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}_{yx}'} \\ \underset{(K,1)}{\mathbf{a}_{xy}} & \underset{(K,K)}{\mathbf{A}_{xx}}
\end{bmatrix}.\]
Under the assumption
\[
\boldsymbol{\varepsilon}_t \sim N\Bigg(\mathbf{0}, \begin{bmatrix}
\underset{(1,1)}{\sigma_{yy}}& \underset{(1,K)}{\boldsymbol{\sigma}_{yx}'} \\ \underset{(K,1)}{\boldsymbol{\sigma}_{xy}} & \underset{(K,K)}{\boldsymbol{\Sigma}_{xx}} \end{bmatrix}\Bigg), \tag{4}\]
the following holds
\[
\varepsilon_{yt}=\boldsymbol{\omega}'\boldsymbol{\varepsilon}_{xt}+\nu_{yt} \sim N(0,\sigma_{y.x}), \tag{5}\]
where
\(\sigma_{y.x}=\sigma_{yy}-\boldsymbol{\omega}'\boldsymbol{\sigma}_{xy}\)
with
\(\boldsymbol{\omega}'=\boldsymbol{\sigma}'_{yx}\boldsymbol{\Sigma}^{-1}_{xx}\),
and \(\nu_{yt}\) is independent of \(\boldsymbol{\varepsilon}_{xt}\).
Substituting (5) into (2) and assuming that
the \(\mathbf{x}_{t}\) variables are exogenous towards the ARDL parameters
(that is, setting \(\mathbf{a}_{xy}=\mathbf{0}\) in \(\mathbf{A}(1)\))
yields the system (see Appendix 7.1 for details)
\[
\Delta y_{t}=\alpha_{0.y}+\alpha_{1.y}t -a_{yy}EC_{t-1}+ \sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt} \tag{6}\]
\[ \Delta\mathbf{x}_{t} = \boldsymbol{\alpha}_{0x} +\boldsymbol{\alpha}_{1x}t+ \mathbf{A}_{(x)}\mathbf{z}_{t-1}+ \boldsymbol{\Gamma}_{(x)}(L)\Delta\mathbf{z}_t+ \boldsymbol{\varepsilon}_{xt}, \tag{7}\] where \[ \boldsymbol\gamma_{y.x,j}'=\boldsymbol\gamma_{y,j}'-\boldsymbol{\omega}'\boldsymbol{\Gamma}_{(x),j} \tag{8}\]
\[
\alpha_{0.y}=\alpha_{0y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{0x}, \enspace \enspace \enspace \alpha_{1.y}=\alpha_{1y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{1x}, \tag{9}\]
and where the error correction term, \(EC_{t-1}\), expressing the long-run
equilibrium relationship between \(y_{t}\) and \(\mathbf{x}_{t}\), is given
by
\[
EC_{t-1}=y_{t-1}-\theta_{0}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}, \tag{10}\]
with
\[
\theta_{0}=\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x}, \enspace \theta_{1}=\eta_{y}-\boldsymbol{\theta}'\boldsymbol{\eta}_{x}, \enspace\boldsymbol{\theta}'=-\frac{\widetilde{\mathbf{a}'}_{y.x}}{a_{yy}}=-\frac{\mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx}}{a_{yy}}. \tag{11}\]
Thus, no cointegration occurs when
\(\widetilde{\mathbf{a}}_{y.x}=\mathbf{0}\) or \(a_{yy}=0\) . These two
circumstances are referred to as degenerate case of second and first
type, respectively. Degenerate cases imply no cointegration between
\(y_{t}\) and \(\mathbf{x}_{t}\).
To test the hypothesis of cointegration between \(y_{t}\) and
\(\mathbf{x}_{t}\), Pesaran et al. (2001) proposed an \(F\)-test, \(F_{ov}\) hereafter,
based on the hypothesis system
\[\begin{align}
H_0: a_{yy}=0 \; \cap \;\widetilde{\mathbf{a}}_{y.x}=\mathbf{0}\\
H_1: a_{yy} \neq 0 \; \cup \;\widetilde{\mathbf{a}}_{y.x}\neq \mathbf{0}.\tag{12}
\end{align}\]
Note that \(H_{1}\) covers also the degenerate cases
\[\begin{align}
H_1^{y.x}: a_{yy}=0 \; , \;\widetilde{\mathbf{a}}_{y.x}\neq\mathbf{0}\\
H_1^{yy}: a_{yy} \neq 0 \; , \;\widetilde{\mathbf{a}}_{y.x} = \mathbf{0}.\tag{13}
\end{align}\]
The exact distribution of the \(F\) statistic under the null is unknown,
but it is limited from above and below by two asymptotic distributions:
one corresponding to the case of stationary regressors, and another
corresponding to the case of first-order integrated regressors. As a
consequence, the test is called bound test and has an inconclusive area.
3
Pesaran et al. (2001) worked out two sets of (asymptotic) critical values: one,
\(\{\tau_{L,F}\}\), for the case when \(\mathbf{x}_{t}\sim{I}(0)\) and
another, \(\{\tau_{U,F}\}\), for the case when \(\mathbf{x}_{t}\sim{I}(1)\).
These values vary in accordance with the number of regressors in the
ARDL equation, the sample size and the assumptions made about the
deterministic components (intercept and trend) of the data generating
process.
In this regard, Pesaran et al. (2001) introduced five different specifications
for the ARDL model, depending on its deterministic components, which are
(see Appendix 7.2 for details)
No intercept and no trend
\[\begin{align}
\Delta y_t=-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},\tag{14}
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\),
Restricted intercept and no trend
\[\begin{align}
\Delta y_{t}=
-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},\tag{15}
\end{align}\]
where
\(EC_{t-1}=y_{t-1}-\theta_{0}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The intercept extracted from the EC term is
\(\alpha_{0.y}^{EC} = a_{yy}\theta_0\).
Unrestricted intercept and no trend
\[\begin{align}
\Delta y_{t}
=\alpha_{0.y}-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},\tag{16}
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
Unrestricted intercept, restricted trend
\[\begin{align}
\Delta y_{t}=
\alpha_{0.y}-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt},\tag{17}
\end{align}\]
where
\(EC_{t-1}=y_{t-1}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The trend extracted from the EC term is
\(\alpha_{1.y}^{EC} = a_{yy}\theta_1\).
Unrestricted intercept, unrestricted trend
\[\begin{align}
\Delta y_{t}
=\alpha_{0.y}+\alpha_{1.y}t
-a_{yy}EC_{t-1}+\sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{y.x,j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\omega}'\Delta\mathbf{x}_{t}+\nu_{yt}, \tag{18}
\end{align}\]
where \(EC_{t-1}=y_{t-1}-\boldsymbol{\theta}'\mathbf{x}_{t-1}\).
The model in (6) proposed by Pesaran et al. (2001) represents the
correct framework in which to carry out bound tests. However, bound test
are often performed in an unconditional ARDL model setting, specified as
\[
\Delta y_{t}=\alpha_{0.y}+\alpha_{1.y}t -a_{yy}EC_{t-1}+ \sum_{j=1}^{p-1}\boldsymbol{\gamma}'_{j}\Delta\mathbf{z}_{t-j}+\varepsilon_{yt}, \tag{19}\]
which omits the term \(\boldsymbol{\omega}'\Delta\mathbf{x}_{t}\).
(Bertelli et al. 2022) have highlighted that bootstrap tests performed
in these two ARDL specifications can lead to contrasting results. To
explain this divergence, note that the conditional model makes use of
the following vector in the EC term
\[\widetilde{\mathbf{a}}_{y.x}'=\mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx}\]
(divided by \(a_{yy}\), see (11)) to carry out bound tests,
while the unconditional one only uses the vector \(\mathbf{a}_{yx}'\),
(divided by \(a_{yy}\)), since it neglects the term
\(\boldsymbol{\omega}'\mathbf{A}_{xx}\). 4 This can lead to contrasting
inference in two instances. The first happens when a degeneracy of first
type occurs in the conditional model, that is
\[
\widetilde{\mathbf{a}}_{y.x}'=\mathbf{0}, \tag{20}\]
because
\[\mathbf{a}_{yx}'=\boldsymbol{\omega}'\mathbf{A}_{xx}.\]
In this case, the conditional model rejects cointegration, while the
unconditional one concludes the opposite. The other case happens when a
degeneracy of first type occurs in the unconditional model, that is
\[
\mathbf{a}_{yx}'=\mathbf{0}, \tag{21}\]
but
\[\widetilde{\mathbf{a}}_{y.x}'=\boldsymbol{\omega}'\mathbf{A}_{xx} \neq \mathbf{0}.\]
In this case, the unconditional model rejects cointegration, while the
conditional one concludes for the existence of cointegrating
relationships, which are however spurious. Only a comparison of the
outcomes of the \(F_{ind}\) test performed in both the conditional and
unconditional ARDL equation can help to disentangle this problem. 5
In the following, bootstrap tests are carried out in the conditional
ARDL model (6). However, when a degeneracy of first type
occurs in the unconditional model, the outcomes of the \(F_{ind}\)
bootstrap test performed in both the conditional and unconditional
settings are provided. This, as previously outlined, is performed to
avoid the acceptance of spurious long-run relationships among the
dependent variable and the independent variables.
The bootstrap procedure here proposed focuses on a ARDL model specified
as in (14)-(18), depending on the assumptions on
the deterministic components.
The bootstrap procedure consists of the following steps:
The ARDL model is estimated via OLS and the related test statistics \(F_{ov}\), \(t\) or \(F_{ind}\) are computed.
In order to construct the distribution of each test statistic under the corresponding null, the same model is re-estimated imposing the appropriate restrictions on the coefficients according to the test under consideration.
Following (McNown et al. 2018), the ARDL restricted residuals are then computed. For example, under Case III, the residuals are \[ \widehat{\nu}_{yt}^{F_{ov}}=\Delta y_{t}-\widehat{\alpha}_{0.y}-\sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t} \tag{22}\]
\[ \widehat{\nu}_{yt}^{t}=\Delta y_{t}-\widehat{\alpha}_{0.y}+\widehat{\widetilde{\mathbf{a}}}'_{y.x}\mathbf{x}_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t} \tag{23}\]
\[ \widehat{\nu}_{yt}^{F_{ind}}=\Delta y_{t}-\widehat{\alpha}_{0.y}+\widehat{a}_{yy}y_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\gamma}}_{y.x,j}'\Delta\mathbf{z}_{t-j}-\widehat{\boldsymbol{\omega}}'\Delta\mathbf{x}_{t}. \tag{24}\] Here, the apex \("\widehat{\,\,.\,\,}"\) denotes the estimated parameters. The other cases can be dealt with in a similar manner.
The VECM model
\[ \Delta\mathbf{z}_{t}=\boldsymbol{\alpha}_{0}-\mathbf{A}\mathbf{z}_{t-1}+ \sum_{j=1}^{p-1}\boldsymbol{\Gamma}_{j}\Delta\mathbf{z}_{t-j}+\boldsymbol{\varepsilon}_{t} \tag{25}\] is estimated as well (imposing weak exogeneity), and the residuals
\[ \widehat{\boldsymbol{\varepsilon}}_{xt}= \Delta\mathbf{x}_{t}-\widehat{\boldsymbol{\alpha}}_{0x}+\widehat{\mathbf{A}}_{xx}\mathbf{x}_{t-1}- \sum_{j=1}^{p-1}\widehat{\boldsymbol{\Gamma}}_{(x)j}\Delta\mathbf{z}_{t-j} \tag{26}\] are computed. Thsis approach guarantees that the residuals \(\widehat{\boldsymbol{\varepsilon}}_{xt}\), associated to the variables \(\mathbf{x}_{t}\) explained by the marginal model (7), are uncorrelated with the ARDL residuals \(\widehat{\nu}_{yt}^{.}\).
A large set of \(B\) bootstrap replicates are sampled from the residuals calculated as in (22),(23), (24) and (26). In each replication, the following operations are carried out:
Each set of \((T-p)\) resampled residuals (with replacement) \(\widehat{\boldsymbol{\nu}}_{zt}^{(b)}=(\widehat{\nu}_{yt}^{(b)},\widehat{\boldsymbol{\varepsilon}}_{xt}^{(b)})\) is re-centered (see Davidson and MacKinnon 2005) \[\begin{align} \dot{\widehat{\nu}}^{(b)}_{yt}&=\widehat{\nu}^{(b)}_{yt} -\frac{1}{T-p}\sum_{t=p+1}^{T}\widehat{\nu}^{(b)}_{yt} \tag{27} \\ \dot{\widehat{\boldsymbol{\varepsilon}}}^{b}_{x_{i}t}&=\widehat{\boldsymbol{\varepsilon}}^{(b)}_{x_{i}t}-\frac{1}{T-p}\sum_{t=p+1}^{T}\widehat{\boldsymbol{\varepsilon}}^{(b)}_{x_{i}t}\qquad i=1,\dots,K.\tag{28} \end{align}\]
A sequential set of \((T-p)\) bootstrap observations,
\(y^{*}_{t}\enspace, \mathbf{x}^{*}_{t}\enspace t=p+1,\dots,T\),
is generated as follows
\[y^{*}_{t}=y^{*}_{t-1}+\Delta y^{*}_{t}, \enspace \enspace \mathbf{x}^{*}_{t}=\mathbf{x}^{*}_{t-1}+\Delta \mathbf{x}^{*}_{t},\]
where \(\Delta \mathbf{x}^{*}_{t}\) are obtained from
(26) and \(\Delta y^{*}_{t}\) from either
(22), (23) or (24) after
replacing in each of these equations the original residuals with
the bootstrap ones.
The initial conditions, that is the observations before \(t=p+1\),
are obtained by drawing randomly \(p\) observations in block from
the original data, so as to preserve the data dependence
structure.
An unrestricted ARDL model is estimated via OLS using the bootstrap observations, and the statistics \(F_{ov}^{(b),H_0}\), \(t^{(b),H_0}\) \(F_{ind}^{(b),H_0}\) are computed.
The bootstrap distributions of
\(\big\{F_{ov}^{(b),H_0}\big\}_{b=1}^B\),
\(\big\{F_{ind}^{(b),H_0}\big\}_{b=1}^B\) and
\(\big\{t^{(b),H_0}\big\}_{b=1}^B\) under the null are then employed
to determine the critical values of the tests. By denoting with
\(M^*_b\) the ordered bootstrap test statistic, and with \(\alpha\) the
nominal significance level, the bootstrap critical values are
determined as follows
\[
c^*_{\alpha,M}=\min\bigg\{c:\sum_{b=1}^{B}\mathbf{1}_{\{M^*_b >c\}} \leq\alpha\bigg\}
\qquad M\in\{F_{ov},F_{ind}\} \tag{29}\]
for the \(F\) tests and
\[
c^*_{{\alpha,t}}=\max\bigg\{c:\sum_{b=1}^{B}\mathbf{1}_{\{t^*_b<c\}} \leq {\alpha}\bigg\} \tag{30}\]
for the \(t\) test.
Here, \(\mathbf{1}_{\{x \in A\}}\) is the indicator function, which is
equal to one if the condition in subscript is satisfied and zero
otherwise.
The null hypothesis is rejected if the \(F\) statistic computed at step 1, \(F_{ov}\) or \(F_{ind}\), is greater than the respective \(c^*_{\alpha,M}\), or if the \(t\) statistic computed at the same step is lower than \(c^*_{{\alpha,t}}\).
This section describes the main functionalities of the
bootCT package. The
functions included in the package are essentially of two types. The
function sim_vecm_ardl
generates data according to a given data
generating process (DGP), assuming either the presence or the absence of
cointegrating relationships between variables, or degenerate cases. The
function boot_ardl
tests the presence of cointegrating relationships
employing the Pesaran ARDL bound tests (\(F_{ov}\) and \(t\)), the SMK bound
test on lagged independent variables (\(F_{ind}\)), and the novel ARDL
bootstrap testing procedure.
sim_vecm_ardl
functionThe function sim_vecm_ardl
allows to simulate a multivariate time
series from a given conditional ARDL specification for a dependent
variable \(y_t\) and a VAR/VECM specification for the remaining
independent variables \(\mathbf{x}_t\). In sthis regard, it represents an
interesting addition to extant data generating procedures for VAR/VECM
models. The arguments of this function can be divided into two
subgroups.
A group of parameters pertains the VECM model (6) and
(7), with \(\mathbf{A}_{xx}\) identifying the matrix of the
long-run relationships among the \(\mathbf x_t\) variables, and
\(\boldsymbol\Gamma_j\)’s, \(j=1,...,p-1\) the short-run matrices of the
system variables. Additionally, the parameter \(a_{yy}\) weighs the EC
term for \(y_t\), while \(\mathbf{a}_{yx}'\) is the parameter vector
weighting the variables \(\mathbf{x}_{t}\) in the ARDL equation. The
vector \(\mathbf{a}_{yx}'\), after conditioning \(y_t\) on the other
variables (\(\mathbf{x}_t\), see model (6)) becomes
\(\widetilde{\mathbf{a}}_{y.x}' = \mathbf{a}_{yx}' - \boldsymbol\omega'\mathbf{A}_{xx}\).
The second group of parameters concerns the model intercept and trend of
the VAR specification, \(\boldsymbol\mu\) and \(\boldsymbol\eta\), which in
the VECM representation become
\(\boldsymbol\alpha_0 = \mathbf A \boldsymbol\mu + (\mathbf I_{K+1} - \sum_{i=1}^{p-1} \boldsymbol\Gamma_j-\mathbf A)\boldsymbol\eta\)
and \(\boldsymbol{\alpha}_1 = \mathbf A \boldsymbol\eta\) and in the
conditional ARDL become
\(\alpha_{0.y}^{EC}= a_{yy}(\mu_{y}-\widetilde{\mathbf{ a}}_{y.x}'\boldsymbol \mu_{x})+\boldsymbol\gamma_{y.x}'(1)\boldsymbol\eta\)
and
\(a_{1.y}^{EC}=a_{yy}(\eta_y-\widetilde{\mathbf{ a}}_{y.x}'\boldsymbol\eta_x)\).
As explained in Appendix 7.2, intercept and trend
appear in the error correction (EC) term of the ARDL equation only when
restricted. Accordingly, they both do not appear in the EC in the case
I, the intercept does not appear in the EC term in cases III, IV and V
(it is freely set to \(\alpha_{0.y}\)) while the trend appears in the EC
term only in the case IV (it is freely set to \(\alpha_{1.y}\) for case
V). Accordingly, when these terms are not restricted, they need to be
supplied by the user.
The approach used to specify the function inputs offers great control to
the user, in terms of generating specific (conditional) ARDL-based
cointegration structures.
The function sim_vecm_ardl
takes the following arguments:
nobs
: number of observations to generate;
case
: indicates the conditional ARDL specification in terms of
deterministic component (intercept and trend) among the five
specifications proposed by Pesaran et al. (2001), given in
(14)-(18).
sigma.in
: covariance matrix, \(\boldsymbol{\Sigma}\), of the error
term \(\boldsymbol{\varepsilon}_{t}\);
gamma.in
: list of short-run parameter matrices
\(\boldsymbol\Gamma_j\);
axx.in
: cointegrating relationships, \(\mathbf{A}_{xx}\), pertaining
the independent variables in the marginal VECM model;
ayx.uc.in
: vector of parameters, as in \(\mathbf{a}_{yx}\);
ayy.in
: the \(a_{yy}\) term, weighting the EC term in the ARDL
equation;
mu.in
: mean vector, \(\boldsymbol\mu\), in the starting VAR
specification, used to define the VECM intercept for CASE II;
eta.in
: trend vector, \(\boldsymbol\eta\), in the starting VAR
specification, used to define the VECM trend for case IV;
azero.in
: unrestricted intercept of the VECM specification (valid
only for cases III, IV and V), when the intercept is not involved in
the EC term;
aone.in
: unrestricted coefficient of the trend in the VECM
specification (valid only for case V), when the trend is not
involved in the EC term;
burn.in
: additional observations burn-in observations to be
generated. A total of burn.in + nobs
observations are generated,
but only the last nobs
are kept in the data;
seed.in
: seed number for the generation of
\(\boldsymbol{\varepsilon}_{t}\sim N(\mathbf 0,\boldsymbol\Sigma)\).
If parameter values for mu.in
, eta.in
, azero.in
, or aone.in
and
case number turn out to be in contradiction, an error message is
displayed.
As output, the function gives out a list containing the data, both in
level and first difference, along with all the parameter values given as
input. Additionally, all intermediate transformation of parameters via
VECM transformation or as a by-product of conditioning \(y_{t}\) on
\(\mathbf{x}_{t}\) are included in the output.
Figure 1 depicts three-time series, dep_1_0
,
ind_1_0
and ind_2_0
, generated using this function and
affected by a cointegrating relationship, one panel for each case, from
I to V. The variable dep_1_0
represents the dependent variable
\(y_t\) of the ARDL equation, while ind_1_0
and ind_2_0
the independent ones, \(x_{1t}\) and \(x_{2t}\).
The code used to generate the data for case I is the following:
= matrix(c( 0, 0, 0,
corrm 0.25, 0, 0,
0.4, -0.25, 0), nrow = 3, ncol = 3, byrow = T)
= (corrm + t(corrm)) + diag(3)
Corrm
= diag(c(1.3, 1.2, 1))
sds
= (sds %*% Corrm %*% t(sds))
sigma.in
= matrix(c(0.6, 0, 0.2,
gamma1 0.1, -0.3, 0,
0, -0.3, 0.2), nrow = 3, ncol = 3,byrow=T)
= gamma1 * 0.3
gamma2
= sigma.in[1, -1] %*% solve(sigma.in[-1, -1])
omegat = matrix(c( 0.3, 0.5,
axx.in -0.4, 0.3), nrow = 2, ncol = 2, byrow = T)
= c(0.4, 0.4)
ayx.uc.in = 0.6
ayy.in
=
data.vecm.ardl_1 sim_vecm_ardl(nobs = 200,
case = 1,
sigma.in = sigma.in,
gamma.in = list(gamma1, gamma2),
axx.in = axx.in,
ayx.uc.in = ayx.uc.in,
ayy.in = ayy.in,
mu.in = rep(0, 3),
eta.in = rep(0, 3),
azero.in = rep(0, 3),
aone.in = rep(0, 3),
burn.in = 100,
seed.in = 999)
Additionally, Figure 2 displays other three time
series, dep_1_0
(\(y_t\)), ind_1_0
(\(x_{1t}\)) and
ind_2_0
(\(x_{2t}\)), when a degeneracy of second type occurs
(\(a_{yy} = 0\)) in the long-run relationship in the ARDL equation of
dep_1_0
on ind_1_0
, ind_2_0
. The five panels
represents the behavior of these series in the Cases from I to V. It is
worth noting the different scenario implied by these cases: case III
depicts a trend for the \(y_t\) variable, case IV highlights the inclusion
of a trend in the cointegrating relationship, and case V exhibits a
quadratic trend in the \(y_t\) variable.
Finally, the flowchart in Figure 3 details the
internal steps of the function sim_vecm_ardl
and the data generation
workflow. There, it is specified how the parameters of the VAR, VECM and
ARDL equation are introduced. Attention is paid on whether the error
correction mechanism involves either intercept or trend (or both) via
the internal computation of the parameters \(\theta_0\) and \(\theta_1\)
(and thus \(\alpha_{0.y}^{EC}\) and \(\alpha_{1.y}^{EC}\)). When the EC term
does not involve intercept and/or trend, \(\boldsymbol\alpha_0\) and
\(\boldsymbol\alpha_1\) are supplied by the user, depending on the case
under study.
boot_ardl
functionThis function develops the bootstrap procedure detailed previously. As
an option in the initial estimation phase, it offers the possibility of
automatically choosing the best order for the lagged differences of all
the variables in the ARDL and VECM models. This is done by using several
criteria. In particular, AIC, BIC, AICc, \(R^2\) and \(R^2_{adj}\) are used
as lag selection criteria for the ARDL model, while the overall minimum
between AIC, HQIC, SC and FPE is used for the lag selection for the
VECM.
In particular, the auto_ardl
function in the package
ARDL (Natsiopoulos and Tzeremes 2021) selects
the best ARDL order in terms of the short-run parameter vectors
\(\boldsymbol\gamma_{y.x,j}\), while the VARselect
function in the
package vars (Pfaff 2008b)
selects the best VECM order in terms of the short-run parameter matrices
\(\boldsymbol\Gamma_{(x),j}\). Furthermore, the user can input a
significance threshold for the retention of single parameters in the
\(\boldsymbol\Gamma_j\) and in the \(\boldsymbol\gamma_{y.x,j}\) vectors.
The function boot_ardl
takes the following arguments:
data
: input dataset. Must contain a dependent variable and a set
of independent variables;
yvar
: name of the dependent variable enclosed in quotation marks.
If unspecified, the first variable in the dataset is used;
xvar
: vector of names of the independent variables, each enclosed
in quotation marks. If unspecified, all variables in the dataset
except the first are used;
fix.ardl
: vector \((j_1,\dots,j_K)\), containing the maximum orders
of the lagged differences (i.e.,
\(\Delta y_{t-j_1}, \Delta x_{1,t-j_2},\dots,\) \(\Delta x_{1,t-j_K}\))
for the short term part of the ARDL equation, chosen in advance;
info.ardl
: (alternatively to fix.ardl
) the information criterion
used to choose the best lag order for the short term part of the
ARDL equation. It must be one between AIC
(default), AICc
,
BIC
, R2
, , adjR2
;
fix.vecm
: scalar \(m\) containing the maximum order of the lagged
differences (i.e., \(\Delta\mathbf z_{t-m}\)) for the short term part
of the VECM equation, chosen in advance;
info.vecm
: (alternatively to fix.vecm
) the information criterion
used to choose the best lag order for the short term part of the
VECM equation. Must be one among AIC
(default), HQIC
, SC
,
FPE
;
maxlag
: (in conjunction with info.ardl
/ info.vecm
) maximum
number of lags for the auto_ardl
function in the package
ARDL, and for the
VARselect
function in the package
vars;
a.ardl
: significance threshold for the short-term ARDL
coefficients (\(\boldsymbol\gamma_{y.x,j}\)) in the ARDL model
estimation;
a.vecm
: significance threshold for the short-term VECM
coefficients (in \(\boldsymbol\Gamma_j\)) in the VECM model
estimation;
nboot
: number of bootstrap replications;
case
: type of the specification for the conditional ARDL in terms
of deterministic components (intercept and trend) among the five
proposed by (Pesaran et al. 2001), given in
(14)-(18);
a.boot.H0
: probability/ies \(\alpha\) by which the critical
quantiles of the bootstrap distribution(s) \(c^{*}_{\alpha,F_{ov}}\),
\(c^{*}_{\alpha,t}\) and \(c^{*}_{\alpha,F_{ind}}\) must be calculated;
print
: if set to TRUE
, shows the progress bar.
boot_ardl
makes use of the lag_mts
function which produces lagged
versions of a given matrix of time series, each column with a separate
order. lag_mts
takes as parameters the data included in a matrix X
and the lag orders in a vector k
, with the addition of a boolean
parameter last.only
, which allows to specify whether only the \(k\)-th
order lags have to be retained, or all the lag orders from the first to
the \(k\)-th.
boot_ardl
also acts as a wrapper for the most common methodologies
detecting cointegration, offering a comprehensive view on the testing
procedures involved in the analysis. The resulting object, of class
bootCT
, contains all the information about
The conditional ARDL model estimates, and the unconditional VECM model estimates;
the bootstrap tests performed in the conditional ARDL model;
the Pesaran, Shin and Smith bound testing procedure (\(F_{ov}\) and \(t\)-test, when applicable);
the Sam, McNown and Goh bound testing procedure for \(F_{ind}\), when applicable;
the Johansen rank and trace cointegration tests on the independent variables.
Internally, the bootstrap data generation under the null is executed via
a Rcpp
function, employing the
Rcpp and
RcppArmadillo
packages (Eddelbuettel 2013), so as to greatly speed up computational times. As
explained in the previous section, cointegration tests in the
unconditional ARDL model are performed in order to uncover the presence
of spurious cointegrating relationships.
To this end, the function provides
the bootstrap critical values of the \(F_{ov}\), \(t\) and \(F_{ind}\)
tests in the conditional model, at level a.boot.H0
, along with the
same statistics computed in the conditional model.
a flag, called fakecoint
, that indicates divergence between the
outcomes of the \(F_{ind}\) test performed in both the conditional and
unconditional model. In this circumstance, as explained before,
there is no cointegration (see Bertelli et al. 2022).
A summary
method has been implemented to present the results in a
visually clear manner. It accepts the additional argument "out
" that
lets the user choose which output(s) to visualize: ARDL
prints the
conditional ARDL model summary, VECM
prints the VECM model summary,
cointARDL
prints the summary of the bound tests and the bootstrap
tests, cointVECM
prints the summary of the Johansen test on the
independent variables.
A detailed flowchart showing the function’s workflow is displayed in
Figure 4. There, the expressions "C ARDL"
and "UC ARDL" stand for conditional and unconditional ARDL model,
respectively.
In order to investigate the sensitivity of the procedure to different
sample sizes and number of bootstrap replicates, an experiment has been
run using a three-dimensional time series of length
\(T=\{50,80,100,200,500\}\), generating 100 datasets for each sample size
with the sim_vecm_ardl
function (Case II, with cointegrated variables,
and 2 lags in the short-run section of the model).
Then, the boot_ardl
function has been called
boot_ardl(data = df_sim,
nboot = bootr,
case = 2,
fix.ardl = rep(2, 3),
fix.vecm = 2)
In the code above, bootr
has been set equal to
\(B=\{200,500,1000,2000\}\), the number of lags has been assumed known
(fix.ardl
and fix.vecm
), while default values have been used for
every other argument (such as a.ardl
, a.vecm
and a.boot.H0
).
Table 1 shows the average running time per replication
together with the coefficient of variation (%) of the bootstrap critical
values of the \(F_{ov}\) test, for each value of \(T\) and \(B\), across 100
replications for each scenario.
Naturally, the running time increases as both sample size and bootstrap
replicates increase. However, it can be noticed how the coefficients of
variation tend to stabilize for \(B \geq 1000\), especially for \(T>80\), at
the 5% significance level. Therefore, it is recommended a number of
bootstrap replicates of at least \(B=1000\) for higher sample size, or at
least \(B=2000\) for smaller samples. The analysis has been carried out
using an Intel(R) Core(TM) i7-1165G7 CPU @ 2.80GHz processor, 16GB of
RAM.
\(T\) | \(B\) | Exec. Time (sec) | \(cv^{(F_{ov})}(5\%)\) | \(cv^{(F_{ov})}(2.5\%)\) | \(cv^{(F_{ov})}(1\%)\) |
---|---|---|---|---|---|
50 | 200 | 23.38 | 8.648 | 10.925 | 13.392 |
50 | 500 | 48.37 | 6.312 | 6.952 | 8.640 |
50 | 1000 | 96.65 | 4.806 | 5.613 | 6.288 |
50 | 2000 | 231.15 | 4.255 | 4.226 | 4.946 |
80 | 200 | 23.46 | 7.251 | 8.936 | 11.263 |
80 | 500 | 50.19 | 4.998 | 6.220 | 7.946 |
80 | 1000 | 143.00 | 3.882 | 4.453 | 5.305 |
80 | 2000 | 255.64 | 2.912 | 3.623 | 4.518 |
100 | 200 | 37.89 | 7.707 | 8.583 | 10.955 |
100 | 500 | 52.86 | 4.691 | 5.304 | 7.557 |
100 | 1000 | 184.51 | 3.512 | 4.567 | 5.695 |
100 | 2000 | 212.65 | 3.519 | 3.674 | 4.185 |
200 | 200 | 35.46 | 6.644 | 7.173 | 10.365 |
200 | 500 | 76.78 | 4.734 | 5.355 | 6.225 |
200 | 1000 | 148.25 | 3.124 | 4.177 | 5.034 |
200 | 2000 | 484.51 | 2.811 | 3.361 | 3.907 |
500 | 200 | 54.47 | 6.641 | 8.694 | 10.414 |
500 | 500 | 133.17 | 5.137 | 5.816 | 6.408 |
500 | 1000 | 271.87 | 3.905 | 4.585 | 5.283 |
500 | 2000 | 561.71 | 3.221 | 3.490 | 4.145 |
Table 1: Average execution times (in seconds) of the boot_ardl
function, for different combinations of sample size \(T\) and bootstrap
replicates \(B\). Coefficients of variation (\(cv\)) reported for the
\(F_{ov}\) bootstrap critical values at level 5%, 2.5% and 1%.
This section provides two illustrative application which highlight the performance of the bootstrap ARDL tests.
In the first example, the occurrence of a long-run relationship between
consumption [C], income [INC], and investment [INV] of Germany has
been investigated via a set of ARDL models, where each variable takes in
turn the role of dependent one, while the remaining are employed as
independent. The models have been estimated by employing the dataset of
Lütkepohl (2005) which includes quarterly data of the series over the
years 1960 to 1982. The data have been employed in logarithmic form.
Figure 5 displays these series over the sample
period.
Before applying the bootstrap procedure, the order of integration of
each series has been analyzed. Table 2 shows the results of
ADF test performed on both the series and their first-differences (\(k=3\)
maximum lags). The results confirm the applicability of the ARDL
framework as no series is integrated of order higher than one.
The following ARDL equations have been estimated:
First ARDL equation (C | INC, INV): \[\begin{align} \Delta \log \text{C}_{t}&=\alpha_{0.y} - a_{yy} \log \text{C}_{t-1} - {a}_{y.x_1}\log \text{INC}_{t-1} - {a}_{y.x_2}\log \text{INV}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{INC}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INV}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{INC}_{t}+ \omega_2 \Delta\log \text{INV}_{t}+\nu_{t}. \end{align}\]
Second ARDL equation (INC | C, INV): \[\begin{align} \Delta \log \text{INC}_{t}&=\alpha_{0.y} - a_{yy} \log \text{INC}_{t-1} - {a}_{y.x_1}\log \text{C}_{t-1} - {a}_{y.x_2}\log \text{INV}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{INC}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INV}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{C}_{t}+ \omega_2 \Delta\log \text{INV}_{t}+\nu_{t}. \end{align}\]
Third ARDL equation (INV | C, INC): \[\begin{align} \Delta \log \text{INV}_{t}&=\alpha_{0.y} - a_{yy} \log \text{INV}_{t-1} - {a}_{y.x_1}\log \text{C}_{t-1} - {a}_{y.x_2}\log \text{INC}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{INV}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{C}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{INC}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{C}_{t}+ \omega_2 \Delta\log \text{INC}_{t}+\nu_{t}. \end{align}\]
Table 3 shows the estimation results for each ARDL and VECM
model. It is worth noting that the instantaneous difference of the
independent variables are highly significant in each conditional ARDL
model. Thus, neglecting these variables in the ARDL equation, as happens
in the unconditional version of the model, may potentially lead to
biased estimates and incorrect inference. For the sake of completeness,
also the results of the marginal VECM estimation are reported for each
model.
The code to prepare the data, available in the package as the
ger_macro
dataset, is:
data("ger_macro")
= apply(ger_macro[,-1], 2, log)
LNDATA = paste0("LN", colnames(ger_macro)[-1])
col_ln = as.data.frame(LNDATA)
LNDATA colnames(LNDATA) = col_ln
Then, the boot_ardl
function is called, to perform the bootstrap
tests. In the code chunk below, Model I is considered.
set.seed(999)
= boot_ardl(data = LNDATA,
BCT_res_CONS yvar = "LNCONS",
xvar = c("LNINCOME", "LNINVEST"),
maxlag = 5,
a.ardl = 0.1,
a.vecm = 0.1,
nboot = 2000,
case = 3,
a.boot.H0 = c(0.05),
print = T)
to which follows the call to the summary
function
summary(BCT_res_CONS, out = "ARDL")
summary(BCT_res_CONS, out = "VECM")
summary(BCT_res_CONS, out = "cointVECM")
summary(BCT_res_CONS, out = "cointARDL")
The first summary line displays the output in the ARDL column of Table
3 and the second column of Table 4, Model
I. The second line corresponds to the VECM columns of Table
3, Model I - only for the independent variables. The
information on the rank of the \(\mathbf A_{xx}\) in Table 3
is inferred from the third line. Finally, the fourth summary line
corresponds to the test results in Table 4, Model I. A
textual indication of the presence of spurious cointegration is
displayed at the bottom of the "cointARDL"
summary, if detected.
In this example, the bootstrap and bound testing procedures are in
agreement only for model I, indicating the existence of a cointegrating
relationship. Additionally, no spurious cointegration is detected for
this model. As for models II and III, the null hypothesis is not
rejected by the bootstrap tests, while the PSS and SMG bound tests fail
to give a conclusive answer in the \(F_{ind}\) test.
The running time of the entire analysis is of roughly 11 minutes, using
an Intel(R) Core(TM) i7-1165G7 CPU @ 2.80GHz processor, 16GB of RAM.
level variable | first difference | ||||
---|---|---|---|---|---|
Series | lag | ADF | p.value | ADF | p-value |
\(\log\text{C}_t\) | 0 | -1.690 | 0.450 | -9.750 | <0.01 |
1 | -1.860 | 0.385 | -5.190 | <0.01 | |
2 | -1.420 | 0.549 | -3.130 | 0.030 | |
3 | -1.010 | 0.691 | -2.720 | 0.080 | |
\(\log\text{INC}_t\) | 0 | -2.290 | 0.217 | -11.140 | <0.01 |
1 | -1.960 | 0.345 | -7.510 | <0.01 | |
2 | -1.490 | 0.524 | -5.120 | <0.01 | |
3 | -1.310 | 0.587 | -3.290 | 0.020 | |
\(\log\text{INV}_t\) | 0 | -1.200 | 0.625 | -8.390 | <0.01 |
1 | -1.370 | 0.565 | -5.570 | <0.01 | |
2 | -1.360 | 0.570 | -3.300 | 0.020 | |
3 | -1.220 | 0.619 | -3.100 | 0.032 |
Model I | Model II | Model III | |||||||
---|---|---|---|---|---|---|---|---|---|
ARDL | VECM | ARDL | VECM | ARDL | VECM | ||||
\(\Delta\log\text{C}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{INC}_t\) | \(\Delta\log\text{INC}_t\) | \(\Delta\log\text{C}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{INV}_t\) | \(\Delta\log\text{C}_t\) | \(\Delta\log\text{INC}_t\) | |
\(\log\text{C}_{t-1}\) | -0.307 *** (0.055) | 0.168 * (0.081) | -0.0011 (0.0126) | 0.1286 * (0.0540) | 0.611 . (0.339) | -0.2727 *** (0.0704) | -0.0508 (0.0796) | ||
\(\log\text{INC}_{t-1}\) | 0.297 *** (0.055) | 0.124 * (0.054) | -0.017 (0.014) | -0.183 * (0.079) | -0.491 (0.340) | 0.2619 *** (0.0681) | 0.0464 (0.0772) | ||
\(\log\text{INV}_{t-1}\) | -0.001 (0.011) | -0.152 * (0.063) | 0.016 (0.017) | 0.0209 (0.0135) | -0.00107 (0.0142) | -0.1531 * (0.0607) | -0.1212 * (0.060) | ||
\(\Delta\log\text{C}_{t-1}\) | -0.248 ** (0.079) | 0.899 * (0.442) | 0.211 . (0.113) | 0.375 *** (0.1086) | 0.9288 * (0.442) | 1.113 * (0.441) | 0.2072 . (0.1142) | ||
\(\Delta\log\text{C}_{t-2}\) | 0.744 (0.431) | 0.8049 . (0.4345) | |||||||
\(\Delta\log\text{INC}_{t-1}\) | -0.1404 (0.1095) | ||||||||
\(\Delta\log\text{INC}_{t-2}\) | 0.2675 ** (0.0958) | 0.1522 . (0.0912) | |||||||
\(\Delta\log\text{INV}_{t-1}\) | -0.18 (0.111) | 0.035 (0.029) | -0.189 . (0.1097) | -0.175 (0.1075) | 0.0479 . (0.0282) | ||||
\(\Delta\log\text{INV}_{t-2}\) | 0.049 . (0.027) | 0.0591 * (0.0245) | 0.0578 * (0.0223) | 0.0562 * (0.0266) | |||||
\(\Delta\log\text{C}_t\) | 0.7070 *** (0.1093) | 1.8540 *** (0.5425) | |||||||
\(\Delta\log\text{INC}_t\) | 0.471 *** (0.074) | -0.445 *** (0.4726) | |||||||
\(\Delta\log\text{INV}_t\) | 0.065 ** (0.019) | -0.0230 (0.025) | |||||||
const. | 0.048 *** (0.013) | 0.036 (0.066) | 0.033 * (0.017) | 0.002 (0.018) | 0.0266 . (0.0155) | 0.023 (0.0666) | -0.056 (0.072) | 0.0517 ** (0.0157) | 0.0378 * (0.0177) |
J-test | \(rk(\mathbf{A_{xx}})=2\) | \(rk(\mathbf{A_{xx}})=2\) | \(rk(\mathbf{A_{xx}})=2\) |
PSS / SMG Threshold | Outcome | |||||||
---|---|---|---|---|---|---|---|---|
Model | Lags | Test | Boot. Critical Values | I(0) 5% | I(1) 5% | Statistic | Boot | Bound |
I | (1,0,0) | \(F_{ov}\) | 3.79 | 3.79 | 4.85 | 10.75 | Y | Y |
\(t\) | -2.88 | -2.86 | -3.53 | -5.608 | ||||
\(F_{ind}\) | 4.92 | 3.01 | 5.42 | 15.636 | ||||
II | (1,1,0) | \(F_{ov}\) | 5.79 | 3.79 | 4.85 | 2.867 | N | U |
\(t\) | -3.69 | -2.86 | -3.53 | -2.315 | ||||
\(F_{ind}\) | 7.38 | 3.01 | 5.42 | 3.308 | ||||
III | (1,1,0) | \(F_{ov}\) | 5.50 | 3.79 | 4.85 | 3.013 | N | U |
\(t\) | -3.32 | -2.86 | -3.53 | -2.020 | ||||
\(F_{ind}\) | 6.63 | 3.01 | 5.42 | 4.189 |
Following Bertelli et al. (2022), the relationship between foreign direct investment [FDI], exports [EXP], and gross domestic product [GDP] in Italy is investigated. The data of these three yearly variables have been retrieved from the World Bank Database and cover the period from 1970 to 2020. In the analysis, the log of the variables has been used and [EXP] and [FDI] have been adjusted using the GDP deflator. Figure 6 displays these series over the sample period.
Table 5 shows the outcomes of the ADF test performed on each variable, which ensures that the integration order is not higher than one for all variables. Table 6 shows the results of bound and bootstrap tests performed in ARDL model by taking each variable, in turn, as the dependent one. The following ARDL equations have been estimated:
First ARDL equation (GDP | EXP, FDI): \[\begin{align} \Delta \log \text{GDP}_{t}&=\alpha_{0.y} - a_{yy} \log \text{GDP}_{t-1} - {a}_{y.x_1}\log \text{EXP}_{t-1} - {a}_{y.x_2}\log \text{FDI}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{EXP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{FDI}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{EXP}_{t}+ \omega_2 \Delta\log \text{FDI}_{t}+\nu_{t}. \end{align}\] For this model, a degenerate case of the first type can be observed, while the simpler bound testing procedure does not signal cointegration.
Second ARDL equation (EXP | GDP, FDI): \[\begin{align} \Delta \log \text{EXP}_{t}&=\alpha_{0.y} - a_{yy} \log \text{EXP}_{t-1} - {a}_{y.x_1}\log \text{GDP}_{t-1} - {a}_{y.x_2}\log \text{FDI}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{EXP}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{FDI}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{GDP}_{t}+ \omega_2 \Delta\log \text{FDI}_{t}+\nu_{t}. \end{align}\] For this model, the ARDL bootstrap test indicates absence of cointegration, while the bound testing approach is inconclusive for the \(F_{ind}\) test.
Third ARDL equation (FDI | GDP, EXP): \[\begin{align} \Delta \log \text{FDI}_{t}&=\alpha_{0.y} - a_{yy} \log \text{FDI}_{t-1} - {a}_{y.x_1}\log \text{GDP}_{t-1} - {a}_{y.x_2}\log \text{EXP}_{t-1} +\\\nonumber &\sum_{j=1}^{p-1}\gamma_{y.j} \Delta\log \text{FDI}_{t-j} + \sum_{j=1}^{s-1}\gamma_{x_1.j} \Delta\log \text{GDP}_{t-j} + \sum_{j=1}^{r-1}\gamma_{x_2.j} \Delta\log \text{EXP}_{t-j} +\\\nonumber &\omega_1 \Delta\log \text{GDP}_{t}+ \omega_2 \Delta\log \text{EXP}_{t}+\nu_{t}. \end{align}\] For this model, the long-run cointegrating relationship is confirmed using both boostrap and bound testing. No spurious cointegration is detected.
The code to load the data and perform the analysis (e.g. for Model I) is:
data("ita_macro")
= boot_ardl(data = ita_macro,
BCT_res_GDP yvar = "LGDP",
xvar = c("LEXP", "LFI"),
maxlag = 5,
a.ardl = 0.1,
a.vecm = 0.1,
nboot = 2000,
case = 3,
a.boot.H0 = c(0.05),
print = T)
For the sake of simplicity, the conditional ARDL and VECM marginal models outputs included in each cointegrating analysis is omitted. The summary for the cointegration tests for Model I is called via
summary(BCT_res_GDP, out = "ARDL") # extract lags
summary(BCT_res_GDP, out ="cointARDL") # ARDL cointegration
This empirical application further highlights the importance of dealing with inconclusive inference via the bootstrap procedure, while naturally including the effect of conditioning in the ARDL model, as highlighted in Bertelli et al. (2022).
No Drift, No Trend | Drift, No Trend | Drift and Trend | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 | Lag = 0 | Lag = 1 | Lag = 2 | Lag = 3 |
\(\log \text{GDP}_t\) | 0.99 | 0.974 | 0.941 | 0.796 | <0.01 | <0.01 | <0.01 | 0.084 | 0.99 | 0.99 | 0.99 | 0.99 |
\(\log \text{FDI}_t\) | 0.572 | 0.599 | 0.675 | 0.725 | <0.01 | 0.0759 | 0.3199 | 0.5174 | <0.01 | 0.013 | 0.151 | 0.46 |
\(\log \text{EXP}_t\) | 0.787 | 0.71 | 0.698 | 0.684 | 0.479 | 0.288 | 0.467 | 0.433 | 0.629 | 0.35 | 0.463 | 0.379 |
\(\Delta\log \text{GDP}_t\) | <0.01 | <0.0164 | 0.0429 | 0.0402 | <0.01 | 0.0861 | 0.3989 | 0.4267 | <0.01 | <0.01 | 0.0166 | 0.017 |
\(\Delta\log \text{FDI}_t\) | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 |
\(\Delta\log \text{EXP}_t\) | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | 0.0336 | 0.0315 |
PSS / SMG Threshold | Outcome | |||||||
---|---|---|---|---|---|---|---|---|
Model | Lags | Test | Boot. Critical Values | I(0) 5% | I(1) 5% | Statistic | Boot | Bound |
I | (1,1,0) | \(F_{ov}\) | 3.730 | 4.070 | 5.190 | 9.758 | D1 | N |
\(t\) | -2.020 | -2.860 | -3.530 | -2.338 | ||||
\(F_{ind}\) | 3.710 | 3.220 | 5.620 | 2.273 | ||||
II | (1,0,0) | \(F_{ov}\) | 5.400 | 4.070 | 5.190 | 2.649 | N | U |
\(t\) | -3.380 | -2.860 | -3.530 | -1.889 | ||||
\(F_{ind}\) | 5.630 | 3.220 | 5.620 | 3.481 | ||||
III | (1,0,0) | \(F_{ov}\) | 5.360 | 4.070 | 5.190 | 6.716 | Y | Y |
\(t\) | -3.550 | -2.860 | -3.530 | -4.202 | ||||
\(F_{ind}\) | 6.500 | 3.220 | 5.620 | 7.017 |
The bootCT package
allows the user to perform bootstrap cointegration tests in ARDL models
by overcoming the problem of inconclusive inference which is a
well-known drawback of standard bound tests. The package makes use of
different functions. The function boot_ardl
performs the bootstrap
tests, and it acts as a wrapper of both the bootstrap and the standard
bound tests, including also the Johansen test on the independent
variables of the model. Finally, it also performs the bound \(F\)-test on
the lagged independent variables, so far not available in other extant
R
packages. The function sim_vecm_ardl
, which allows the simulation
of multivariate time series data following a user-defined DGP, enriches
the available procedures for multivariate data generation, while the
function lag_mts
provides a supporting tool in building datasets of
lagged variables for any practical purpose. Finally, the use of Rcpp
functions gives a technical advantage in terms of computational speed,
performing the bootstrap analysis within an acceptable time frame.
Expanding the matrix polynomial \(\mathbf{A}(z)\) about \(z=1\), yields \[ \mathbf{A}(z)=\mathbf{A}(1)z+(1-z)\boldsymbol{\Gamma}(z), \tag{31}\] where \[\mathbf{A}(1)=\mathbf{I}_{K+1}-\sum_{j=1}^{p}\mathbf{A}_{j}\]
\[ \boldsymbol{\Gamma}(z)=\mathbf{I}_{K+1}-\sum_{i=1}^{p-1}\boldsymbol{\Gamma}_{i}z^i, \enspace \enspace \boldsymbol{\Gamma}_{i}=-\sum_{j=i+1}^{p}\mathbf{A}_j. \tag{32}\] The VECM model (2) follows accordingly, and
\[
\boldsymbol{\alpha}_0=\mathbf{A}(1)\boldsymbol{\mu}+(\boldsymbol{\Gamma}(1)-\mathbf{A}(1))\boldsymbol{\eta}, \enspace \enspace \enspace \boldsymbol{\alpha}_1=\mathbf{A}(1)\boldsymbol{\eta}. \tag{33}\]
Assuming that \(\mathbf{A}(1)\) is singular and that the variables
\(\mathbf{x}_{t}\) are cointegrated. This entails the following
\[\begin{align}
\mathbf{A}(1)=&\begin{bmatrix}
\underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}_{yx}'} \\ \underset{(K,1)}{\mathbf{a}_{xy}} & \underset{(K,K)}{\mathbf{A}_{xx}}
\end{bmatrix}=\underset{(K+1,r+1)}{\mathbf{B}}\underset{(r+1,K+1)}{\mathbf{C}'}=\begin{bmatrix}b_{yy} & \mathbf{b}_{yx}'\\ \mathbf{b}_{xy} & \mathbf{B}_{xx} \end{bmatrix}\begin{bmatrix}c_{yy} & \mathbf{c}_{yx}'\\ \mathbf{c}_{xy} & \mathbf{C}_{xx}'\end{bmatrix}= \nonumber\\
=&\begin{bmatrix}b_{yy}c_{yy}+\mathbf{b}_{yx}'\mathbf{c}_{xy} & b_{yy}\mathbf{c}_{yx}'+\mathbf{b}_{yx}'\mathbf{C}_{xx}'\\
\mathbf{b}_{xy}c_{yy}+\mathbf{B}_{xx}\mathbf{c}_{xy} & \mathbf{b}_{xy}\mathbf{c}_{yx}'+ \mathbf{A}_{xx} \end{bmatrix}, \enspace \enspace \enspace rk(\mathbf{A}(1))=rk(\mathbf{B})=rk(\mathbf{C}),\tag{34}
\end{align}\]
where \(\mathbf{B}\) and \(\mathbf{C}\) are full column rank matrices
arising from the rank-factorization of
\(\mathbf{A}(1)=\mathbf{B}\mathbf{C}'\) with \(\mathbf{C}\) matrix of the
long-run relationships of the process and \(\mathbf{B}_{xx}\),
\(\mathbf{C}_{xx}\) arising from the rank factorization of
\(\mathbf{A}_{xx}=\mathbf{B}_{xx}\mathbf{C}_{xx}'\), with
\(rk(\mathbf{A}_{xx})=rk(\mathbf{B}_{xx})=rk(\mathbf{C}_{xx})=r\) 6.
By partitioning the vectors \(\boldsymbol{\alpha}_{0}\),
\(\boldsymbol{\alpha}_{1}\), the matrix \(\mathbf{A}(1)\) and the polynomial
matrix \(\boldsymbol{\Gamma}(L)\) conformably to \(\mathbf{z}_{t}\), as
follows
\[ \boldsymbol{\alpha}_0=\begin{bmatrix} \underset{(1,1)}{\alpha_{0y}} \\ \underset{(K,1)}{\boldsymbol{\alpha}_{0x}} \end{bmatrix}, \enspace \enspace \enspace \boldsymbol{\alpha}_1=\begin{bmatrix} \underset{(1,1)}{\alpha_{1y}} \\ \underset{(K,1)}{\boldsymbol{\alpha}_{1x} } \end{bmatrix} \tag{35}\]
\[ \mathbf{A}(1)=\begin{bmatrix} \underset{(1,K+1)}{\mathbf{a}'_{(y)}} \\ \underset{(K,K+1)}{\mathbf{A}_{(x)}} \end{bmatrix} =\begin{bmatrix} \underset{(1,1)}{a_{yy}} & \underset{(1,K)}{\mathbf{a}'_{yx}} \\ \underset{(K,1)}{\mathbf{a}_{xy}} & \underset{(K,K)}{\mathbf{A}_{xx} } \end{bmatrix}, \enspace \enspace \enspace \boldsymbol{\Gamma}(L)=\begin{bmatrix} \underset{(1,K+1)}{\boldsymbol{\gamma}'_{y}(L)} \\ \underset{(K,K+1)}{\boldsymbol{\Gamma}_{(x)}(L)} \end{bmatrix} =\begin{bmatrix} \underset{(1,1)}{\gamma_{yy}(L)} & \underset{(1,K)}{\boldsymbol{\gamma}'_{yx}(L)} \\ \underset{(K,1)}{\boldsymbol{\gamma}_{xy}(L)} & \underset{(K,K)}{\boldsymbol{\Gamma}_{xx}(L) } \end{bmatrix} \tag{36}\] , and substituting (5) into (2) yields
\[ \Delta\mathbf{z}_t=\begin{bmatrix} \Delta y_{t} \\ \Delta\mathbf{x}_{t} \end{bmatrix}=\begin{bmatrix} \alpha_{0.y} \\ \boldsymbol{\alpha}_{0x} \end{bmatrix} + \begin{bmatrix} \alpha_{1.y} \\ \boldsymbol{\alpha}_{1x} \end{bmatrix}t- \begin{bmatrix} \mathbf{a}'_{(y).x} \\ \mathbf{A}_{(x)} \end{bmatrix}\begin{bmatrix} y_{t-1} \\ \mathbf{x}_{t-1} \end{bmatrix} + \begin{bmatrix} \boldsymbol{\gamma}'_{y.x}(L) \\ \boldsymbol{\Gamma}_{(x)}(L) \end{bmatrix}\Delta\mathbf{z}_t+\begin{bmatrix} \boldsymbol{\omega}'\Delta\mathbf{x}_{t} \\ \mathbf{0} \end{bmatrix}+\begin{bmatrix} {\nu}_{yt} \\ \boldsymbol{\varepsilon}_{xt} \end{bmatrix} \tag{37}\] , where
\[ \alpha_{0.y}=\alpha_{0y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{0x}, \enspace \enspace \enspace \alpha_{1.y}=\alpha_{1y}-\boldsymbol{\omega}'\boldsymbol{\alpha}_{1x} \tag{38}\]
\[ \mathbf{a}'_{(y).x}=\mathbf{a}'_{(y)}-\boldsymbol{\omega}'\mathbf{A}_{(x)}, \enspace \enspace \enspace \boldsymbol{\gamma}'_{y.x}(L)=\boldsymbol{\gamma}_{y}'(L)-\boldsymbol{\omega}'\boldsymbol{\Gamma}_{(x)}(L). \tag{39}\]
According to (37), the long-run relationships of the VECM turn out to be now included in the matrix
\[ \begin{bmatrix} \mathbf{a}'_{(y).x} \\ \mathbf{A}_{(x)} \end{bmatrix}=\begin{bmatrix} a_{yy}-\boldsymbol{\omega}'\mathbf{a}_{xy} & \mathbf{a}_{yx}'-\boldsymbol{\omega}'\mathbf{A}_{xx} \\ \mathbf{a}_{xy}&\mathbf{A}_{xx} \end{bmatrix}. \tag{40}\]
To rule out the presence of long-run relationships between \(y_{t}\) and \(\mathbf{x}_{t}\) in the marginal model, the \(\mathbf{x}_{t}\) variables are assumed to be exogenous with respect to the ARDL parameters, that is \(\mathbf{a}_{xy}\) is assumed to be a null vector. Accordingly, the long-run matrix in (40) becomes
\[ \widetilde{\mathbf{A}}=\begin{bmatrix}a_{yy} & \mathbf{a}'_{yx}-\boldsymbol{\omega}'\mathbf{A}_{xx} \\ \mathbf{0} & \mathbf{A}_{xx} \end{bmatrix}=\begin{bmatrix} a_{yy} & \widetilde{\mathbf{a}}_{y.x}' \\ \mathbf{0}&\mathbf{A}_{xx}\end{bmatrix} =\begin{bmatrix} b_{yy}c_{yy} & b_{yy}\mathbf c_{yx}'+(\mathbf{b}_{yx}'-\boldsymbol{\omega}'\mathbf{B}_{xx})\mathbf{C}_{xx}' \\ \mathbf{0}& \mathbf{B}_{xx}\mathbf{C}_{xx}'\end{bmatrix}. \tag{41}\]
After these algebraic transformations, the ARDL equation for
\(\Delta y_{t}\) can be rewritten as in (6).
In light of the factorization (34) of the matrix
\(\mathbf{A}(1)\), the long-run equilibrium vector \(\boldsymbol{\theta}\)
can be expressed as
\[ \boldsymbol{\theta}'= -\frac{1}{a_{yy}}\underset{(1,r+1)}{\left[b_{yy}\enspace\enspace(\mathbf{b}_{yx}-\boldsymbol{\omega}'\mathbf{B}_{xx})\right]} \underset{(r+1,K)}{\begin{bmatrix} \mathbf{c}'_{yx}\\ \mathbf{C}'_{xx} \end{bmatrix}}, \tag{42}\]
where
\(\widetilde{\mathbf{a}}_{y.x}=\mathbf{a}_{yx}-\boldsymbol{\omega}'\mathbf{A}_{xx}\).
Bearing in mind that \(\mathbf{C}'_{xx}\) is the cointegrating matrix for
the variables \(\mathbf{x}_t\), the equation (42) leads to the
following conclusion
\[
rk\begin{bmatrix}\mathbf{c}'_{yx}\\ \mathbf{C}'_{xx}\end{bmatrix}=\begin{cases}
r \to \enspace y_{t} \sim I(0) \\
r+1 \to \enspace y_{t} \sim I(1)
\end{cases}, \tag{43}\]
where \(r=rk(\mathbf{A}_{xx})\) and \(0 \leq r\leq K\).
Pesaran et al. (2001) introduced five different specifications for the ARDL
model, which depend on the deterministic components that can be absent
or restricted to the values they assume in the parent VAR model. In this
connection, note that, in light of (33), the drift and the
trend coefficient in the conditional VECM (37) are defined
as
\[\boldsymbol{\alpha_{0}}^{c}=\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta} , \enspace \enspace
\boldsymbol{\alpha_{1}}^{c}=\widetilde{\mathbf{A}}(1)\boldsymbol{\eta},\]
where \(\widetilde{\mathbf{A}}(1)\) is as in (41) and
\(\widetilde{\boldsymbol{\Gamma}}(1)=\begin{bmatrix} \boldsymbol{\gamma}_{y.x}'(1) \\ \boldsymbol{\Gamma}_{(x)}(1) \end{bmatrix}\).
Accordingly, after partitioning the mean and the drift vectors as
\[\underset{(1,K+1)}{\boldsymbol{\mu}'}=[\underset{(1,1)}{\mu_{y}},\underset{(1,K)}{\boldsymbol{\mu}_x'}], \enspace \underset{(1,K+1)}{\boldsymbol{\eta}'}=[\underset{(1,1)}{\eta_{y}},\underset{(1,K)}{\boldsymbol{\eta}_{x}'}],\]
the intercept and the coefficient of the trend of the ARDL equation
(6) are defined as
\[\alpha_{0.y}^{EC}
= \mathbf{e}_{1}'\boldsymbol{\alpha_{0}}^{c}
=a_{yy}\mu_{y}-\widetilde{\mathbf{a}}'_{y.x}\boldsymbol{\mu}_{x}+\boldsymbol{\gamma}'_{y.x}(1)\boldsymbol{\eta}=a_{yy}(\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x})+\boldsymbol{\gamma}'_{y.x}(1)\boldsymbol{\eta}, \enspace
\boldsymbol{\theta}'=-\frac{\widetilde{\mathbf{a}}'_{y.x}}{a_{yy}}\]
\[\enspace \enspace \alpha_{1.y}^{EC}=\mathbf{e}_{1}'\boldsymbol{\alpha_{1}}^{c}=
a_{yy}\eta_{y}-\widetilde{\mathbf{a}}'_{y.x}\boldsymbol{\eta}_{x}=a_{yy}(\eta_{y}-\boldsymbol{\theta'}\boldsymbol{\eta}_{x}),\]
where \(\mathbf{e}_{1}\) is the \(K+1\) first elementary vector.
In the error correction term
\[EC_{t-1}=y_{t-1}-\theta_{0}-\theta_{1}t-\boldsymbol{\theta}'\mathbf{x}_{t-1}\]
the parameters that partake in the calculation of intercept and trend
are
\[\theta_{0}=\mu_{y}-\boldsymbol{\theta}'\boldsymbol{\mu}_{x}, \enspace \theta_{1}=\eta_{y}-\boldsymbol{\theta}'\boldsymbol{\eta}_{x}.\]
In particular, these latter are not null only when they are assumed to
be restricted in the model specification.
The five specifications proposed by Pesaran et al. (2001) are
No intercept and no trend: \[\boldsymbol{\mu}=\boldsymbol{\eta}=\mathbf{0}.\] It follows that \[\theta_{0}=\theta_{1}=\alpha_{0.y}=\alpha_{1.y}=0.\] Accordingly, the model is as in (14).
Restricted intercept and no trend: \[\boldsymbol{\alpha}_{0}^{c}= \widetilde{\mathbf{A}}(1)\boldsymbol{\mu},\enspace \enspace \boldsymbol{\eta}=\mathbf{0},\] which entails \[\theta_0 \neq 0 \enspace\enspace\alpha_{0.y}^{EC}=a_{yy}\theta_{0}, \enspace \enspace \alpha_{0.y}=\theta_{1}=\alpha_{1.y}=0.\] Therefore, the intercept stems from the EC term of the ARDL equation. The model is specified as in (15)
Unrestricted intercept and no trend: \[\boldsymbol{\alpha}_{0}^{c}\neq\widetilde{\mathbf{A}}(1)\boldsymbol{\mu}, \enspace \enspace \boldsymbol{\eta}=\mathbf{0}.\] Thus, \[\alpha_{0.y}\neq 0,\enspace \enspace \theta_{0}=\theta_{1}=\alpha_{1.y}=0.\] Accordingly, the model is as in (16).
Unrestricted intercept, restricted trend: \[\boldsymbol{\alpha_{0}}^{c}\neq\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta}\enspace \enspace {\boldsymbol{\alpha}}_{1}^{c}=\widetilde{\mathbf{A}}(1)\boldsymbol{\eta},\] which entails \[\alpha_{0.y} \neq 0,\enspace \enspace \theta_{0}=0 \enspace \enspace \theta_{1}\neq 0\enspace\enspace \alpha_{1.y}^{EC}=a_{yy}\theta_1\enspace\enspace \alpha_{1.y}=0.\] Accordingly, the trend stems from the EC term of the ARDL equation. The model is as in (17).
Unrestricted intercept, unrestricted trend: \[\boldsymbol{\alpha_{0}}^{c}\neq\widetilde{\mathbf{A}}(1)(\boldsymbol{\mu}-\boldsymbol{\eta})+\widetilde{\boldsymbol{\Gamma}}(1)\boldsymbol{\eta} \enspace \enspace {\boldsymbol{\alpha}}_{1}^{c}\neq\widetilde{\mathbf{A}}(1)\boldsymbol{\eta}.\] Accordingly, \[\alpha_{0.y} \neq 0 \enspace \enspace\alpha_{1.y} \neq 0, \enspace \enspace\theta_{0}=\theta_{1}=0.\] The model is as in (18).
bootCT, dynamac, magrittr, gtools, pracma, Rcpp, RcppArmadillo, Rmisc, ARDL, aod, vars, urca, aTSA, tseries, reshape2, ggplot2, stringr, tidyverse, dplyr, ggplot
ChemPhys, Databases, DifferentialEquations, Econometrics, Environmetrics, Finance, HighPerformanceComputing, MixedModels, ModelDeployment, NumericalMathematics, Phylogenetics, Spatial, TeachingStatistics, TimeSeries
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
The R
packages, either used in the creation of
bootCT or employed
in the analyses presented in this paper, are
magrittr
(Bache and Wickham 2022), gtools
(Bolker et al. 2022), pracma
(Borchers 2022), Rcpp
(Eddelbuettel 2013),
RcppArmadillo
(Eddelbuettel et al. 2023),
Rmisc (Hope 2022),
dynamac
(Jordan and Philips 2020), ARDL
(Natsiopoulos and Tzeremes 2021), aod
(Lesnoff et al. 2012), vars and
urca
(Pfaff 2008b; Pfaff 2008a),
aTSA (Qiu 2015),
tseries
(Trapletti and Hornik 2023),
reshape2,
ggplot2 and
stringr
(Wickham 2007, 2016, 2022),
tidyverse and
dplyr
(Wickham et al. 2019, 2023).↩︎
If the explanatory variables are stationary \(\mathbf{A}_{xx}\) is non-singular (\(rk(\mathbf{A}_{xx})=K\)), while when they are integrated but without cointegrating relationship \(\mathbf{A}_{xx}\) is a null matrix.↩︎
The knowledge of the rank of the cointegrating matrix is necessary to overcome this impasse.↩︎
The latter is introduced in the ARDL equation by the operation of conditioning \(y_t\) on the other variables \(\mathbf{x}_t\) of the model↩︎
In fact, as \(\boldsymbol{\omega}'\mathbf{A}_{xx}\mathbf{x}_{t} \approx I(0)\), the conclusion that \(y_{t}\approx I(0)\) must hold. This in turn entails that no cointegration occurs between \(y_t\) and \(\mathbf{x}_{t}\).↩︎
If the explanatory variables are stationary \(\mathbf{A}_{xx}\) is non-singular (\(rk(\mathbf{A}_{xx})=K\)), while when they are integrated but without cointegrating relationship \(\mathbf{A}_{xx}\) is a null matrix
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vacca, et al., "bootCT: An R Package for Bootstrap Cointegration Tests in ARDL Models", The R Journal, 2025
BibTeX citation
@article{RJ-2024-003, author = {Vacca, Gianmarco and Zoia, Maria and Bertelli, Stefano}, title = {bootCT: An R Package for Bootstrap Cointegration Tests in ARDL Models}, journal = {The R Journal}, year = {2025}, note = {https://doi.org/10.32614/RJ-2024-003}, doi = {10.32614/RJ-2024-003}, volume = {16}, issue = {1}, issn = {2073-4859}, pages = {39-66} }