ciuupi: An R package for Computing Confidence Intervals that Utilize Uncertain Prior Information

Dr. Rheanna Mainzer; Dr. Paul Kabaila

1 Introduction

Suppose that \(\boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}\) is a random \(n\)-vector of responses, \(\boldsymbol{X}\) is a known \(n \times p\) matrix with linearly independent columns, \(\boldsymbol{\beta}\) is an unknown parameter \(p\)-vector and \(\boldsymbol{\varepsilon} \sim N(\boldsymbol{0}, \, \sigma^2 \, \boldsymbol{I}_n)\), where \(\sigma^2\) is unknown. Let \(\boldsymbol{a}\) be a specified nonzero \(p\)-vector and suppose that the parameter of interest is \(\theta = \boldsymbol{a}^{\top} \boldsymbol{\beta}\). We refer to the model \(\boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}\) as the “full” model and the confidence interval for \(\theta\) based on this model as the “standard” confidence interval. It is often the case that applied statisticians want to utilize uncertain prior information in their analysis. This uncertain prior information may arise from previous experience, scientific background or expert opinion. For example, a common assumption in a factorial experiment is that higher order interaction terms are equal to zero. We consider the uncertain prior information that \(\tau = \boldsymbol{c}^{\top} \boldsymbol{\beta} - t = 0\), where \(\boldsymbol{c}\) is a specified nonzero \(p\)-vector that is linearly independent of \(\boldsymbol{a}\) and \(t\) is a specified number. Our interest lies in computing (within a reasonable amount of time) a confidence interval for \(\theta\), with minimum coverage probability \(1 - \alpha\), that utilizes the uncertain prior information that \(\tau = 0\).

One could incorporate uncertain prior information in statistical inference using a Bayesian approach. In other words, a \(1 - \alpha\) credible interval for \(\theta\) could be constructed using an informative prior distribution for \(\tau\). However, the ciuupi package uses a frequentist approach to utilize the uncertain prior information that \(\tau = 0\). Utilizing uncertain prior information in frequentist inference has a distinguished history, which includes (Hodges and Lehmann 1952), (Pratt 1961), (Stein 1962), (Cohen 1972), (Bickel 1984), Kempthorne (1983) ((1983), (1987), (1988)), Casella and Hwang (1983) ((1983), (1987)), (Goutis and Casella 1991), (Tseng and Brown 1997), and (Efron 2006).

The standard confidence interval has the desired coverage probability throughout the parameter space. However, it does not utilize the uncertain prior information that \(\tau = 0\). One may attempt to utilize this uncertain prior information by carrying out a preliminary hypothesis test of the null hypothesis \(\tau = 0\) against the alternative hypothesis \(\tau \ne 0\). This attempt is based on the following two hopes. Firstly, if the prior information is correct then this test will lead to a confidence interval that is narrower than the standard confidence interval. Secondly, if this prior information happens to be incorrect then this test will effectively lead to the standard confidence interval. Unfortunately, this attempt fails miserably because, for certain values of \(\boldsymbol{a}\), \(\boldsymbol{c}\) and \(\boldsymbol{X}\), this post-model-selection confidence interval has minimum coverage probability far below \(1-\alpha\) (see e.g. Kabaila and Giri (2009b), (2009b)), making it unacceptable.

(Kabaila and Giri 2009a) proposed a family of confidence intervals, with minimum coverage probability \(1 - \alpha\), that utilize the uncertain prior information that \(\tau = 0\) as follows. This family of confidence intervals have expected length that is less than the expected length of the standard interval when the prior information is correct and maximum (over the parameter space) expected length that is not too much larger than the expected length of the standard confidence interval. In addition, these confidence intervals have the same expected length as the standard confidence interval when the data strongly contradict the prior information. The admissibility result of (Kabaila et al. 2010) implies that a confidence interval with the desired minimum coverage probability and expected length that is less than that of the standard confidence interval when the prior information is correct, must have an expected length that exceeds that of the standard interval for some parameter values.

Unfortunately, computing these confidence intervals is quite time consuming. Furthermore, there is no existing R package to compute these confidence intervals. Thus, if one wants to compute the confidence interval proposed by (Kabaila and Giri 2009a) and originally computed using MATLAB programs, they may have to write their own programs to do so. The time and skill required to write such programs present large barriers to the use of this confidence interval in applied statistics.

(KaMa2017?) ((KaMa2017?), Appendix A) described the family of confidence intervals proposed by (Kabaila and Giri 2009a) when \(\sigma^2\) is known. Each confidence interval in this family is specified by a different tradeoff between its performance when the prior information is correct and its performance when this prior information happens to be incorrect. (KaMa2017?) ((KaMa2017?)) then specified an attractive tradeoff that leads to a unique confidence interval. This interval and its coverage probability and expected length properties can now be easily and quickly computed using the R package ciuupi.

This confidence interval has the following three practical applications. Firstly, if \(\sigma^2\) has been accurately estimated from previous data, as in the factorial experiment example described later, then it may be treated as being effectively known. Secondly, for \(n - p\) sufficiently large (\(n - p \geq 30\), say), if we replace the assumed known value of \(\sigma^2\) by its usual estimator in the formula for the confidence interval then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when \(\sigma^2\) is known. Thirdly, some more complicated models (including those considered by (KaMa2017?), (KaMa2017?)) can be approximated by the linear regression model with \(\sigma^2\) known when certain unknown parameters are replaced by estimates.

The only information needed to assess the coverage probability and expected length of the confidence interval that utilizes uncertain prior information (CIUUPI) are the values of \(\boldsymbol{a}\), \(\boldsymbol{c}\), \(\boldsymbol{X}\) and \(1 - \alpha\). We stress that this assessment does not use the observed response \(\boldsymbol{y}\). Indeed, if we want to choose between the CIUUPI and some other confidence interval, such as the standard confidence interval, then this choice must be made prior to any examination of the observed response \(\boldsymbol{y}\).

In this paper we provide motivating examples of scenarios where this confidence interval may be used. We then describe, in detail, the CIUUPI computed by the ciuupi package and the numerical constrained optimization method implemented to obtain it. We contrast and compare the CIUUPI with a \(1 - \alpha\) credible interval for \(\theta\) constructed using an informative prior distribution for \(\tau\). Lastly, instructions on how to use the functions in ciuupi are given, using a real data set, from a factorial experiment, as an illustrative example. We hope that, by making ciuupi freely available, statisticians who have uncertain prior information of the type that we specify and wish to utilize it will be encouraged to use the CIUUPI instead of the potentially misleading post-model-selection confidence interval.

2 Motivating examples

The following motivating examples are provided by (Kabaila and Giri 2013). These are examples of scenarios where the ciuupi package may be used to find a confidence interval for the parameter of interest \(\theta\) that utilizes the uncertain prior information that \(\tau = 0\).

Pooling of normal means. Suppose that \(y_i = \beta_1 + \varepsilon_i\) for \(i = 1, \dots, n_1\) and \(y_i = \beta_1 + \beta_2 + \varepsilon_i\) for \(i = n_1 + 1, \dots, n_1 + n_2\), where the \(\varepsilon_i\)’s are independent and identically distributed (i.i.d.) \(N(0, \, \sigma^2)\). The parameter of interest is \(\theta = \beta_1\) and we have uncertain prior information that \(\tau = \beta_2 = 0\).
One-way analysis of variance for two treatments. Suppose that \(y_{ij} = \beta_i + \varepsilon_{ij}\) for \(i = 1, 2\) and \(j = 1, \dots, n_i\), where the \(\varepsilon_i\)’s are i.i.d. \(N(0, \, \sigma^2)\). The parameter of interest is \(\theta = \beta_1\) and we have uncertain prior information that \(\tau = \beta_1 - \beta_2 = 0\).
A \(2^k\) factorial experiment with two or more replicates. The parameter of interest \(\theta\) is a specified contrast. For factorial experiments it is commonly believed that higher order interactions are negligible. Suppose that the uncertain prior information is that the highest order interaction is zero.
One-way analysis of covariance with two treatments and normal errors. The parameter of interest \(\theta\) is the difference in expected responses for the two treatments, for a specified value of the covariate. The uncertain prior information is that the hypothesis of ‘parallellism’ is satisfied.
Polynomial regression. Suppose that \(y_i = \beta_1 + \beta_2 \, x_i + \dots + \beta_p \, x_i^{p-1} + \varepsilon_i\) for \(i = 1, \dots, n\), where the \(\varepsilon_i\)’s are i.i.d. \(N(0, \, \sigma^2)\). The parameter of interest \(\theta\) is the expected response for a specified value of the explanatory variable \(x\). The uncertain prior information is that \(\tau = \beta_p = 0\).
Linear regression with at least one interaction term. The parameter of interest \(\theta\) is a given linear combination of the regression parameters. The uncertain prior information is that a specified interaction term is 0.

In addition to the above examples, (KaMa2017?) have used the ciuupi package to aid in the computation of a confidence interval that utilizes uncertain prior information in the following more complicated scenario that arises in the analysis of both clustered and longitudinal data. Suppose that \(y_{ij} = \beta_0 + \beta_1 \, x_{ij} + \beta_2 \, \overline{x}_i + \eta_i + \varepsilon_{it}\) for \(i = 1, \dots, N\) and \(j = 1, \dots, J\), where \(\overline{x}_i = J^{-1} \sum_{j=1}^J x_{ij}\), the \(\eta_i\)’s are i.i.d. \(N(0, \, \sigma^2_{\eta})\), and the \(\varepsilon_{ij}\)’s are i.i.d. \(N(0, \, \sigma^2_{\varepsilon})\). The parameter of interest is \(\theta = \beta_1\) and we have uncertain prior information that \(\tau = \beta_2 = 0\).

3 The confidence interval that utilizes uncertain prior information computed by ciuupi

Let \(\widehat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \, \boldsymbol{X}^{\top} \, \boldsymbol{y}\), the least squares estimator of \(\boldsymbol{\beta}\). Then \(\widehat{\theta} = \boldsymbol{a}^{\top} \widehat{\boldsymbol{\beta}}\) and \(\widehat{\tau} = \boldsymbol{c}^{\top} \widehat{\boldsymbol{\beta}} - t\) are the least squares estimators of \(\theta\) and \(\tau\), respectively. Now let \(v_{\theta} = \text{Var}(\widehat{\theta})/\sigma^2 = \boldsymbol{a}^{\top}(\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{a}\) and \(v_{\tau} = \text{Var}(\widehat{\tau}) / \sigma^2 = \boldsymbol{c}^{\top}(\boldsymbol{X}^{\top} \boldsymbol{X})^{-1} \boldsymbol{c}\). The known correlation between \(\widehat{\theta}\) and \(\widehat{\tau}\) is \(\rho = \boldsymbol{a}^{\top}(\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{c}/(v_{\theta} \, v_{\tau})^{1/2}\). Let \(\gamma = \tau / (\sigma \, v_{\tau}^{1/2})\), a scaled version of \(\tau\), and \(\widehat{\gamma} = \widehat{\tau}/(\sigma \, v_{\tau}^{1/2})\), an estimator of \(\gamma\). Assume that \(\sigma^2\) is known.

The confidence interval that utilizes uncertain prior information about \(\tau\) has the form \[\label{eqn_ciuupi} CI(b, s) = \left[ \widehat{\theta} - v_{\theta}^{1/2} \, \sigma \, b \left( \widehat{\gamma} \right) - v_{\theta}^{1/2} \sigma \, s \left( \widehat{\gamma} \right), \, \, \widehat{\theta} - v_{\theta}^{1/2} \, \sigma \, b \left( \widehat{\gamma} \right) + v_{\theta}^{1/2} \sigma \, s \left( \widehat{\gamma} \right) \right], \tag{1}\] where \(b: \mathbb{R} \to \mathbb{R}\) is an odd continuous function and \(s: \mathbb{R} \to \mathbb{R}\) is an even continuous fuction. In addition, \(b(x) = 0\) and \(s(x) = z_{1-\alpha / 2}\) for all \(|x| \geq 6\), where the quantile \(z_a\) is defined by \(P(Z \le z_a) = a\) for \(Z \sim N(0,1)\). The functions \(b\) and \(s\) are fully specified by the vector \((b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5))\) as follows. By assumption, \(b(0)=0\), \((b(-1), b(-2), \dots, b(-5)) \newline = (-b(1), -b(2), \dots, -b(5))\) and \((s(-1), \dots, s(-5)) = (s(1), \dots, s(5))\). The values of \(b(x)\) and \(s(x)\) for any \(x \in [-6, 6]\) are found by cubic spline interpolation for the given values of \(b(i)\) and \(s(i)\) for \(i = -6, -5, \dots, 0, 1, \dots, 5, 6\). The functions \(b\) and \(s\) are computed such that \(CI(b, s)\) has minimum coverage probability \(1 - \alpha\) and the desired expected length properties. This numerical computation method is described in detail in the next section. Note that the functions \(b\) and \(s\) are computed assuming that \(\sigma^2\) is known.

As stated in the introduction, for \(n - p\) sufficiently large (\(n - p \geq 30\), say), if we replace the assumed known value of \(\sigma^2\) by \(\widehat{\sigma}^2 = (\boldsymbol{y} - \boldsymbol{X} \widehat{\boldsymbol{\beta}})^{\top}(\boldsymbol{y} - \boldsymbol{X} \widehat{\boldsymbol{\beta}})/ (n - p)\) in the formula for \(CI(b, s)\) then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when \(\sigma^2\) is known. In ciuupi, if no value of \(\sigma^2\) is supplied then the user is given the option of replacing \(\sigma^2\) by \(\widehat{\sigma}^2\), with a warning that \(n - p\) needs to be sufficiently large (\(n - p \geq 30\), say).

Numerical constrained optimization method used to compute the vector \((b(1), b(2), \dots, b(5),s(0), s(1), \dots, s(5))\)

Let \[\begin{aligned} k(x) = \Psi \left( b(x) - s(x), \, b(x) + s(x); \, \rho\left(x - \gamma\right), \, 1 - \rho^2 \right) \end{aligned}\] and \[\begin{aligned} k^{\dagger}(x) = \Psi \left( -z_{1-\alpha/2}, \, z_{1-\alpha/2}; \, \rho \left(x - \gamma\right), \, 1 - \rho^2 \right), \end{aligned}\] where \(\Psi(\ell, u \, ; \, \mu, \, \sigma^2) = P \left(\ell \leq Z \leq u\right)\) for \(Z \sim N(\mu, \, \sigma^2)\). A computationally convenient expression for the coverage probability of \(CI(b, s)\) is \[\label{eqn_CP} CP(\gamma; b, s, \rho) = 1 - \alpha + \int_0^6 \left( k(x) - k^{\dagger}(x) \right) \phi(x - \gamma) + \left( k(-x) - k^{\dagger}(-x) \right) \phi(x + \gamma) \, dx, \tag{2}\] where \(\phi\) denotes the \(N(0,1)\) pdf. This coverage probability depends on the unknown parameter \(\gamma\), the functions \(b\) and \(s\), the known correlation \(\rho\) and the desired minimum coverage probability \(1 - \alpha\). (Giri 2008) has shown that \(CP(\gamma; b, s, \rho)\) is an even function of \(\gamma\).

Define the scaled expected length of \(CI(b, s)\) to be the expected length of \(CI(b, s)\) divided by the expected length of the standard \(1 - \alpha\) confidence interval, given by \[\label{standard_interval} \left[ \widehat{\theta} - z_{1-\alpha/2} \, v_{\theta}^{1/2} \, \sigma, \, \widehat{\theta} + z_{1-\alpha/2} \, v_{\theta}^{1/2} \, \sigma \right]. \tag{3}\] This scaled expected length of \(CI(b, s)\) is given by \[SEL(\gamma; s, \rho) = 1 + \dfrac{1}{z_{1-\alpha/2}} \int_{-6}^6 \left( s(x) - z_{1-\alpha/2} \right) \phi \left( x - \gamma \right) dx.\] This scaled expected length depends on the unknown parameter \(\gamma\), the function \(s\), the known correlation \(\rho\) and the desired minimum coverage probability \(1 - \alpha\). (Giri 2008) has shown that \(SEL(\gamma; s, \rho)\) is an even function of \(\gamma\).

We compute the functions \(b\) and \(s\) such that \(CI(b, s)\) has minimum coverage probability \(1 - \alpha\) and the desired expected length properties as follows. For given \(\lambda \in [0, \infty)\), we minimize the objective function \[\label{obj_fun} \left(SEL \left(\gamma = 0; s, \rho \right) - 1 \right) + \lambda \int_{-\infty}^{\infty} \left(SEL(\gamma; s, \rho) - 1 \right) \, d\gamma, \tag{4}\] with respect to the vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5) \big)\), subject to the coverage constraint \(CP(\gamma) \geq 1 - \alpha\) for all \(\gamma\). Equivalently, minimize the objective function \[\label{eqn:equiv_objfun} \xi \left( SEL(\gamma = 0; s, \rho) - 1 \right) + (1 - \xi) \int_{-\infty}^{\infty} \left( SEL(\gamma; s, \rho) - 1 \right) d\gamma \tag{5}\] subject to this constraint, where \(\xi = 1 / (1 + \lambda)\). A computationally convenient formula for the objective function (4) is \[\label{obj_fun2} \dfrac{2}{z_{1-\alpha/2}} \int_0^6 \left(s(h) - z_{1-\alpha/2} \right) \left(\lambda + \phi(h) \right) dh. \tag{6}\] Since we are minimizing this objective function, we can leave out the constant at the front of the integral.

When \(\lambda\) is large, this numerical computation recovers the standard confidence interval (3) for \(\theta\). As \(\lambda\) decreases towards 0, this computation puts increasing weight on achieving a small value of \(SEL(\gamma = 0; s, \rho)\), i.e. an improved confidence interval performance when the uncertain prior information that \(\tau = 0\) is correct. However, as \(\lambda\) decreases, \(\max_{\gamma} SEL(\gamma; s, \rho)\) increases, i.e. the performance of the confidence interval when the prior information happens to be incorrect is degraded. Following (KaMa2017?), we choose \(\lambda\) such that the “gain” when the prior information is correct, as measured by \[\label{SEL_gain} 1 - \left(SEL\left(\gamma = 0; s, \rho \right) \right)^2, \tag{7}\] is equal to the maximum possible “loss” when the prior information happens to be incorrect, as measured by \[\label{SEL_loss} \left(\max_{\gamma} SEL \left(\gamma; s, \rho \right) \right)^2 - 1. \tag{8}\] We denote this value of \(\lambda\) by \(\lambda^*\). Our computational implementation of the constraint \(CP(\gamma) \geq 1 - \alpha\) for all \(\gamma\) is to require that \(CP(\gamma) \geq 1 - \alpha\) for all \(\gamma \in \{0, 0.05, 0.1, \dots, 8\}\). By specifying constraints on the coverage probability \(CP(\gamma)\) for such a fine grid of nonnegative values of \(\gamma\), we ensure that, to an exceedingly good approximation, \(CP(\gamma; b, s, \rho) \geq 1 - \alpha\) for all values of \(\gamma\).

In summary, we compute the vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5) \big)\) by minimizing (6), where \(\lambda\) is chosen such that (7) = (8), subject to the constraints \(CP(\gamma; b, s, \rho) \geq 1 - \alpha\) for all \(\gamma \in \{0, 0.05, 0.1, \dots, 8\}\). Once \((b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5))\) has been computed in this way, we can easily compute the confidence interval that utilizes the uncertain prior information (CIUUPI) for observed response \(\boldsymbol{y}\).

This constrained optimization procedure is carried out using the slsqp function in the nloptr package (see Johnson (2014), (2014)). Perhaps surprisingly, the large number of constraints on the coverage probability \(CP(\gamma; b, s, \rho)\) is handled well by the slsqp function. The integrals in (2) and (6) are computed as follows. For greater accuracy, each integral is split into a sum of six integrals, with lower and upper endpoints consisting of successive knots. Each of these integrals is then computed using Gauss Legendre quadrature with five nodes. Gauss Legendre quadrature was found to be both faster and more accurate than the R function integrate. This quadrature is carried out using the gauss.quad function in the statmod package (see Smyth (2005), (2005)).

4 A comparison of the CIUUPI with a Bayesian interval estimator

(Kabaila and Dharmarathne 2015) compare Bayesian and frequentist interval estimators for \(\theta\) in the linear regression context considered in this paper when \(\sigma^2\) is unknown. They find that the Bayesian and frequentist interval estimators differ substantially. In this section we compare a \(1 - \alpha\) credible interval for \(\theta\) with the CIUUPI, assuming that \(\sigma^2\) is known.

For ease of comparison of the CIUUPI with a credible interval, we re-express the regression sampling model as follows. Let the \(n \times p\) matrix \(\boldsymbol{\widetilde{X}}\) be obtained from \(\boldsymbol{X}\) using the transformation described in Appendix B of (Kabaila and Dharmarathne 2015). The attractive property of \(\boldsymbol{\widetilde{X}}\) is that \[(\boldsymbol{\widetilde{X}}^{\top} \boldsymbol{\widetilde{X}})^{-1} = \begin{pmatrix} \boldsymbol{V} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{I}_{p-2} \end{pmatrix}, \text{ where } \boldsymbol{V} = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.\] We re-express the regression sampling model as \(\boldsymbol{\widetilde{y}} = \boldsymbol{\widetilde{X}} \begin{bmatrix} \vartheta, \gamma, \boldsymbol{\chi}^{\top} \end{bmatrix}^{\top} + \boldsymbol{\widetilde{\varepsilon}}\), where \(\boldsymbol{\widetilde{y}} = \boldsymbol{y}/\sigma\), \(\boldsymbol{\widetilde{\varepsilon}} = \boldsymbol{\varepsilon}/\sigma\) and \(\vartheta = \theta / (\sigma \, v_{\theta}^{1/2})\). Obviously, \(\boldsymbol{\widetilde{\varepsilon}} \sim N(\boldsymbol{0}, \boldsymbol{I}_n)\). Let \((\widehat{\vartheta}, \widehat{\gamma}, \boldsymbol{\widehat{\chi}})\) denote the least squares estimator of \((\vartheta, \gamma, \boldsymbol{\chi})\). Note that \(\widehat{\vartheta} = \widehat{\theta} / (\sigma \, v_{\theta}^{1/2})\). Clearly, \((\widehat{\vartheta}, \widehat{\gamma})\) has a bivariate normal distribution with mean \((\vartheta, \gamma)\) and covariance matrix \(\boldsymbol{V}\) and, independently, \(\boldsymbol{\widehat{\chi}} \sim N(\boldsymbol{\chi}, \boldsymbol{I}_{p-2})\). Dividing the endpoints of the CIUUPI by \(\sigma \, v_{\theta}^{1/2}\), we obtain the following confidence interval for \(\vartheta\): \[\label{eqn:CI_vartheta} \left[ \widehat{\vartheta} - b(\widehat{\gamma}) - s(\widehat{\gamma}), \, \widehat{\vartheta} - b(\widehat{\gamma}) + s(\widehat{\gamma}) \right], \tag{9}\] where the functions \(b\) and \(s\) have been obtained using the constrained optimization described in the previous section.

The uncertain prior information that \(\tau = 0\) implies the uncertain prior information that \(\gamma = 0\). The properties of a Bayesian \(1 - \alpha\) credible interval depend greatly on the prior distribution chosen for \((\vartheta, \gamma, \boldsymbol{\chi})\). We have chosen a prior distribution that leads to a credible interval with some similarities to the CIUUPI. Assume that the prior probability density function of \((\vartheta, \gamma, \boldsymbol{\chi})\) is proportional to \(\xi^* \, \delta(\tau) + (1 - \xi^*)\), where \(\xi^* = 1 / (1 + \lambda^*)\) and \(\delta\) denotes the Dirac delta function. In other words, we assume an improper prior density for \(\tau\) that consists of a mixture of an infinite rectangular unit-height ‘slab’ and a Dirac delta function ‘spike’, combined with noninformative prior densities for the other parameters. This prior density is a Bayesian analogue of the weight function used in the weighted average over \(\gamma\), (5). It may be shown that the marginal posterior density of \(\vartheta\) is \[\label{eqn:mar_post} w(\hat{\gamma}) \, \phi\left(\vartheta; \, \widehat{\vartheta} - \rho \widehat{\gamma}, 1 - \rho^2\right) + \left( 1 - w\left(\hat{\gamma} \right) \right) \phi\left(\vartheta; \, \widehat{\vartheta}, 1\right), \tag{10}\] where \(w(\hat{\gamma}) = 1 \big/ \big(1 + \lambda^* \sqrt{2 \pi} \exp( \widehat{\gamma}^2 / 2)\big)\) and \(\phi(\, \cdot \, ; \mu, \nu)\) denotes the \(N(\mu, \nu)\) pdf. We note that this posterior density is a mixture of two normal probability density functions, such that the weight given to the posterior density centred at \(\widehat{\vartheta}\) increases with increasing \(\widehat{\gamma}^2\), when \(\lambda^* > 0\). It is evident from (10) that the highest posterior density Bayesian \(1 - \alpha\) credible interval may consist of the union of two disjoint intervals. For this reason, we consider the shortest \(1 - \alpha\) credible interval.

Note that the graph of the function (10) of \(\vartheta\) consists of the graph of the function \[w(\widehat{\gamma}) \, \phi \left(\vartheta; - \rho \widehat{\gamma}, 1 - \rho^2 \right) + \left( 1 - w(\widehat{\gamma}) \right) \phi \left(\vartheta; 0, 1 \right),\] shifted to the right by \(\widehat{\vartheta}\). We can therefore express the shortest \(1 - \alpha\) credible interval for \(\vartheta\) in the form \([\widehat{\vartheta} + l(\widehat{\gamma}), \widehat{\vartheta} + u(\widehat{\gamma})]\), for the appropriate functions \(l\) and \(u\). We compare this interval with the frequentist \(1 - \alpha\) confidence interval (9) as follows. Let \(b_B(\widehat{\gamma}) = -(l(\widehat{\gamma}) + u(\widehat{\gamma}))/2\) and \(s_B(\widehat{\gamma}) = (u(\widehat{\gamma}) - l(\widehat{\gamma}))/2\). Then \([\widehat{\vartheta} + l(\widehat{\gamma}), \widehat{\vartheta} + u(\widehat{\gamma})]\) is equal to \[\label{eqn:BCI_vartheta} \left[ \widehat{\vartheta} - b_B(\widehat{\gamma}) - s_B(\widehat{\gamma}), \, \widehat{\vartheta} - b_B(\widehat{\gamma}) + s_B(\widehat{\gamma}) \right], \tag{11}\] which has a similar form to (9), but with \(b\) and \(s\) replaced by \(b_B\) and \(s_B\) respectively. Therefore, we may compare the interval (9) with (11) by comparing the functions \(b\) and \(s\) with the functions \(b_B\) and \(s_B\), respectively. We will also compare the interval (9) with (11) by comparing the frequentist coverage probability function of (11).

5 Using the ciuupi package

In this section we use a real data set to illustrate how each of the six functions in ciuupi works. Table 1 below gives the name of each of the functions and a short description of what it does. In the following subsections we show how the functions in Table 1 are used in R.

Table 1: Functions in the *ciuupi* package
Function	Description
`bsciuupi`	Compute the optimized vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5)\big)\)
`bsspline`	Evaluate \(b(x)\) and \(s(x)\) at \(x\)
`cpci`	Evaluate \(CP(\gamma; b, s, \rho)\) at \(\gamma\)
`selci`	Evaluate \(SEL(\gamma; b, s, \rho)\) at \(\gamma\)
`ciuupi`	Compute the CIUUPI, i.e. compute \(CI(b, s)\)
`cistandard`	Compute the standard confidence interval

Factorial experiment example

Consider the \(2 \times 2\) factorial experiment described by Kabaila and Giri (2009a) (Discussion 5.8, (2009a)), which has been extracted from a real \(2^3\) factorial data set provided by (Box et al. 1963). The two factors are the time of addition of HNO\(_3\) and the presence of a ‘heel’. These factors are labelled A and B, respectively. Define \(x_1 = -1\) and \(x_1 = 1\) for “Time of addition of HNO\(_3\)” equal to 2 hours and 7 hours, respectively. Also define \(x_2 = -1\) and \(x_2 = 1\) for “heel absent” and “heel present”, respectively. Assume a model of the form \[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2+ \varepsilon,\] where \(\varepsilon \sim N(0, \sigma^2 )\). This model can be written in matrix form as \[\boldsymbol{y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}\] where \(\boldsymbol{\beta} = \big(\beta_0, \, \beta_1, \, \beta_2, \, \beta_{12}\big)\), \[\boldsymbol{X} = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\1 & 1 & 1 & 1 \end{pmatrix}\] and \(\boldsymbol{\varepsilon} \sim N(0, \, \sigma^2 \boldsymbol{I}_n)\). According to (Box et al. 1963), a very accurate estimate of \(\sigma\), obtained from previous related experiments, is 0.8.

Suppose that the parameter of interest \(\theta\) is \(\big(\)expected response when factor A is high and factor B is low\(\big)-\big(\)expected response when factor A is low and factor B is low\(\big)\). In other words, \(\theta = 2(\beta_1 - \beta_{12})\), so that \(\theta = \boldsymbol{a}^{\top} \boldsymbol{\beta}\), where \(\boldsymbol{a} = (0, 2, 0, -2)\). Our aim is to find a confidence interval, with minimum coverage 0.95, for \(\theta\). We suppose that there is uncertain prior information that the two-factor interaction is zero. In other words, we suppose that there is uncertain prior information that \(\beta_{12} = 0\). The uncertain prior information is, then, that \(\tau = \boldsymbol{c}^{\top} \boldsymbol{\beta} - t = 0\), where \(\boldsymbol{c} = (0, 0, 0, 1)\) and \(t = 0\). Now that we have specified \(\boldsymbol{a}\), \(\boldsymbol{c}\) and \(\boldsymbol{X}\), we can compute \(\rho = \boldsymbol{a}^{\top}(\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{c}/(v_{\theta} \, v_{\tau})^{1/2} = -1 / \sqrt{2} = -0.707\).

Evaluating the confidence interval (no examination of the observed response)

First suppose that we have not yet examined the observed response \(\boldsymbol{y}\) and that we are interested in knowing how the confidence interval that utilizes uncertain prior information (CIUUPI) performs for given values of \(1 - \alpha\), \(\boldsymbol{a}\), \(\boldsymbol{c}\) and \(\boldsymbol{X}\). We begin by storing the values of \(\alpha\), \(\boldsymbol{a}\), \(\boldsymbol{c}\) and \(\boldsymbol{X}\) in R as follows.

# Specify alpha, a, c and x.
alpha <- 0.05
a <- c(0, 2, 0, -2)
c <- c(0, 0, 0, 1)
x <- cbind(rep(1, 4), c(-1, 1, -1, 1), c(-1, -1, 1, 1), c(1, -1, -1, 1))

Next we use the numerical constrained optimization to compute the values at the knots of the functions \(b\) and \(s\) that define the CIUUPI. We must specify whether natural cubic spline interpolation (natural = 1) or clamped cubic spline interpolation (natural = 0) is used in the description of these functions. In the case of clamped cubic spline interpolation the first derivatives of \(b\) and \(s\) are set to zero at \(-6\) and \(6\). Natural cubic spline interpolation is the default, and is carried out using splinefun in the stats package. Clamped cubic spline interpolation is carried out using cubicspline in the pracma package. The nonlinear constrained optimization using natural cubic spline interpolation for the description of the functions \(b\) and \(s\) is much faster and results in a coverage probability that is slightly closer to \(1 - \alpha\) throughout the parameter space. For this example we are able to obtain the vector \((b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5))\) in 6.56 minutes when using natural cubic spline interpolation and in 21.27 minutes when using clamped cubic spline interpolation. This computation was carried out on a PC with an Intel i7-7500 CPU (3.4GHz) and 32GB of RAM. The following code is used to obtain the vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5) \big)\) that specifies the CIUUPI, which is obtained from the numerical constrained optimization that uses natural cubic spline interpolation for the description of the functions \(b\) and \(s\).

# Compute (b(1), b(2), ..., b(5), s(0), s(1), ..., s(5)) that specifies the CIUUPI
bsvec <- bsciuupi(alpha, a = a, c = c, x = x)
bsvec

Alternatively, since we know that \(\rho = -0.707\), we could obtain the vector \(\big(b(1), b(2), \dots, b(5), s(0), \newline s(1), \dots, s(5)\big)\) that specifies the CIUUPI using the code

# Compute (b(1), b(2), ..., b(5), s(0), s(1), ..., s(5)) that specifies the CIUUPI,
# given rho
bsvec2 <- bsciuupi(alpha, rho = -0.707)

Now that we have the vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5) \big)\) that specifies the CIUUPI, we can graph the functions \(b\) and \(s\) using the following code:

# Compute the functions b and s that specify the CIUUPI on a grid of values
splineval <- bsspline(seq(0, 8, by = 0.1), bsvec, alpha)

# The first 5 values of bsvect are b(1),b(2),...,b(5).
# The last 6 values are s(0),s(1),...,s(5).
xseq <- seq(0, 6, by = 1)
bvec <- c(0, bsvec[1:5], 0)
svec <- c(bsvec[6:11], qnorm(1 - alpha/2))

# Plot the functions b and s 
plot(seq(0, 8, by = 0.1), splineval[, 2], type = "l", main = "b function",
     ylab = " ", las = 1, lwd = 2, xaxs = "i", col = "blue", xlab = "x")
points(xseq, bvec, pch = 19, col = "blue")
plot(seq(0, 8, by = 0.1), splineval[, 3], type = "l", main = "s function",
     ylab = " ", las = 1, lwd = 2, xaxs = "i", col = "blue", xlab = "x")
points(xseq, svec, pch = 19, col = "blue")

Figure 1 shows the graphs of the functions \(b\) and \(s\) that specify the CIUUPI, when these functions are described using natural cubic spline interpolation, for this example. For comparison, Figure 2 shows the graphs of the functions \(b\) and \(s\) that specify the CIUUPI, when these functions are described using clamped cubic spline interpolation. These figures are quite similar; there is a small difference in both the \(b\) and \(s\) functions near \(x = 6\).

We can also use the vector \(\big(b(1), b(2), \dots, b(5), s(0), s(1), \dots, s(5) \big)\) that specifies the CIUUPI to evaluate and then plot the coverage probability \(CP(\gamma; b, s, \rho)\) and scaled expected length \(SEL(\gamma; s, \rho)\) as functions of \(\gamma\). This is done using the following code.

# Compute the coverage probability and scaled expected for a grid of values of gamma
gam <- seq(0, 10, by = 0.1)
cp <- cpciuupi(gam, bsvec, alpha, a = a, c = c, x = x)
sel <- selciuupi(gam, bsvec, alpha, a = a, c = c, x = x)

# Plot the coverage probability and squared scaled expected length
plot(gam, cp, type = "l", lwd = 2, ylab = "", las = 1, xaxs = "i",
main = "Coverage Probability", col = "blue", 
xlab = expression(paste("|", gamma, "|")), ylim = c(0.9495, 0.9505))
abline(h = 1-alpha, lty = 2)
plot(gam, sel^2, type = "l", lwd = 2, ylab = "", las = 1, xaxs = "i",
main = "Squared SEL", col = "blue", 
xlab = expression(paste("|", gamma, "|")), ylim = c(0.83, 1.17))
abline(h = 1, lty = 2)

graphic without alt text — Figure 1: Graphs of the functions \(b\) and \(s\) for the factorial experiment example for the CIUUPI, with minimum coverage probability 0.95, when they are described using natural cubic spline interpolation, for the factorial experiment example.

Figure 3 shows the graphs of \(CP(\gamma; b, s, \rho)\) and the square of \(SEL(\gamma; b, s, \rho)\) for the CIUUPI (where the functions \(b\) and \(s\) have been specified by natural cubic spline interpolation) produced by this code.

We can see from Figure 3 that, regardless of the value of \(\gamma\), the coverage probability of the CIUUPI is extremely close to \(1 - \alpha\). We can also see that the expected length of the CIUUPI is less than the expected length of the standard confidence interval when \(\gamma\) is small, with the minimum scaled expected length achieved when \(\gamma = 0\). For moderate values of \(|\gamma|\), the expected length of the standard interval is less than the expected length of the CIUUPI. However, for large \(|\gamma|\), the expected length of the CIUUPI is essentially the same as the expected length of the standard interval.

For comparison, Figure 4 shows the graphs of \(CP(\gamma; b, s, \rho)\) and the square of \(SEL(\gamma; b, s, \rho)\) for the CIUUPI when the functions \(b\) and \(s\) are described by clamped cubic spline interpolation.

Figures [fig:bs_comp] and [fig:BCI_cov] show the differences between the Bayesian 95% credible interval and the 95% CIUUPI. Figure [fig:bs_comp] shows the graphs of the \(b\) and \(b_B\) functions (left panel), and the \(s\) and \(s_B\) functions (right panel), for the factorial experiment example. Note that, similarly to \(b\) and \(s\), \(b_B\) is an odd continuous function and \(s_B\) is an even continuous function. Figure [fig:BCI_cov] shows the graph of the frequentist coverage probability of the Bayesian 95% credible interval, for the factorial experiment example. This coverage probability is also an even function of \(\gamma\). Unlike the coverage probability of the CIUUPI, the minimum over \(\gamma\) of the frequentist coverage probability of the Bayesian 95% credible interval is substantially less than 0.95.

Computing the confidence interval (using the observed response)

The observed response for the factorial experiment example data is \(\boldsymbol{y} = (87.2, 88.4, 86.7, 89.2)\) and \(\sigma\) is assumed to take the value 0.8. We use the function ciuupi to return the confidence interval (1) for \(\theta\) that utilizes the uncertain prior information that \(\tau = 0\). Continuing from the previous example, this is done in R as follows:

# Using the vector (b(1),b(2),...,b(5),s(0),s(1),...,s(5)), compute the CIUUPI
# for this particular data
t <- 0
y <- c(87.2, 88.4, 86.7, 89.2)

ci <- ciuupi(alpha, a, c, x, bsvec, t, y, natural = 1, sig = 0.8); ci

We obtain the output

            lower    upper
ciuupi   -0.7710755 3.218500

For comparison purposes, the function standard_CI will return the standard confidence interval (3) for \(\theta\). The code

# Compute the standard confidence interval
cistandard(a = a, x = x, y = y, alpha = alpha, sig = 0.8)

will return

             lower    upper
standard -1.017446 3.417446

The 95% confidence interval that utilizes uncertain prior information \([-0.77, 3.22]\) is much shorter than the standard confidence interval \([-1.02, 3.42]\). These are observed values of confidence intervals that have, to an excellent approximation, the same coverage probability. For comparison, a 95% Bayesian credible interval for \(\theta\) is \([-0.25, 3.51]\). Although this interval is shorter than the CIUUPI, it can be seen from Figure [fig:BCI_cov] that the minimum over \(\gamma\) of the frequentist coverage of the Bayesian credible interval is substantially less than 0.95.

6 Discussion

It is very common in applied statistics to carry out preliminary data-based model selection using, for example, hypothesis tests or minimizing a criterion such as the AIC. As pointed out by Leamer (1978) ((1978), chapter 5), such model selection may be motivated by the desire to utilize uncertain prior information in subsequent statistical inference. He goes even further when he states, on p.123, that “The mining of data that is common among non-experimental scientists constitutes prima facie evidence of the existence of prior information”. One may attempt to utilize such prior information by constructing confidence intervals, using the same data, based on the assumption that the selected model had been given to us a priori, as the true model. This assumption is false and it can lead to confidence intervals that have minimum coverage probability far below the desired minimum coverage \(1 - \alpha\) (see e.g. Kabaila (2009), (2009), Leeb and Pötscher (2005), (2005)), making them invalid.

A numerical constrained optimization approach to the construction of valid confidence intervals and sets that utilize uncertain prior information has been applied by (Farchione and Kabaila 2008), (Kabaila and Giri 2009a), (Kabaila and Giri 2013), (Kabaila and Giri 2014), (Kabaila and Tissera 2014) and (Abeysekera and Kabaila 2017). In each case, numerical constrained optimization was performed using programs written in MATLAB, restricting the accessibility of these confidence intervals and sets. The R package ciuupi is a first step in making these types of confidence intervals and sets more widely accessible.

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

ciuupi: An R package for Computing Confidence Intervals that Utilize Uncertain Prior Information

1 Introduction

2 Motivating examples

3 The confidence interval that utilizes uncertain prior information computed by ciuupi

Numerical constrained optimization method used to compute the vector \((b(1), b(2), \dots, b(5),s(0), s(1), \dots, s(5))\)

4 A comparison of the CIUUPI with a Bayesian interval estimator

5 Using the ciuupi package

Factorial experiment example

Evaluating the confidence interval (no examination of the observed response)

Computing the confidence interval (using the observed response)

6 Discussion

CRAN packages used

CRAN Task Views implied by cited packages

Note

References

Reuse

Citation