Gaussian stochastic process (GaSP) emulation is a powerful tool for approximating computationally intensive computer models. However, estimation of parameters in the GaSP emulator is a challenging task. No closed-form estimator is available and many numerical problems arise with standard estimates, e.g., the maximum likelihood estimator. In this package, we implement a marginal posterior mode estimator, for special priors and parameterizations. This estimation method that meets the robust parameter estimation criteria was discussed in (Gu et al. 2018); mathematical reasons are provided therein to explain why robust parameter estimation can greatly improve predictive performance of the emulator. In addition, inert inputs (inputs that almost have no effect on the variability of a function) can be identified from the marginal posterior mode estimation at no extra computational cost. The package also implements the parallel partial Gaussian stochastic process (PP GaSP) emulator ((Gu and Berger 2016)) for the scenario where the computer model has multiple outputs on, for example, spatial-temporal coordinates. The package can be operated in a default mode, but also allows numerous user specifications, such as the capability of specifying trend functions and noise terms. Examples are studied herein to highlight the performance of the package in terms of out-of-sample prediction.

A GaSP emulator is a fast surrogate model used to approximate the outcomes of a computer model ((Sacks et al. 1989; Bayarri et al. 2007; Paulo et al. 2012; Palomo et al. 2015; Gu and Berger 2016)). The prediction accuracy of the emulator often depends strongly on the quality of the parameter estimates in the GaSP model. Although the mean and variance parameters in the GaSP model are relatively easy to deal with, estimation of parameters in the correlation functions is difficult ((Kennedy and O’Hagan 2001)). Standard methods of estimating these parameters, such as maximum likelihood estimation (MLE), often produce unstable results leading to inferior prediction. As shown in ((Gu et al. 2018)), the GaSP emulator is unstable when the correlation between any two different inputs are estimated to be close to one or to zero. The former case causes a near singularity when inverting the covariance matrix (this can partially be addressed by adding a small nugget ((Andrianakis and Challenor 2012))), while the latter problem happens more often and has no easy fix.

There are several packages on the Comprehensive R Archive Network (CRAN,
https://CRAN.R-project.org/) which implement the GaSP model based on
the MLE, including
*DiceKriging*
((Roustant et al. 2012)),
*GPfit*
((MacDonald et al. 2015)),
*mleGP* ((Dancik 2013)),
*spatial*
((Venables and Ripley 2002)), and
*fields* ((Nychka et al. 2016)).
In these packages, bounds on the parameters in the correlation function
are typically implemented to overcome the numerical problems with the
MLE estimates. Predictions are, however, often quite sensitive to the
choice of bound, which is essentially arbitrary, so this is not an
appealing fix to the numerical problems.

In (Gu 2016), marginal posterior modes based on several objective
priors are studied. It has been found that certain parameterizations
result in more robust estimators than others, and, more importantly,
that some parameterizations which are in common use should clearly be
avoided. Marginal posterior modes with the robust parameterization are
mathematically stable, as the posterior density is shown to be zero at
the two problematic cases–when the correlation is nearly equal to one
or to zero. This motivates the
*RobustGaSP* package;
examples also indicate that the package results in more accurate in
out-of-sample predictions than previous packages based on the MLE. We
use the *DiceKriging* package in these comparisons, because it is a
state-of-the-art implementation of the MLE methodology

The *RobustGaSP* package ((Gu et al. 2016)) for R builds a GaSP
emulator with robust parameter estimation. It provides a default method
with regard to a specific correlation function, a mean/trend function
and an objective prior for the parameters. Users are allowed to specify
them, for example, by using a different correlation and/or trend
function, another prior distribution, or by adding a noise term with
either a fixed or estimated variance. Although the main purpose of the
*RobustGaSP* package is to do emulation/approximation of a complex
function, this package can also be used in fitting the GaSP model for
other purposes, such as nonparameteric regression, modeling spatial data
and so on. For computational purposes, most of the time consuming
functions in the *RobustGaSP* package are implemented in C++.

We highlight several contributions of this work. First of all, to
compute the derivative of the reference prior with a robust
parametrization in ((Gu et al. 2018)) is computationally expensive,
however this information is needed to find the posterior mode by the
low-storage quasi-Newton optimization method ((Nocedal 1980)).
We introduce a robust and computationally efficient prior, called the
jointly robust prior ((Gu 2018)), to approximate the reference
prior in the tail rates of the posterior. This has been implemented as a
default setting in the *RobustGaSP* package.

Furthermore, the use of the jointly robust prior provides a natural
shrinkage for sparsity and thus can be used to identify inert/noisy
inputs (if there are any), implemented in the `findInertInputs`

function
in the *RobustGaSP* package. A formal approach to Bayesian model
selection requires a comparison of \(2^p\) models for \(p\) variables,
whereas in the *RobustGaSP* package, only the posterior mode of the full
model has to be computed. Eliminating mostly inert inputs in a computer
model is similar to not including regression coefficients that have a
weak effect, since the noise introduced in their estimation degrades
prediction. However, as the inputs have a nonlinear effect to the
output, variable selection in GaSP is typically much harder than the one
in the linear regression. The `findInertInputs`

function in the
*RobustGaSP* package can be used, as a fast pre-experimental check, to
separate the influential inputs and inert inputs in highly nonlinear
computer model outputs.

The *RobustGaSP* package also provides some regular model checks in
fitting the emulator, while the robustness in the predictive performance
is the focus in (Gu et al. 2018). More specifically, the
leave-one-out cross validation, standardized residuals and Normal
QQ-plot of the standardized residuals are implemented and will be
introduced in this work.

Lastly, some computer models have multiple outputs. For example, each run of the TITAN2D simulator produces up to \(10^9\) outputs of the pyroclastic flow heights over a spatial-temporal grid of coordinates ((Patra et al. 2005; Bayarri et al. 2009)). The computational complexity of building a separate GaSP emulator for the output at each grid is \(O(kn^3)\), where \(k\) is the number of grids and \(n\) is the number of computer model runs. The package also implements another computationally more efficient emulator, called the parallel partial Gaussian stochastic process emulator, which has the computational complexity being the maximum of \(O(n^3)\) and \(O(kn^2)\) ((Gu and Berger 2016)). When the number of outputs in each simulation is large, the computational cost of PP GaSP is much smaller than the separate emulator of each output.

The rest of the paper is organized as follows. In the next section, we briefly review the statistical methodology of the GaSP emulator and the robust posterior mode estimation. In Section 3, we describe the structure of the package and highlight the main functions implemented in this package. In Section 4, several numerical examples are provided to illustrate the behavior of the package under different scenarios. In Section 5, we present conclusions and briefly discuss potential extensions. Examples will be provided throughout the paper for illustrative purposes.

Prior to introducing specific functions and usage of the *RobustGaSP*
package, we first review the statistical formulation of the GaSP
emulator of the computer model with real-valued scalar outputs. Let
\(\mathbf{x} \in \mathcal X\) denote a \(p\)-dimensional vector of inputs
for the computer model, and let \(y(\mathbf{x})\) denote the resulting
simulator output, which is assumed to be real-valued in this section.
The simulator \(y(\mathbf{x})\) is viewed as an unknown function modeled
by the stationary GaSP model, meaning that for any inputs
\(\{ \mathbf{x}_1,\ldots,\mathbf{x}_n\}\) from \(\mathcal{X}\), the
likelihood is a multivariate normal distribution,
\[\label{equ:multinormal}
(y(\mathbf{x}_1),\ldots,y(\mathbf{x}_n))^\top\mid \mathbf \mu, \,\sigma^2, \,{\mathbf R} \sim \mathcal{MN} ((\mu(\mathbf{x}_1),\ldots,\mu(\mathbf{x}_n))^\top, \sigma^2 {\mathbf R} )\,, \tag{1}\]
here \(\mu(\cdot)\) is the mean function, \(\sigma^2\) is the unknown
variance parameter and \({\mathbf R}\) is the correlation matrix. The mean
function is typically modeled via regression,
\[\label{equ:gp_mean}
\mu(\mathbf{x})= \mathbf h(\mathbf{x}){\mathbf \theta} = \sum^q_{t=1} h_t(\mathbf{x})\theta_t \,, \tag{2}\]
where
\(\mathbf h(\mathbf{x})=\left(h_1(\mathbf{x}),h_2(\mathbf{x}),...,h_q(\mathbf{x})\right)\)
is a vector of specified mean basis functions and \(\theta_t\) is the
unknown regression parameter for basis function \(h_t(\cdot)\). In the
default setting of the *RobustGaSP* package, a constant basis function
is used, i.e., \(h(\mathbf{x})=1\); alternatively, a general mean
structure can be specified by the user (see Section 3
for details).

The \((i,j)\) element of \(\mathbf R\) in ((1)) is
modeled through a correlation function \(c(\mathbf x_i, \mathbf x_j)\).
The product correlation function is often assumed in the emulation of
computer models ((Santner et al. 2003)),
\[\label{equ:product_c}
c(\mathbf x_i, \, \mathbf x_j)=\prod^p_{l=1}c_l( x_{il}, x_{jl}), \tag{3}\]
where \(c_l(\cdot,\,\cdot)\) is an one-dimensional correlation function
for the \(l^{th}\) coordinate of the input vector. Some frequently chosen
correlation functions are implemented in the *RobustGaSP* package,
listed in Table 1. In order to use the power exponential
covariance function, one needs to specify the roughness parameter
\(\alpha_l\), which is often set to be close to 2; e.g., \(\alpha_l=1.9\) is
advocated in (Bayarri et al. 2009), which maintains an adequate smoothness level
yet avoids the numerical problems with \(\alpha_l=2\).

The Matérn correlation is commonly used in modeling spatial data
((Stein 2012)) and has recently been advocated for computer
model emulation ((Gu et al. 2018)); one benefit is that the roughness
parameter of the Matérn correlation directly controls the smoothness of
the process. For example, the Matérn correlation with \(\alpha_l=5/2\)
results in sample paths of the GaSP that are twice differentiable, a
smoothness level that is usually desirable. Obtaining this smoothness
with the more common squared exponential correlation comes at a price,
however, as, for large distances, the correlation drops quickly to zero.
For the Matérn correlation with \(\alpha_l=5/2\), the natural logarithm of
the correlation only decreases linearly with distance, a feature which
is much better for emulation of computer models. Based on these reasons,
the Matérn correlation with \(\alpha_l=5/2\) is the default correlation
function in *RobustGaSP*. It is also the default correlation function in
some other packages, such as *DiceKriging* ((Roustant et al. 2012)).

Matérn \(\alpha=5/2\) | \(\left(1+\frac{\sqrt{5}d}{\gamma}+\frac{5d^2}{3\gamma^2}\right)\exp\left(-\frac{\sqrt{5}d}{\gamma}\right)\) |

Matérn \(\alpha=3/2\) | \(\left(1+\frac{\sqrt{3}d}{\gamma}\right)\exp\left(-\frac{\sqrt{3}d}{\gamma}\right)\) |

Power exponential | \(\exp\left\{-\left(\frac{d}{\gamma}\right)^{\alpha}\right\}\), \(0<\alpha\leq 2\) |

Since the simulator is expensive to run, we will at most be able to evaluate \(y(\mathbf{x})\) at a set of design points. Denote the chosen design inputs as \(\mathbf x^{\mathcal D}=\{\mathbf x_1^{\mathcal D},\mathbf x_2^{\mathcal D},..., \mathbf x_n^{\mathcal D}\}\), where \(\mathcal D \subset \mathcal X\). The resulting outcomes of the simulator are denoted as \(\mathbf y^{\mathcal D}=(y^{\mathcal D}_1,y^{\mathcal D}_2,...,y^{\mathcal D}_n)^\top\). The design points are usually chosen to be “space-filling", including the uniform design and lattice designs. The Latin hypercube (LH) design is a”space-filling" design that is widely used. It is defined in a rectangle whereby each sample is the only one in each axis-aligned hyperplane containing it. LH sampling for a 1-dimensional input space is equivalent to stratified sampling, and the variance of an estimator based on stratified sampling has less variance than the random sampling scheme ((Santner et al. 2003)); for a multi-dimensional input space, the projection of the LH samples on each dimension spreads out more evenly compared to simple stratified sampling. The LH design is also often used along with other constraints, e.g., the maximin Latin Hypercube maximizes the minimum Euclidean distance in the LH samples. It has been shown that the GaSP emulator based on maximin LH samples has a clear advantage compared to the uniform design in terms of prediction (see, e.g., (Chen et al. 2016)). For these reasons, we recommend the use of the LH design, rather than the uniform design or lattice designs.

The parameters in a GaSP emulator include mean parameters, a variance
parameter, and range parameters, denoted as
\((\theta_1,..,\theta_q,\sigma^2,\gamma_1,...,\gamma_p)\). The objective
prior implemented in the *RobustGaSP* package has the form
\[\label{equ:prior}
\pi(\mathbf \theta, \sigma^2,\mathbf \gamma)\propto \frac{\pi(\mathbf \gamma)}{\sigma^2}, \tag{4}\]
where \(\pi(\mathbf \gamma)\) is an objective prior for the range
parameters. After integrating out \((\mathbf \theta, \sigma^2)\) by the
prior in ((4)), the marginal likelihood is
\[\label{equ:mp}
{ \mathcal{L}(\mathbf y^{\mathcal D}| \mathbf \gamma)} \propto |{\mathbf R}|^{- \frac{1}{2}} |{{\mathbf h^\top(\mathbf x^{\mathcal D} ) }{ \mathbf R^{ - 1}}{\mathbf h(\mathbf x^{\mathcal D} )} }|^{- \frac{1}{2}} \left( S^2\right)^{-(\frac{n - q}{2})} , \tag{5}\]
where \(S^2= { ({\mathbf{y}^{\mathcal D})^\top{\mathbf { Q} } {\mathbf{y}^{\mathcal D}}} }\) with
\(\mathbf { Q} = { \mathbf { R}^{ - 1}}\mathbf { P}\) and
\(\mathbf { P} = \mathbf I_n - {\mathbf h(\mathbf x^{\mathcal D} )} {\{{{\mathbf h^\top(\mathbf x^{\mathcal D} ) }{ \mathbf R^{ - 1}}{\mathbf h(\mathbf x^{\mathcal D} )} }\}^{ - 1}} \mathbf h^\top(\mathbf x^{\mathcal D} ) {{\mathbf { R}}^{ - 1}}\),
with \(\mathbf I_n\) being the identity matrix of size \(n\).

The reference prior \(\pi^{R}(\cdot)\) and the jointly robust prior
\(\pi^{JR}(\cdot)\) for the range parameters with robust parameterizations
implemented in the *RobustGaSP* package are listed in
Table 2. Although the computational complexity of the
value of the reference prior is the same as the marginal likelihood, the
derivatives of the reference prior are computationally hard. The
numerical derivative is thus computed in the package in finding the
marginal posterior mode using the reference prior. Furthermore, the
package incorporates, by default, the jointly robust prior with the
prior parameters \((C_1,\ldots,C_p,a,b)\) (whose values are given in
Table 2). The properties of the jointly robust prior are
studied extensively in (Gu 2018). The jointly robust prior
approximates the reference prior reasonably well with the default prior
parameters, and has a close form derivatives. The jointly robust prior
is a proper prior with a closed form of the normalizing constant and the
first two moments. In addition, the posterior modes of the jointly
robust prior can identify the inert inputs, as discussed in Section
2.4.

\(\pi^R(\mathbf \gamma)\) | \(|{\mathbf I^*}({\mathbf \gamma})|^{1/2}\) |

\(\pi^R(\mathbf \xi)\) | \(|{\mathbf I^*}({\mathbf \xi})|^{1/2}\) with \(\xi_l=\log(1/\gamma_l)\), for \(l=1,...,p\) |

\(\pi^{JR}(\mathbf \beta)\) | \((\sum^{p}_{l=1} C_l \beta_l)^{a} exp(-b\sum^{p}_{l=1} C_l\beta_l)\), with \(\beta_l=1/\gamma_l\), for \(l=1,...,p\) |

The range parameters \((\gamma_1,...,\gamma_p)\) are estimated by the modes of the marginal posterior distribution \[\label{equ:est_gamma} ({\hat \gamma}_1, \ldots {\hat \gamma}_p)= \mathop{arg max }\limits_{ \gamma_1,\ldots, \gamma_p} ( \mathcal{L}( \mathbf{y}^{\mathcal D}|{ \gamma_1}, \ldots, {\gamma_p}) \pi({\gamma_1}, \ldots, {\gamma_p})). \tag{6}\] When another parameterization is used, parameters are first estimated by the posterior mode and then transformed back to obtain \(({\hat \gamma}_1, \ldots {\hat \gamma}_p)\).

Various functions implemented in the *RobustGaSP* package can be reused
in other studies. `log_marginal_lik`

and `log_marginal_lik_deriv`

give
the natural logarithm of the marginal likelihood in ((5))
and its directional derivatives with regard to \(\mathbf \gamma\),
respectively. The reference priors \(\pi^R(\mathbf \gamma)\) and
\(\pi^R(\mathbf \xi)\) are not coded separately, but
`neg_log_marginal_post_ref`

gives the negative values of the log
marginal posterior distribution and thus one can use
-`neg_log_marginal_post_ref`

minus `log_marginal_lik`

to get the log
reference prior. The jointly robust prior \(\pi^{JR}(\mathbf \beta)\) and
its directional derivatives with regard to \(\mathbf \beta\) are coded in
`log_approx_ref_prior`

and `log_approx_ref_prior_deriv`

, respectively.
All these functions are not implemented in other packages and can be
reused in other theoretical studies and applications.

After obtaining \(\hat{\mathbf \gamma}\), the predictive distribution of the GaSP emulator (after marginalizing \((\mathbf \theta, \sigma^2)\) out) at a new input point \(\mathbf x^*\) follows a student \(t\) distribution \[\label{equ:predictiongp} y({\mathbf x^{*}}) \mid \mathbf{y}^\mathcal{D},\, \hat{\mathbf \gamma} \sim \mathcal T ( \hat y({\mathbf x}^{*}),\hat{\sigma}^2c^{**},n - q)\,, \tag{7}\] with \(n-q\) degrees of freedom, where \[\begin{aligned} \label{equ:gppredmean} \hat{y} ({\mathbf x}^{*}) &=& { \mathbf h({\mathbf x}^{*})} \hat{\mathbf{\theta}}+\mathbf{r}^\top(\mathbf{x}^*){{\mathbf R}}^{-1}\left(\mathbf{y}^\mathcal{D}-{{\mathbf{h}(\mathbf{x}^\mathcal{D} )}}\hat{\mathbf{\theta}}\right), \nonumber\\ \hat{\sigma}^2 &=&(n-q)^{-1}{\left(\mathbf{y}^\mathcal{D}-{\mathbf{h}(\mathbf{x}^\mathcal{D} )}\hat{\mathbf{\theta}}\right)}^{T}{{\mathbf R}}^{-1}\left({\mathbf{y}^\mathcal{D}}-{{\mathbf{h}}(\mathbf{x}^\mathcal{D} )}\hat{\mathbf{\theta}}\right), \nonumber \\ c^{**} &=& c({\mathbf x^{*}}, {\mathbf x^{*}})-{ \mathbf{r}^\top(\mathbf{x}^*){ {\mathbf R}}^{-1}\mathbf{r}(\mathbf{x}^*)} + \left({{\mathbf h(\mathbf x^{*})}-{\mathbf{h}^\top(\mathbf{x}^\mathcal{D} )}{\mathbf R}^{-1}\mathbf{r}(\mathbf{x}^*) }\right) ^\top \nonumber \\ && \times \left(\mathbf{h}^\top(\mathbf{x}^\mathcal{D}){{\mathbf R}}^{-1}{\mathbf{h}(\mathbf{x}^\mathcal{D} )}\right)^{-1}\left({{\mathbf h(\mathbf x^{*})}}-{\mathbf{h}^\top(\mathbf{x}^\mathcal{D} )} {{\mathbf R}}^{-1}\mathbf{r}(\mathbf{x}^*) \right), \end{aligned} \tag{8}\] with \(\hat{\mathbf{\theta}}=\left({{\mathbf{h}}^{T}(\mathbf{x}^\mathcal{D} )}{{\mathbf R}}^{-1} \ {{\mathbf{h}}(\mathbf{x}^\mathcal{D} )}\right)^{-1}{\mathbf{h}^\top(\mathbf{x}^\mathcal{D} )}{{\mathbf R}}^{-1}{\mathbf{y}^\mathcal{D}}\) being the generalized least squares estimator for \(\mathbf \theta\) and \(\mathbf{r}(\mathbf{x}^*) = (c(\mathbf{x}^*,{\mathbf{x}}^{\mathcal D}_1 ), \ldots, c(\mathbf{x}^*,{\mathbf{x}}^{\mathcal D}_n ))^\top\).

The emulator interpolates the simulator at the design points \(\mathbf {x}^{\mathcal{D}}_i\), \(1\leq i\leq n\), because when \(\mathbf x^*=\mathbf{x}^\mathcal{D}_i\), one has \(\mathbf r^\top(\mathbf x^*) \mathbf R^{-1} = \mathbf e^\top_i\), where \(\mathbf e_i\) is the \(n\) dimensional vector with the \(i^{th}\) entry being \(1\) and the others being \(0\). At other inputs, the emulator not only provides a prediction of the simulator (i.e., \(\hat y({\mathbf x}^{*})\)) but also an assessment of prediction accuracy. It also incorporates the uncertainty arising from estimating \(\mathbf{\theta}\) and \(\sigma^2\) since this was developed from a Bayesian perspective.

We now provide an example in which the input has one dimension, ranging
from \([0,10]\) ((Higdon et al. 2002)). Estimation of the range parameters
using the *RobustGaSP* package can be done through the following code:

```
> library(RobustGaSP)
R> library(lhs)
R> set.seed(1)
R> input <- 10 * maximinLHS(n=15, k=1)
R> output <- higdon.1.data(input)
R> model <- rgasp(design = input, response = output)
R> model R
```

```
:
Callrgasp(design = input, response = output)
: 0.03014553
Mean parameters: 0.5696874
Variance parameter: 1.752277
Range parameters: 0 Noise parameter
```

The fourth line of the code generates 15 LH samples at \([0,10]\) through
the `maximinLHS`

function of the
*lhs* package ((Carnell 2016)).
The function `higdon.1.data`

is provided within the *RobustGaSP* package
which has the form \(y(x)=\sin(2\pi x/10)+0.2\sin(2\pi x/2.5)\). The third
line fits a GaSP model with the robust parameter estimation by marginal
posterior modes.

The `plot`

function in *RobustGaSP* package implements the leave-one-out
cross validation for a `"rgasp"`

class after the GaSP model is built
(see Figure 1 for its output):

`> plot(model) R`

The prediction at a set of input points can be done by the following code:

```
> testing_input <- as.matrix(seq(0, 10, 1/50))
R> model.predict<-predict(model, testing_input)
R> names(model.predict) R
```

`1] "mean" "lower95" "upper95" "sd" [`

The `predict`

function generates a list containing the predictive mean,
lower and upper \(95\%\) quantiles and the predictive standard deviation,
at each test point \(\mathbf x^*\). The prediction and the real outputs
are plotted in Figure 2; produced by the
following code:

```
> testing_output <- higdon.1.data(testing_input)
R> plot(testing_input, model.predict$mean,type='l', col='blue',
R+ xlab='input', ylab='output')
> polygon( c(testing_input,rev(testing_input)), c(model.predict$lower95,
R+ rev(model.predict$upper95)), col = "grey80", border = F)
> lines(testing_input, testing_output)
R> lines(testing_input,model.predict$mean, type='l', col='blue')
R> lines(input, output, type='p') R
```

It is also possible to sample from the predictive distribution (which is a multivariate \(t\) distribution) using the following code:

```
> model.sample <- simulate(model, testing_input, num_sample=10)
R> matplot(testing_input, model.sample, type='l', xlab='input', ylab='output')
R> lines(input, output,type='p') R
```

The plots of 10 posterior predictive samples are shown in Figure 3.

Some inputs have little effect on the output of a computer model. Such
inputs are called inert inputs ((Linkletter et al. 2006)). To quantify
the influence of a set of inputs on the variability of the outputs,
functional analysis of the variance (functional ANOVA) can be used,
often implemented through Sobol’s Indices
((Sobol’ 1990; Sobol 2001)). Methods for numerical
calculation of Sobol’s Indices have been implemented in the
*sensitivity* package
((Pujol et al. 2016)) for R.

The identification of inert inputs through the posterior modes with the
jointly robust prior (\(\pi^{JR}(\cdot)\)) for the range parameters is
discussed in (Gu 2018). The package discussed here implements
this idea, using the *estimated normalized inverse range parameters*,
\[\label{equ:P_l}
\hat P_l =\frac{pC_l \hat \beta_l}{\sum^{p}_{i=1}C_i\hat \beta_i}, \tag{9}\]
for \(l=1,...,p\). The involvement of \(C_l\) (defined in
Table 2) is to account for the different scales of
different inputs. The denominator \((\sum^{p}_{i=1}C_i\hat \beta_i)\)
reflects the overall size of the estimator and \(C_l \hat \beta_l\) gives
the contribution of the \(l^{th}\) input. The average \(\hat P_l\) is \(1\)
and the sum of \(\hat P_l\) is \(p\). When \(\hat P_l\) is very close to 0, it
means the \(l^{th}\) input might be an inert input. In the *RobustGaSP*
package, the default threshold is 0.1; i.e., when \(\hat P_l<0.1\), it is
suggested to be an inert input. The threshold can also be specified by
users through the argument `threshold`

in the function
`findInertInputs`

.

For demonstration purpose, we build a GaSP emulator for the borehole experiment ((Worley 1987; Morris et al. 1993; An and Owen 2001)), a well-studied computer experiment benchmark which models water flow through a borehole. The output \(y\) is the flow rate through the borehole in \(m^3/year\) and it is determined by the equation: \[y=\frac{2\pi T_{u}(H_u-H_l)} {ln(r/r_{\omega})\big[1+\frac{2LT_u}{ln(r/r_{\omega})r^2_{\omega}K_{\omega}} +\frac{T_{u}}{T_l}\big]},\] where \(r_{\omega},r, T_{u}, H_u, T_l,H_l, L\) and \(K_{\omega}\) are the 8 inputs constrained in a rectangular domain with the following ranges \[\begin{aligned} &r_{\omega}\in [0.05,0.15], \quad r\in[100,50000], \quad T_u\in [63070, 115600], \quad H_u \in [990, 1110],\\ &T_l \in [63.1, 116], \quad H_l \in [700, 820], \quad L \in [1120, 1680], \quad K_{\omega} \in[9855, 12045]. \end{aligned}\] We use 40 maximin LH samples to find inert inputs at the borehole function through the following code.

```
> set.seed(1)
R> input <- maximinLHS(n=40, k=8) # maximin lhd sample
R> # rescale the design to the domain of the borehole function
R> LB <- c(0.05, 100, 63070, 990, 63.1, 700, 1120, 9855)
R> UB <- c(0.15, 50000, 115600, 1110, 116, 820, 1680, 12045)
R> range <- UB - LB
R> for(i in 1:8) {
R> input[,i] = LB[i] + range[i] * input[,i]
R> }
R> num_obs <- dim(input)[1]
R> output <- matrix(0,num_obs,1)
R> for(i in 1:num_obs) {
R+ output[i] <- borehole(input[i,])
+ }
> m <- rgasp(design = input, response = output, lower_bound=FALSE)
R> P <- findInertInputs(m) R
```

```
: 3.440765 8.13156e-09
The estimated normalized inverse range parameters are 4.983695e-09 0.844324 4.666519e-09 1.31081 1.903236 0.5008652
2 3 5 are suspected to be inert inputs The inputs
```

Similar to the automatic relevance determination model in neural
networks, e.g. (MacKay 1996; Neal 1996), and in machine
learning, e.g. (Tipping 2001; Li et al. 2002), the function
`findInertInputs`

of the *RobustGaSP* package indicates that the
\(2^{nd}, 3^{rd}\), and \(5^{th}\) inputs are suspected to be inert inputs.
Figure 4 presents the plots of the borehole
function when varying one input at a time. This analyzes the *local*
sensitivity of an input when having the others fixed. Indeed, the output
of the borehole function changes very little when the \(2^{nd}, 3^{rd}\),
and \(5^{th}\) inputs vary.

The ideal situation for a computer model is that it produces noise-free data, meaning that the output will not change at the same input. However, there are several cases in which the outputs are noisy. First of all, the numerical solution of the partial differential equations of a computer model could introduce small errors. Secondly, when only a subset of inputs are analyzed, the computer model is no longer deterministic given only the subset of inputs. For example, if we only use the 5 influential inputs of the borehole function, the outcomes of this function are no longer deterministic, since the variation of the inert inputs still affects the outputs a little. Moreover, some computer models might be stochastic or have random terms in the models.

For these situations, the common adjustment is to add a noise term to account for the error, such as \(\tilde y(\cdot)=y(\cdot)+\epsilon\), where \(y(\cdot)\) is the noise-free GaSP and \(\epsilon\) is an i.i.d. mean-zero Gaussian white noise ((Ren et al. 2012; Gu and Berger 2016)). To allow for marginalizing out the variance parameter, the covariance function for the new process \(\tilde y(\cdot)\) can be parameterized as follows: \[\label{equ:refpriornug} \sigma^2 \tilde c({{\mathbf{x}}_l},{{\mathbf{x}}_m}) {=} \sigma^2 \{c({{\mathbf{x}}_l},{{\mathbf{x}}_m}) + \eta{\delta_{lm}}\}, \tag{10}\] where \(\eta\) is defined to be the nugget-variance ratio and \(\delta_{lm}\) is a Dirac delta function when \(l=m\), \(\delta_{lm}=1\). After adding the nugget, the covariance matrix becomes: \[\sigma^2\tilde {\bf R}=\sigma^2({\bf R}+\eta {\bf I}_{n}).\] Although we call \(\eta\) the nugget-variance ratio parameter, the analysis is different than when a nugget is directly added to stabilize the computation in the GaSP model. As pointed out in (Roustant et al. 2012), when a nugget is added to stabilize the computation, it is also added to the covariance function in prediction, and, hence, the resulting emulator is still an interpolator, meaning that the prediction will be exact at the design points. However, when a noise term is added, it does not go into the covariance function and the prediction at a design point will not be exact (because of the effect of the noise).

Objective Bayesian analysis for the proposed GaSP model with the noise term can be done by defining the prior \[\label{equ:prior_nug} \tilde \pi(\mathbf \theta, \sigma^2,\mathbf \gamma, \eta)\propto \frac{\tilde \pi(\mathbf \gamma, \eta)}{\sigma^2}, \tag{11}\] where \(\tilde \pi(\mathbf \gamma,\eta)\) is now the prior for the range and nugget-variance ratio parameters \((\mathbf \gamma, \eta)\). The reference prior and the jointly robust prior can also be extended to be \(\tilde \pi^{R}(\cdot)\) and \(\tilde \pi^{JR}(\cdot)\) with robust parameterizations listed in Table 2. Based on the computational feasibility of the derivatives and the capacity to identify noisy inputs, the proposed default setting is to use the jointly robust prior with specified prior parameters in Table 2.

\(\tilde \pi^R(\mathbf \gamma, \eta)\) | \(|{\mathbf I^*}({\mathbf \gamma}, \eta)|^{1/2}\) |

\(\tilde \pi^R(\mathbf \xi, \eta)\) | \(|{\mathbf I^*}({\mathbf \xi},\eta)|^{1/2}\) with \(\xi_l=\log(1/\gamma_l)\), for \(l=1,...,p\) |

\(\tilde \pi^{JR}(\mathbf \beta, \eta)\) | \((\sum^{p}_{l=1} C_l \beta_l)^{a} exp(-b(\sum^{p}_{l=1} C_l\beta_l+\eta))\), with \(\beta_l=1/\gamma_l\), for \(l=1,...,p\) |

As in the previous noise-free GaSP model, one can estimate the range and nugget-variance ratio parameters by their marginal maximum posterior modes \[\label{equ:est_nugget_gamma} ({\hat \gamma}_1, \ldots {\hat \gamma}_p,\hat \eta)= \mathop{arg max }\limits_{ \gamma_1,\ldots, \gamma_p,\eta} \mathcal{L}( \mathbf{y}^{\mathcal D}|{ \gamma_1}, \ldots, {\gamma_p},\eta ) \tilde \pi({\gamma_1}, \ldots, {\gamma_p}, \eta). \tag{12}\]

After obtaining \(\hat{ \mathbf \gamma}\) and \(\hat \eta\), the predictive distribution of the GaSP emulator is almost the same as in Equation (7); simply replace \(c(\cdot, \cdot)\) by \(\tilde c(\cdot, \cdot)\) and \(\mathbf R\) by \(\tilde {\mathbf R}\).

Using only the influential inputs of the borehole function, we construct the GaSP emulator with a nugget based on 30 maximin LH samples through the following code:

```
> m.subset <- rgasp(design = input[ ,c(1,4,6,7,8)], response = output,
R+ nugget.est=TRUE)
> m.subset R
```

```
:
Callrgasp(design = input[, c(1, 4, 6, 7, 8)], response = output,
nugget.est = TRUE)
: 170.9782
Mean parameters: 229820.7
Variance parameter: 0.2489396 1438.028 1185.202 5880.335 44434.42
Range parameters: 0.2265875 Noise parameter
```

To compare the performance of the emulator with and without a noise
term, we perform some out-of-sample testing. We build the GaSP emulator
by the *RobustGaSP* package and the *DiceKriging* package using the same
mean and covariance. In *RobustGaSP*, the parameters in the correlation
functions are estimated by marginal posterior modes with the robust
parameterization, while in *DiceKriging*, parameters are estimated by
MLE with upper and lower bounds. We first construct these four emulators
with the following code.

```
> m.full <- rgasp(design = input, response = output)
R> m.subset <- rgasp(design = input[ ,c(1,4,6,7,8)], response = output,
R+ nugget.est=TRUE)
> dk.full <- km(design = input, response = output)
R> dk.subset <- km(design = input[ ,c(1,4,6,7,8)], response = output,
R+ nugget.estim=TRUE)
```

We then compare the performance of the four different emulators at 100 random inputs for the borehole function.

```
> set.seed(1)
R> dim_inputs <- dim(input)[2]
R> num_testing_input <- 100
R> testing_input <- matrix(runif(num_testing_input*dim_inputs),
R+ num_testing_input,dim_inputs)
> for(i in 1:8) {
R> testing_input[,i] <- LB[i] + range[i] * testing_input[,i]
R> }
R> m.full.predict <- predict(m.full, testing_input)
R> m.subset.predict <- predict(m.subset, testing_input[ ,c(1,4,6,7,8)])
R> dk.full.predict <- predict(dk.full, newdata = testing_input,type = 'UK')
R> dk.subset.predict <- predict(dk.subset,
R+ newdata = testing_input[ ,c(1,4,6,7,8)],type = 'UK')
> testing_output <- matrix(0, num_testing_input, 1)
R> for(i in 1:num_testing_input) {
R+ testing_output[i] <- borehole(testing_input[i, ])
+ }
> m.full.error <- abs(m.full.predict$mean - testing_output)
R> m.subset.error <- abs(m.subset.predict$mean - testing_output)
R> dk.full.error <- abs(dk.full.predict$mean - testing_output)
R> dk.subset.error <- abs(dk.subset.predict$mean - testing_output) R
```

Since the *DiceKriging* package seems not to have implemented a method
to estimate the noise parameter, we only compare it with the nugget
case. The box plot of the absolute errors of these 4 emulators (all with
the same correlation and mean function) at 100 held-out points are shown
in Figure 5. The performance of the
*RobustGaSP* package based on the full set of inputs or only influential
inputs with a noise is similar, and they are both better than the
predictions from the *DiceKriging* package.

The main purpose of the *RobustGaSP* package is to predict a function at
unobserved points based on only a limited number of evaluations of the
function. The uncertainty associated with the predictions is obtained
from the predictive distribution in Equation
((7)), which is implemented in two steps. The
first step is to build a GaSP model through the `rgasp`

function. This
function allows users to specify the mean function, correlation
function, prior distribution for the parameters, and to include a noise
term or not. In the default setting, these are all specified. The mean
and variance parameters are handled in a fully Bayesian way, and the
range parameters in the correlation function are estimated by their
marginal posterior modes. Moreover, users can also fix the range
parameters, instead of estimating them, change/replace the mean
function, add a noise term, etc. The `rgasp`

function returns an object
of the `"rgasp"`

S4 class with all needed estimated parameters,
including the mean, variance, noise and range parameters to perform
predictions.

The second step is to compute the predictive distribution of the
previously created GaSP model through the `predict`

function, which
produces the predictive means, the \(95\%\) predictive credible intervals,
and the predictive standard deviations at each test point. As the
predictive distribution follows a student \(t\) distribution in
((7)) for any test points, any quantile/percentile
of the predictive distribution can be computed analytically. The joint
distribution at a set of test points is a multivariate \(t\) distribution
whose dimension is equal to the number of test points. Users can also
sample from the posterior predictive distribution by using the
`simulate`

function.

The identification of inert inputs can be performed using the
`findInertInput`

function. As it only depends on the inverse range
parameters through Equation ((9)), there is no extra
computational cost in their identification (once the robust GaSP model
has been built through the `rgasp`

function). We suggest using the
jointly robust prior by setting the argument `prior_choice="ref_approx"`

in the `rgasp`

function before calling the `findInertInput`

function,
because the penalty given by this prior is close to an \(L_1\) penalty for
the logarithm of the marginal likelihood (with the choice of default
prior parameters) and, hence, it can shrink the parameters for those
inputs with small effect.

Besides, the *RobustGaSP* package also implements the PP GaSP emulator
introduced in (Gu and Berger 2016) for the computer model with a vector of
outputs. In the PP GaSP emulator, the variances and the mean values of
the computer model outputs at different grids are allowed to be
different, whereas the covariance matrix of physical inputs are assumed
to be the same across grids. In estimation, the variance and the mean
parameters are first marginalized out with the reference priors. Then
the posterior mode is used for estimating the parameters in the kernel.
The `ppgasp`

function builds a PP GaSP model, which returns an object of
the `"ppgasp"`

S4 class with all needed estimated parameters. Then the
predictive distribution of PP GaSP model is computed through the
`predict.ppgasp`

function. Similar to the emulator of the output with
the scalar output, the `predict.ppgasp`

function returns the predictive
means, the \(95\%\) predictive credible intervals, and the predictive
standard deviations at each test point.

`rgasp`

functionThe `rgasp`

function is one of the most important functions, as it
performs the parameter estimation for the GaSP model of the computer
model with a scalar output. In this section, we briefly review the
implementation of the `rgasp`

function and its optimization algorithm.

The \(n\times p\) design matrix \(\mathbf x^{\mathcal D}\) and the
\(n \times 1\) output vector \(\mathbf y^{\mathcal D}\) are the only two
required arguments (without default values) in the `rgasp`

function. The
default setting in the argument `trend`

is a constant function, i.e.,
\(\mathbf h(\mathbf x^{\mathcal D})=\mathbf 1_n\). One can also set
`zero.mean=TRUE`

in the `rgasp`

function to assume the mean function in
GaSP model is zero. By default, the GaSP model is defined to be
noise-free, i.e., the noise parameter is \(0\). However, a noise term can
be added with estimated or fixed variance. As the noise is parameterized
following the form ((10)), the variance is
marginalized out explicitly and the nugget-variance parameter \(\eta\) is
left to be estimated. This can be done by specifying the argument
`nugget.est = T`

in the `rgasp`

function; when the nugget-variance
parameter \(\eta\) is known, it can be specified; e.g., \(\eta=0.01\)
indicates the nugget-variance ratio is equal to \(0.01\) in `rgasp`

and
\(\eta\) will be not be estimated with such a specification.

Two classes of priors of the form ((4)), with several
different robust parameterizations, have been implemented in the
*RobustGaSP* package (see Table 3 for details). The
prior that will be used is controlled by the argument `prior_choice`

in
the `rgasp`

function. The reference prior \(\pi^R(\cdot)\) with
\(\mathbf \gamma\) (the conventional parameterization of the range
parameters for the correlation functions in Table 1) and
\(\mathbf \xi=\log(1/\mathbf \gamma)\) parameterization can be specified
through the arguments `prior_choice="ref_gamma"`

and
`prior_choice="ref_xi"`

, respectively. The jointly robust prior
\(\pi^{JR}(\cdot)\) with the \(\mathbf \beta=1/{ \mathbf\gamma}\)
parameterization can be specified through the argument
`prior_choice="ref_approx"`

; this is the default choice used in `rgasp`

,
for the reasons discussed in Section 2.

The correlation functions implemented in the *RobustGaSP* package are
shown in Table 1, with the default setting being
`kernel_type = "matern_5_2"`

in the `rgasp`

function. The power
exponential correlation function requires the specification of a vector
of roughness parameters \(\mathbf \alpha\) through the argument `alpha`

in
the `rgasp`

function; the default value is \(\alpha_l=1.9\) for
\(l=1,...,p\), as suggested in (Bayarri et al. 2009).

`ppgasp`

functionThe `ppgasp`

function performs the parameter estimation of the PP GaSP
emulator for the computer model with a vector of outputs. In the
`ppgasp`

function, the output \(\mathbf y^{\mathcal D}\) is a \(n \times k\)
matrix, where each row is the \(k\)-dimensional computer model outputs.
The rest of the input quantities of the `ppgasp`

function and `rgasp`

function are the same.

Thus the `ppgasp`

function return the estimated parameters, including
\(k\) estimated variance parameters, and \(q\times k\) mean parameters when
the mean basis has \(q\) dimensions.

Estimation of the range parameters \(\mathbf \gamma\) is implemented
through numerical search for the marginal posterior modes in Equation
((6)). The low-storage quasi-Newton optimization
method ((Nocedal 1980; Liu and Nocedal 1989)) has been used in the
`lbfgs`

function in the
*nloptr* package
((Ypma 2014)) for optimization. The closed-form marginal likelihood,
prior and their derivatives are all coded in C++. The maximum number of
iterations and tolerance bounds are allowed to be chosen by users with
the default setting as `max_eval=30`

and `xtol_rel=1e-5`

, respectively.

Although maximum marginal posterior mode estimation with the robust
parameterization eliminates the problems of the correlation matrix being
estimated to be either \(\mathbf I_n\) or \(\mathbf 1_n \mathbf 1^\top_n\),
the correlation matrix could still be close to these singularities in
some scenarios, particularly when the sample size is very large. In such
cases, we also utilize an upper bound for the range parameters
\(\mathbf \gamma\) (equivalent to a lower bound for
\(\mathbf \beta=1/{\mathbf \gamma}\)). The derivation of this bound is
discussed in the Appendix. This bound is implemented in the `rgasp`

function through the argument `lower_bound=TRUE`

, and this is the
default setting in *RobustGaSP*. As use of the bound is a somewhat ad
hoc fix for numerical problems, we encourage users to also try the
analysis without the bound; this can be done by specifying
`lower_bound=FALSE`

. If the answers are essentially unchanged, one has
more confidence that the parameter estimates are satisfactory.
Furthermore, if the purpose of the analysis is to detect inert inputs,
users are also suggested to use the argument `lower_bound=FALSE`

in the
`rgasp`

function.

Since the marginal posterior distribution could be multi-modal, the
package allows for different initial values in the optimization by
setting the argument `multiple_starts=TRUE`

in the `rgasp`

function. The
first default initial value for each inverse range parameter is set to
be \(50\) times their default lower bounds, so the starting value will not
be too close to the boundary. The second initial value for each of the
inverse range parameter is set to be half of the mean of the jointly
robust prior. Two initial values of the nugget-variance parameter are
set to be \(\eta=0.0001\) and \(\eta=0.0002\) respectively.

In this section, we present further examples of the performance of the
*RobustGaSP* package, and include comparison with the *DiceKriging*
package in R. We will use the same data, trend function, and correlation
function for the comparisons. The default correlation function in both
packages is the Matérn correlation with \(\alpha=5/2\) and the default
trend function is a constant function. The only difference is the method
of parameter estimation, as discussed in Section 2,
where the *DiceKriging* package implements the MLE (by default) and the
penalized MLE (PMLE) methods, (Roustant et al. 2018).

It is expected that, for a one-dimensional function, both packages will perform well with an adequate number of design points, so we start with the function called the modified sine wave discussed in (Gu 2016). It has the form \[y=3\sin(5\pi x)+\cos(7\pi x),\] where \(x=[0,1]\). We first perform emulation based on 12 equally spaced design points on \([0,1]\).

```
> sinewave <- function(x) {
R+ 3*sin(5*pi*x)*x + cos(7*pi*x)
+ }
> input <- as.matrix(seq(0, 1, 1/11))
R> output <- sinewave(input) R
```