The expert: Modeling Without Data Using Expert Opinion’ article from the 2009-1 issue.
The expert package provides tools to create and manipulate empirical statistical models using expert opinion (or judgment). Here, the latter expression refers to a specific body of techniques to elicit the distribution of a random variable when data is scarce or unavailable. Opinions on the quantiles of the distribution are sought from experts in the field and aggregated into a final estimate. The package supports aggregation by means of the Cooke, Mendel–Sheridan and predefined weights models.
We do not mean to give a complete introduction to the theory and practice of expert opinion elicitation in this paper. However, for the sake of completeness and to assist the casual reader, the next section summarizes the main ideas and concepts.
It should be noted that we are only interested, here, in the mathematical techniques of expert opinion aggregation. Obtaining the opinion from the experts is an entirely different task; see (Tversky and D. Kahneman 1974; Kadane and R. L. Winkler 1988; Kadane and L. J. Wolfson; 1998) for more information. Moreover, we do not discuss behavioral models (see Ouchi 2004 for an exhaustive review) nor the problems of expert selection, design and conducting of interviews. We refer the interested reader to (O’Hagan, C. Buck, C. Daneshkhah, J. Eiser, P. Garthwaite, D. Jenkinson, J. Oakley, and T. Rakow 2006) and (Cooke 1991) for details. Although it is extremely important to carefully examine these considerations if expert opinion is to be useful, we assume that these questions have been solved previously. The package takes the opinion of experts as an input that we take here as available.
The other main section presents the features of version 1.0-0 of package expert.
Consider the task of modeling the distribution of a quantity for which data is very scarce (e.g. the cost of a nuclear accident) or even unavailable (e.g. settlements in liability insurance, usually reached out of court and kept secret). A natural option for the analyst 1 is then to ask experts their opinion on the distribution of the so-called . The experts are people who have a good knowledge of the field under study and who are able to express their opinion in a simple probabilistic fashion. They can be engineers, physicists, lawyers, actuaries, etc. Typically, a group of 12 to 15 experts is consulted.
The experts express their opinion on the distribution of the decision random variable by means of a few quantiles, usually the median and a symmetric interval around it. Let us recall that the \(100r{^{th}}\) quantile of a random variable is the value \(q_r\) such that \[\lim_{h \to 0} F(q_r - h) \leq r \leq F(q_r),\] where \(F\) is the cumulative distribution function of the random variable. Hence, by giving the quantiles \(q_u\) and \(q_v\), \(u < v\), an expert states that, in his opinion, there is a probability of \(v - u\) that the true value of the variable lies between \(q_u\) and \(q_v\). We generally avoid asking for quantiles far in the tails (say, below 5% and above 95%) since humans have difficulty evaluating them.
In addition to the distribution of the decision variable, the experts are also queried for the distribution of a series of (or calibration) variables for which only the analyst knows the true values. By comparing the distributions given by the experts to the latter, the analyst will be able to assess the quality of the information provided by each expert. This step is called .
In summary, expert opinion is given by means of quantiles for \(k\) seed variables and one decision variable. The calibration phase determines the influence of each expert on the final . The three methods discussed in this paper and supported by the package are simply different ways to aggregate the information provided by the experts.
The aggregated distribution in the classical model of (Cooke 1991) is a convex combination of the quantiles given by the experts, with weights obtained from the calibration phase. Consequently, if we asked for three quantiles from the experts (say the 10^th, 50^th and 90^th), the aggregated distribution will consist of three “average” quantiles corresponding to these same probabilities. This model has been used in many real life studies by the analysts of the Delft Institute of Applied Mathematics of the Delft University of Technology (Cooke and L. Goossens 2008).
The predefined weights method is similar, except that the weights are provided by the analyst.
On the other hand, the model of (Mendel and T. Sheridan 1989) adjusts the probabilities associated with each quantile by a (somewhat convoluted) Bayesian procedure to reflect the results of the calibration phase. Returning to the context above, this means that the probability of 10% associated with the first quantile given by an expert may be changed to, say, 80% if the calibration phase proved that this expert systematically overestimates the distributions. Combining intersecting adjusted probabilities from all the experts usually results in an aggregated distribution with far more quantiles than were asked for in the beginning.
It is well beyond the scope of this paper to discuss the actual assumptions and computations in the calibration and aggregation phases. The interested reader may consult the aforementioned references, (Goulet, M. Jacques, and M. Pigeon 2008) or Pigeon (2008, in French).
To the best of our knowledge, the only readily available software for expert opinion calculations is Excalibur, a closed source, Windows-only application available from the Risk and Environmental Modelling website of the Delft University of Technology (http://dutiosc.twi.tudelft.nl/~risk/). Excalibur does not support the Mendel–Sheridan model. Data has to be entered manually in the application, making connection to other tools inconvenient. Furthermore, exporting the results for further analysis requires an ad hoc procedure.
expert is a small R package providing a unified interface to the three expert opinion models exposed above along with a few utility functions and methods to manipulate the results. Naturally, the analyst also benefits from R’s rich array of statistical, graphical, import and export tools.
Before looking at the functions of the package, we introduce a dataset that will be used in forthcoming examples. We consider an analysis where three experts must express their opinion on the 10^th, 50^th and 90^th quantiles of a decision variable and two seed variables. The true values of the seed variables are \(0.27\) and \(210,000\), respectively. See Table 1 for the quantiles given by the experts.
Variable | ||||
---|---|---|---|---|
Expert | Quantile | Seed 1 | Seed 2 | Decision |
1 | \(10^{th}\) | \(0.14\) | \(130,000\) | \(350,000\) |
\(50^{th}\) | \(0.22\) | \(150,000\) | \(400,000\) | |
\(90^{th}\) | \(0.28\) | \(200,000\) | \(525,000\) | |
2 | \(10^{th}\) | \(0.20\) | \(165,000\) | \(550,000\) |
\(50^{th}\) | \(0.30\) | \(205,000\) | \(600,000\) | |
\(90^{th}\) | \(0.40\) | \(250,000\) | \(650,000\) | |
3 | \(10^{th}\) | \(0.20\) | \(200,000\) | \(625,000\) |
\(50^{th}\) | \(0.40\) | \(400,000\) | \(700,000\) | |
\(90^{th}\) | \(0.52\) | \(500,000\) | \(800,000\) |
The main function of the package is expert
. It serves as a unified
interface to all three models supported by the package. The arguments of
the function are the following:
x
, the quantiles of the experts for all seed variables and the
decision variable — in other words the content of
Table 1. This is given as a list of lists, one for
each expert. The interior lists each contain vectors of quantiles:
one per seed variable and one for the decision variable (in this
order). For the example above, this gives
> x <- list(
+ E1 <- list(
+ S1 <- c(0.14, 0.22, 0.28),
+ S2 <- c(130000, 150000, 200000),
+ X <- c(350000, 400000, 525000)),
+ E2 <- list(
+ S1 <- c(0.2, 0.3, 0.4),
+ S2 <- c(165000, 205000, 250000),
+ X <- c(550000, 600000, 650000)),
+ E3 <- list(
+ S1 <- c(0.2, 0.4, 0.52),
+ S2 <- c(200000, 400000, 500000),
+ X <- c(625000, 700000, 800000)))
method
, the model to be used for aggregation. Must be one of for
Cooke’s classical model, for the Mendel–Sheridan model, or for the
predefined weights model;
probs
, the vector of probabilities corresponding to the quantiles
given by the experts. In the example, we have
> probs <- c(0.1, 0.5, 0.9)
true.seed
, the vector of true values of the seed variables.
Obviously, these must be listed in the same order as the seed
variables are listed in argument x
. In the example, this is
> true.seed <- c(0.27, 210000)
alpha
, parameter \(\alpha\) in Cooke’s model. This argument is
ignored in the other models. Parameter \(\alpha\) sets a threshold for
the calibration component of an expert below which the expert
automatically receives a weight of \(0\) in the aggregated
distribution. If the argument is missing or NULL
, the value of
\(\alpha\) is determined by an optimization procedure (see below);
w
, the vector of weights in the predefined weights model. This
argument is ignored in the other models. If the argument is missing
or NULL
, the weights are all equal to \(1/n\), where \(n\) is the
number of experts.
If the calibration threshold \(\alpha\) in Cooke’s model is not fixed by the analyst, one is computed following a procedure proposed by (Cooke 1991). We first fit the model with \(\alpha = 0\). This gives a weight to each expert. Using these weights, we then create a “virtual” expert by aggregating the quantiles for the seed variables. The optimal \(\alpha\) is the value yielding the largest weight for the virtual expert. This is determined by trying all values of \(\alpha\) just below the calibration components obtained in the first step. There is no need to try other values and, hence, to conduct a systematic numerical optimization.
When the experts do not provide the \(0^{th}\) and \(100^{th}\) quantiles,
expert
sets them automatically following the ad hoc procedure exposed
in (Cooke 1991). In a nutshell, the quantiles are obtained by
removing and adding 10% of the smallest interval containing all
quantiles given by the experts to the bounds of this interval.
Therefore, the minimum and maximum of the distribution are set the same
for all experts. In our example, the \(0^{th}\) and \(100^{th}\) quantiles
of the first seed variable are set, respectively, to
\[\begin{aligned}
q_{0} &= 0.14 - 0.1 (0.52 - 0.14) = 0.102 \\
q_{1} &= 0.52 + 0.1 (0.52 - 0.14) = 0.558,
\end{aligned}\] those of the second seed variable to
\[\begin{aligned}
q_{0} &= 130,000 - 0.1 (500,000 - 130,000) = 93,000 \\
q_{1} &= 500,000 + 0.1 (500,000 - 130,000) = 537,000
\end{aligned}\] and, finally, those of the decision variable to
\[\begin{aligned}
q_{0} &= 350,000 - 0.1 (800,000 - 350,000) = 305,000 \\
q_{1} &= 800,000 + 0.1 (800,000 - 350,000) = 845,000.
\end{aligned}\] Note that only the Mendel–Sheridan model allows
non-finite lower and upper bounds.
The function expert
returns a list of class containing the vector of
the knots (or bounds of the intervals) of the aggregated distribution,
the vector of corresponding probability masses and some other
characteristics of the model used. A print
method displays the results
neatly.
Consider using Cooke’s model with the data of the example and a threshold of \(\alpha = 0.3\). We obtain:
> expert(x, "cooke", probs, true.seed,
+ alpha = 0.03)
Aggregated Distribution Using Cooke Model
Interval Probability305000, 512931] 0.1
(512931, 563423] 0.4
(563423, 628864] 0.4
(628864, 845000] 0.1
(
: 0.03 Alpha
As previously explained, if the value of \(\alpha\) is not specified in
the call to expert
, its value is obtained by optimization and the
resulting aggregated distribution is different:
> expert(x, "cooke", probs, true.seed)
Aggregated Distribution Using Cooke Model
Interval Probability305000, 550000] 0.1
(550000, 600000] 0.4
(600000, 650000] 0.4
(650000, 845000] 0.1
(
: 0.3448 Alpha
The Mendel–Sheridan model yields an entirely different result that
retains many of the quantiles given by the experts. The result is a more
detailed distribution. Here, we store the result in an object fit
for
later use:
> fit <- expert(x, "ms", probs, true.seed)
> fit
Aggregated Distribution Using-Sheridan Model
Mendel
Interval Probability305000, 350000] 0.01726
(350000, 400000] 0.06864
(400000, 525000] 0.06864
(525000, 550000] 0.01726
(550000, 600000] 0.06864
(600000, 625000] 0.06864
(625000, 650000] 0.53636
(650000, 700000] 0.06864
(700000, 800000] 0.06864
(800000, 845000] 0.01726 (
In addition to the main function expert
, the package features a few
functions and methods to ease manipulation and analysis of the
aggregated distribution. Most are inspired by similar functions in
package actuar (Dutang, V. Goulet, and M. Pigeon 2008).
First, a summary
method for objects of class provides more details on
the fitted model than the print
method, namely the number of experts,
the number of seed variables and the requested and set quantiles:
> summary(fit)
:
Callexpert(x = x, method = "ms", probs = probs,
true.seed = true.seed)
Aggregated Distribution Using-Sheridan Model
Mendel
Interval Probability305000, 350000] 0.01726
(350000, 400000] 0.06864
(400000, 525000] 0.06864
(525000, 550000] 0.01726
(550000, 600000] 0.06864
(600000, 625000] 0.06864
(625000, 650000] 0.53636
(650000, 700000] 0.06864
(700000, 800000] 0.06864
(800000, 845000] 0.01726
(
: 3,
Number of experts: 2
Number of seed variables: 0*, 0.1, 0.5, 0.9, 1*
Quantiles* were set) (
Methods of mean
and quantile
give easy access to the mean and to the
quantiles of the aggregated distribution:
> mean(fit)
1] 607875 [
> quantile(fit)
0% 25% 50% 75% 100%
305000 600000 625000 625000 845000
Of course, one may also specify one or many quantiles in a second argument:
> quantile(fit, c(0.1, 0.9))
10% 90%
400000 650000
Moreover, the quantile
method can return linearly interpolated
quantiles by setting the smooth
argument to TRUE
:
> quantile(fit, smooth = TRUE)
0% 25% 50% 75% 100%
305000 603478 633898 645551 845000
The inverse of the quantile and smoothed quantile functions are,
respectively, the cumulative distribution function (cdf) and the ogive
(Klugman, H. H. Panjer, and G. Willmot 1998; Goulet and M. Pigeon 2008). In a fashion similar
to ecdf
of the base R package
stats, the cdf
function
returns a function object to compute the cdf of the aggregated
distribution at any point:
> F <- cdf(fit)
> knots(F)
1] 305000 350000 400000 525000 550000
[6] 600000 625000 650000 700000 800000
[11] 845000 [
> F(knots(F))
1] 0.00000 0.01726 0.08590 0.15455
[5] 0.17181 0.24045 0.30909 0.84545
[9] 0.91410 0.98274 1.00000 [
> F(c(450000, 750000))
1] 0.0859 0.9141 [
A plot
method yields the graphic in Figure 1:
> plot(F)
Function ogive
is the analogue of cdf
for the linearly interpolated
cdf:
> G <- ogive(fit)
> G(knots(G))
1] 0.00000 0.01726 0.08590 0.15455
[5] 0.17181 0.24045 0.30909 0.84545
[9] 0.91410 0.98274 1.00000 [
> G(c(450000, 750000))
1] 0.1134 0.9484 [
> plot(G)
See Figure 2 for the graphic created with the last expression.
Finally, a method of hist
draws the derivative of the ogive, that is
the histogram. The arguments of the method are the same as those of the
default method in package
stats. See Figure
3 for the graphic created by the following expression:
> hist(fit)
Depending on the application, the aggregated distribution one obtains
from expert
might be too crude for risk analysis, especially if one
used Cooke’s model with very few quantiles. Now, the expert aggregated
distribution actually plays the role of the empirical distribution in
usual statistical modeling. That means one can further fit a continuous
or discrete distribution to the aggregated “data”.
For example, consider the task of fitting a gamma distribution to the
Mendel–Sheridan aggregated distribution obtained above. Among many
other options (see, e.g., Klugman, H. H. Panjer, and G. Willmot 1998; Goulet and M. Pigeon 2008), we choose our
parameters to minimize the Cramér-von Mises distance
\[d(\alpha, \lambda) =
\sum_{i = 1}^n (G(x_i) - F(x_i; \alpha, \lambda))^2,\] where
\(G(\cdot)\) is the ogive of the aggregated distribution, \(x_1, \dots, x_n\) are its knots and \(F(\cdot; \alpha, \lambda)\) is the cdf of
a gamma distribution with shape parameter \(\alpha\) and rate parameter
\(\lambda\). The optimal parameters are simple to obtain with the function
optim
:
> f <- function(p, x) drop(crossprod(G(x) -
+ pgamma(x, p[1], p[2])))
> optim(c(100, 0.00016), f,
x = knots(G))$par
1] 4.233e+02 6.716e-04 [
The paper introduced a package to elicit statistical distributions from
expert opinion in situations where real data is scarce or absent. The
main function, expert
, is a unified interface to the two most
important models in the field — the Cooke classical model and the
Mendel–Sheridan Bayesian model — as well as the informal predefined
weights model. The package also provides functions and methods to
compute from the object returned by expert
and plot important results.
Most notably, cdf
, ogive
and quantile
give access to the
cumulative distribution function of the aggregated distribution, its
ogive and the inverse of the cdf and ogive.
For modeling purposes, the package can be used as a direct replacement of Excalibur. However, coupled with the outstanding statistical and graphical capabilities of R, we feel expert goes far beyond Excalibur in terms of usability and interoperability.
This research benefited from financial support from the Natural Sciences and Engineering Research Council of Canada and from the Chaire d’actuariat (Actuarial Science Chair) of Université Laval. We also thank one anonymous referee and the R Journal editors for corrections and improvements to the paper.
This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Goulet, et al., "expert: Modeling Without Data Using Expert Opinion", The R Journal, 2009
BibTeX citation
@article{RJ-2009-005, author = {Goulet, Vincent and Jacques, Michel and Pigeon, Mathieu}, title = {expert: Modeling Without Data Using Expert Opinion}, journal = {The R Journal}, year = {2009}, note = {https://rjournal.github.io/}, volume = {1}, issue = {1}, issn = {2073-4859}, pages = {31-36} }