The OneStep package mainly provides an eponymous function that numerically computes Le Cam’s one-step estimator, which is asymptotically efficient and can be computed faster than the maximum likelihood estimator for large datasets. Monte Carlo simulations are carried out for several examples (discrete and continuous probability distributions) in order to exhibit the performance of Le Cam’s one-step estimation procedure, in terms of efficiency and computational cost, on observation samples of finite size.
In the statistical experiments generated by i.i.d. observation samples, the sequence of maximum likelihood estimators (MLE) is known to be asymptotically efficient under very general assumptions and consequently presents the fastest convergence rate and the lowest possible asymptotic variance.
Although the sequence of MLE is asymptotically efficient, it generally has no closed-form expression and requires time-consuming numerical computations. Conversely, other generic estimation procedures, such as the method of moments, can sometimes be computed faster but do not generally reach the optimal asymptotic variance.
In R, the package fitdistrplus (see (Delignette-Muller and Dutang 2015)) is commonly used to infer the parameters of univariate probability distributions. For non-censored datasets, fitdistrplus allows four different estimation methods: maximum likelihood, moment matching, quantile matching, and maximum goodness-of-fit estimation. The unified approach is provided by fitdist(), which returns an S3 object having the usual generic functions (plot, summary, logLik, …) as well as dedicated fitdist functions (gofstat, bootdist, …).
Other packages with similar purposes are EstimationTools and DistributionFitR, which provide MLE for non-censored data. Many other packages also provide estimation procedures on the basis of one function per probability distribution; see, e.g., the packages univariateML and propagate. A final package worth mentioning is fitteR, which sequentially fits a set of probability distributions to a given dataset.
However, as soon as the Fisher information matrix is sufficiently regular with respect to the parameter to be estimated, Le Cam’s one-step estimation procedures can be applied (see (Le Cam 1956)). They are based on an initial sequence of guess estimators and a single Fisher scoring step or Newton step on the loglikelihood function. Namely, the Fisher scoring type procedure writes \[\label{LC1} \overline{\vartheta}_n = \vartheta^*_n + {\cal I}(\vartheta^*_n)^{-1}\cdot \frac1n \sum_{j=1}^n \dot{\ell} (\vartheta^*_n,X_j), \quad n \geq 1, \tag{1}\] where \((\vartheta^*_n, n \geq 1)\) is the initial sequence of guess estimators, \(\ell(\vartheta)\) is the loglikelihood function, the dot denotes the gradient with respect to \(\vartheta\), and \({\cal I}(\vartheta)=-\mathbf{E}_\vartheta( \ddot{\ell} (\vartheta) )\) is the Fisher information matrix; the Newton type procedure writes \[\label{LC2} \overline{\vartheta}_n = \vartheta^*_n + \widehat{{\cal I}}_n(\vartheta^*_n)^{-1}\cdot \frac1n \sum_{j=1}^n \dot{\ell} (\vartheta^*_n,X_j), \quad n \geq 1, \tag{2}\] where \(\widehat{{\cal I}}_n(\vartheta)= -\frac{1}{n}\sum_{j=1}^n \ddot{\ell} (\vartheta,X_j)\) is the opposite of the normalized Hessian of the loglikelihood function.
The sequence of Le Cam’s one-step estimators presents certain advantages over the sequence of MLE and over the initial sequence of estimators (method of moments, quantile matching method, etc.) in terms of computational cost and asymptotic variance. It is much less computationally expensive than the MLE, while it has the same rate and the same asymptotic variance. Since there is no full numerical optimization (but only one computation of the Newton step or the Fisher scoring step), the procedure is faster and appropriate for very large datasets. On the other hand, it is asymptotically optimal in terms of asymptotic variance, which is generally not the case for the initial sequence of guess estimators.
For several probability distributions (Gaussian, Exponential, Lognormal, Poisson, Geometric), a fast computable sequence of estimators is already asymptotically efficient, and no correction is needed. When the Fisher information is given in a closed form (Gamma, Beta, \(\chi^2\), Weibull, Pareto II, Cauchy), Le Cam’s one-step estimation procedure (1) is executed. In all the other cases, the estimation procedure (2) is used either with the score function and the Hessian in a closed form (Negative Binomial) or with the numerical approximation of the score function and the Hessian with the package numDeriv (see (Gilbert and Varadhan 2019)).
Le Cam’s one-step estimation procedures are implemented in the package OneStep, and the eponymous function onestep is described in this paper and used on different examples. Monte Carlo simulations are performed for several examples in order to exhibit the performance of Le Cam’s one-step procedure on samples of finite size. These simulations exhibit the asymptotic efficiency of the one-step estimators together with their lower computational cost for several probability distributions.
The onestep function
Let \((X_1,X_2, \ldots, X_n)\) be a sample of i.i.d. random variables. The probability density function of \(X_1\) is denoted \(f\) and depends on an unknown parameter \(\vartheta \in \Theta \subset \mathbb{R}^p\) which is to be estimated.
Le Cam’s one-step estimation procedure is based on an initial sequence of guess estimators and a single Fisher scoring or Newton step on the loglikelihood function. For the novice user, the function onestep automatically chooses the best procedure to be used. Consequently, a single command suffices:

onestep(data, distr)
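For instance, a minimal call on simulated data looks as follows (a sketch; the distribution and parameter values are illustrative):

```r
# Minimal sketch: fit a Gamma sample with Le Cam's one-step procedure.
library(OneStep)
set.seed(1234)
x <- rgamma(10000, shape = 2, rate = 3)   # illustrative parameter values
est <- onestep(x, "gamma")
summary(est)
```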
The function onestep uses several internal procedures, depending on whether the initial sequence of guess estimators is in closed form or not, and whether the score and the Fisher information matrix can be elicited in closed form.
Here is the list of the procedures taken into account in the onestep function:
- Distributions for which the MLE is already explicit: norm, exp, lnorm, invgauss for continuous probability distributions, and pois, geom for discrete probability distributions. For this class, the explicit MLE is returned.
- Distributions for which the initial sequence of guess estimators, the score, and the Fisher information matrix have been elicited in a closed form: gamma, beta, chisq with an initial sequence of moment estimators, cauchy with an initial sequence of quantile matching estimators, and weibull with an initial sequence of graphical plot estimators. For this class, Le Cam’s one-step procedure (1) is applied.
- Distributions for which the initial sequence of guess estimators, the score, and the Hessian have been elicited in a closed form: nbinom. For this class, Le Cam’s one-step procedure (2) is applied.
- Distributions for which the initial sequence is numerically computed on a subsample, but the score and the Fisher information matrix have a closed form: pareto. For this class, Le Cam’s one-step procedure (1) is executed.
- For all other distributions, if the density function is well defined, the numerical computation of the Newton step in Le Cam’s one-step procedure (2) is proposed, with an initial sequence of guess estimators given by the maximum likelihood estimators computed on a subsample.
As described above, all explicit distributions of the mmedist function from fitdistrplus have been corrected in the onestep function, except unif and logis. The unif case is a famous example of the singular behavior of the MLE. For logis, the gain from the method of moments estimator to the MLE is very small. For these two distributions, the method of moments is returned.
However, the package also offers several new explicit computations, such as cauchy, chisq, and weibull, and applies to distributions coming from the actuar package (see (Dutang et al. 2008)) such as invgauss and pareto.
The function onestep allows users to supply their own initial guess estimate in the one-step procedure by specifying the parameter init in the command

onestep(data, distr, init)

Users can consequently use different initial guess estimators for the aforementioned classical distributions (moment estimators, estimators based on the characteristic function, quantile matching estimators, Bayesian estimators, mode-type estimators, graphical methods, etc.). Several examples can be found in the documentation of the onestep function.
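As a brief illustration, a user-chosen starting point can be passed as a named list; the sketch below assumes the Cauchy parameters are named location and scale, as in stats::dcauchy:

```r
# Sketch: supply a user initial guess through the init argument.
library(OneStep)
set.seed(1234)
x <- rcauchy(10000, location = 2, scale = 3)   # illustrative values
onestep(x, "cauchy",
        init = list(location = median(x),      # quantile-based guesses
                    scale = IQR(x) / 2))
```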
Monte Carlo simulations are carried out for several examples (discrete and continuous probability distributions) in order to exhibit the performance of Le Cam’s one-step estimation procedure, in terms of efficiency and computational cost, on observation samples of finite size.
To assess efficiency, the proximity of the renormalized statistical errors \((Y_1,Y_2,\ldots, Y_M)\) to the centered Gaussian asymptotic distribution (for one coordinate in the multivariate setting) is evaluated with the Cramér-von Mises statistic \[T= M \omega^2_M = M \int_\mathbb{R} \left(F_M(y)-F_*(y) \right)^2 dF_*(y) = \frac{1}{12M} + \sum_{i=1}^M \left(F_*(Y_{(i)})- \frac{2i-1}{2M}\right)^2,\] where the order statistics are denoted \(Y_{(1)} \leq Y_{(2)} \leq \dots \leq Y_{(M)}\), \(F_*(\cdot)\) is the theoretical Gaussian asymptotic cumulative distribution function, and \[F_M(y)=\frac{1}{M}\sum_{i=1}^M {\mathbf{\mathbb{1}}}_{\{ Y_i \leq y \}}\] is the empirical cumulative distribution function. The asymptotic distribution of \(T\) is tabulated, for instance, in the goftest package. Note that a value of the statistic below 0.7434 corresponds to accepting the null hypothesis (equivalence with the theoretical Gaussian asymptotic distribution) at a type I error level of 1%.
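For illustration, the statistic \(T\) can be computed from the vector of renormalized errors in a few lines (a sketch; cvm_stat is a hypothetical helper, and the limit is assumed standard Gaussian):

```r
# Sketch: Cramér-von Mises statistic T for M renormalized errors y,
# assuming the theoretical asymptotic cdf F_* is the standard Gaussian.
cvm_stat <- function(y, Fstar = pnorm) {
  M <- length(y)
  yord <- sort(y)   # order statistics Y_(1) <= ... <= Y_(M)
  1 / (12 * M) + sum((Fstar(yord) - (2 * seq_len(M) - 1) / (2 * M))^2)
}
# A value below 0.7434 is compatible with the Gaussian limit at the 1% level.
```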
Timing performance (given in seconds) is measured with the proc.time function ("elapsed" time) on a laptop with an Intel Core i7 2.7 GHz processor and 8 GB RAM. The onestep function is compared to the MLE computed with mledist of the fitdistrplus package and to the sequence of initial guess estimators (for instance, the moment estimator computed with mmedist for the gamma, beta, and nbinom distributions).
The functions benchonestep and benchonestep.replicate were used to compare the performance of the estimators (see the documentation of the OneStep package).
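In the same spirit, a single timing comparison can be hand-rolled as follows (a sketch of what the benchmark functions automate; parameter values are illustrative):

```r
# Sketch: timing comparison of MLE, ME, and one-step on one Gamma sample.
library(OneStep)
library(fitdistrplus)
set.seed(1234)
x <- rgamma(1e6, shape = 2, rate = 3)
system.time(mledist(x, "gamma"))   # numerical MLE
system.time(mmedist(x, "gamma"))   # closed-form moment matching
system.time(onestep(x, "gamma"))   # Le Cam's one-step
```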
For the majority of the "closed formula" cases, the initial sequence of guess estimators of the unknown parameter \(\vartheta\) is the sequence of moment estimators.
Let \((X_1,X_2, \ldots, X_n)\) be a sample of i.i.d. random variables. Let us denote the theoretical moments \(m_k(\vartheta)=\mathbf{E}_\vartheta (X_1^k)\) and the empirical moments \(\tilde{m}_k=\frac{1}{n} \sum_{j=1}^n X_j^k\). The sequence of moment estimators (ME) is generally defined as the solution of the system of equations \[m_k(\vartheta_n^*) =\tilde{m}_k , \quad k=1,\ldots,p.\] Indeed, under very mild conditions, the sequence of moment estimators \((\vartheta_n^*, n\geq 1)\) is asymptotically normal, and therefore, \(\sqrt{n}\)-consistent (see (Ibragimov and Has’minskii 1981)). Namely, \[\label{eq:normME} \sqrt{n} \left( \vartheta_n^* - \vartheta \right) \longrightarrow {\cal N}\left(0, J^{-1}(\vartheta) A(\vartheta) J^{-T}(\vartheta)\right) \tag{3}\] in law as \(n\rightarrow \infty\), where \[J(\vartheta)= \left( \frac{\partial}{\partial \vartheta_j} m_i(\vartheta) \right)_{1\leq i\leq p, 1 \leq j \leq p}\] and \[A(\vartheta)=\mathbf{E}_\vartheta \left( (X_1^i - m_i(\vartheta))(X_1^j - m_j(\vartheta)) \right)_{1\leq i\leq p, 1 \leq j \leq p}.\] Here, the notation \(M^T\) means the transpose matrix of \(M\). Several examples (Gamma, Beta, and Negative Binomial) are given later on in this section.
However, other \(\sqrt{n}\)-consistent sequences of initial guess estimators can be used. For instance, an initial sequence of quantile matching estimators is employed for the Cauchy distribution.
It has been shown in (Le Cam 1956) that, for a \(\sqrt{n}\)–consistent initial sequence of guess estimators and a uniformly continuous Fisher information matrix, the sequence of Le Cam’s one-step estimators defined in (1) or (2) is consistent, asymptotically normal, and efficient (in the Fisher sense) with \[\sqrt{n}\left( \overline{\vartheta}_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right).\] In other words, for an initial sequence which is asymptotically rate efficient but not variance efficient, the new sequence is asymptotically both rate and variance efficient.
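Schematically, the Newton-type update (2) can be written in a few lines; the sketch below assumes a user-supplied log-density logf(x, theta), vectorized in x, and uses numDeriv for the derivatives:

```r
# Sketch of the Newton-type one-step update (2) from an initial guess theta0.
library(numDeriv)
onestep_update <- function(x, logf, theta0) {
  n <- length(x)
  loglik <- function(theta) sum(logf(x, theta))
  score <- grad(loglik, theta0) / n      # (1/n) * sum of score contributions
  Ihat  <- -hessian(loglik, theta0) / n  # opposite of the normalized Hessian
  theta0 + solve(Ihat, score)            # single Newton step
}
```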
The first example is the joint estimation of the shape parameter \(\alpha\) and scale parameter \(\beta\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. Gamma random variables whose probability density function is given by \[f(x)= \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1} \exp( - \beta x ), \quad x >0.\]
Let us denote \(\vartheta=(\alpha,\beta)\). In this statistical experiment, the sequence of maximum likelihood estimators \((\widehat{\vartheta}_n, n \geq 1)\) of \(\vartheta\) is not in a closed form. The sequence of MLE satisfies \[\sqrt{n}\left( \widehat{\vartheta}_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right),\] where \[{\cal I}(\vartheta)=\begin{pmatrix} \psi^{(2)}(\alpha) & -\frac{1}{\beta} \\ - \frac{1}{\beta} & \frac{\alpha}{\beta^2}\end{pmatrix}.\] Here, \(\psi^{(n)}\) is the polygamma function (see (Abramowitz and Stegun 1972 6.4.1, page 260)) defined by \(\psi^{(n)}(\alpha)=\frac{\partial^n}{\partial \alpha^n} \log \Gamma(\alpha)\). Note that \[m_k(\vartheta) = \frac{\alpha(\alpha+1) \ldots (\alpha+k-1)}{\beta^k}.\] Consequently, the sequence of moment estimators \((\vartheta^*_n=(\alpha^*_n,\beta^*_n), n \geq1)\) given by \[\alpha^*_n=\frac{\tilde{m}_1^2}{\tilde{m}_2-\tilde{m}_1^2} \quad \text{and} \quad \beta^*_n=\frac{\tilde{m}_1}{\tilde{m}_2-\tilde{m}_1^2}\] is asymptotically normal (see (3)) with \[J=\begin{pmatrix} \frac{1}{\beta} & -\frac{\alpha}{\beta^2} \\ \frac{1+2\alpha}{\beta^2} & -\frac{2 \alpha(\alpha +1)}{\beta^3} \end{pmatrix} \quad \text{and} \quad A(\vartheta)=\begin{pmatrix} \frac{\alpha}{\beta^2} & \frac{2 \alpha(\alpha +1)}{\beta^3} \\ \frac{2 \alpha(\alpha +1)}{\beta^3} & \frac{\alpha (4\alpha+6) (\alpha+1)}{\beta^4}\end{pmatrix},\] and consequently does not reach asymptotic efficiency.
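In R, these moment estimators take one line each (a sketch with illustrative parameter values):

```r
# Sketch: closed-form moment estimators for the Gamma distribution.
set.seed(1234)
x <- rgamma(10000, shape = 2, rate = 3)
m1 <- mean(x); m2 <- mean(x^2)        # empirical moments
alpha_star <- m1^2 / (m2 - m1^2)      # shape
beta_star  <- m1 / (m2 - m1^2)        # rate
```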
We can see in Figure 1 (and in Table 1 with the Cramér-von Mises statistics) that the sequence of Le Cam’s one-step estimators (LCE) reaches efficiency and naturally outperforms the initial sequence of ME in terms of asymptotic variance. Moreover, this sequence is computed faster over the \(M=10000\) Monte Carlo simulations than the sequence of MLE, as shown in Table 1, which displays total computation times over the \(M\) simulations.
Table 1: Gamma example: total computation time over the \(M=10000\) simulations and CvM statistics.

|                      | MLE     | ME    | LCE   |
|----------------------|---------|-------|-------|
| Computation time (s) | 1559.74 | 29.72 | 37.46 |
| CvM statistic alpha  | 0.63    | 1.04  | 0.23  |
| CvM statistic beta   | 0.47    | 0.86  | 0.19  |
As mentioned in the introduction, the major advantage of the sequence of one-step estimators is that it is computed faster than the maximum likelihood estimator for large datasets. We illustrate this fact in Table 2, where the average computation times (over 10 Monte Carlo simulations) are measured for different sample sizes \(n=10^r\), \(r=3,\ldots,9\), for both MLE and LCE. LCE is between 20 and 55 times faster than MLE, especially for large sample sizes when memory overload occurs.
Table 2: Average computation time in seconds (over 10 Monte Carlo simulations) for different sample sizes \(n\), Gamma distribution.

|     | 10^3   | 10^4   | 10^5   | 10^6    | 10^7    | 10^8      | 10^9       |
|-----|--------|--------|--------|---------|---------|-----------|------------|
| MLE | 0.0121 | 0.1193 | 1.0056 | 10.4735 | 99.4203 | 2339.0437 | 31600.8313 |
| LCE | 0.0007 | 0.0053 | 0.0252 | 0.2361  | 2.3152  | 40.0440   | 540.7393   |
The second example is the joint estimation of the first shape parameter \(\alpha\) and the second shape parameter \(\beta\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. Beta random variables whose probability density function is given by \[f(x)= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1} (1-x)^{\beta-1}, \quad x \in [0,1].\]
Let us denote \(\vartheta=(\alpha,\beta)\). In this statistical experiment, the sequence of maximum likelihood estimators \((\widehat{\vartheta}_n, n \geq 1)\) of \(\vartheta\) is not in a closed-form. The sequence of MLE satisfies \[\sqrt{n}\left( \widehat{\vartheta}_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right),\] where \[{\cal I}(\vartheta)=\begin{pmatrix} \psi^{(2)}(\alpha) - \psi^{(2)}(\alpha+\beta)& -\psi^{(2)}(\alpha+\beta) \\ - \psi^{(2)}(\alpha+\beta) & \psi^{(2)}(\beta) - \psi^{(2)}(\alpha+\beta)\end{pmatrix}.\]
It is worth mentioning that, for the Beta distribution, \[m_k(\vartheta) = \frac{\Gamma(\alpha+k)\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\alpha+\beta+k)},\] which allows one to build closed-form moment estimators. Namely, denoting \(\tilde{w}_n= \frac{\tilde{m}_1(1-\tilde{m}_1)}{\tilde{m}_2-\tilde{m}_1^2} -1\), we obtain \[\alpha^*_n= \tilde{m}_1 \tilde{w}_n \quad \text{and} \quad \beta^*_n= (1-\tilde{m}_1 ) \tilde{w}_n.\] The asymptotic variance in (3) of the sequence of moment estimators \((\vartheta^*_n=(\alpha^*_n,\beta^*_n), n \geq1)\) can also be computed with \[J=\begin{pmatrix} \frac{\beta}{(\alpha+\beta)^2} & -\frac{\alpha}{(\alpha+\beta)^2} \\ \frac{2\alpha^2\beta+2\alpha \beta^2 + 2\alpha \beta +\beta^2 +\beta}{(\alpha+\beta)^2(\alpha+\beta+1)^2} & - \frac{ \alpha(\alpha +1)(2\alpha+2\beta+1)}{(\alpha+\beta)^2(\alpha+\beta+1)^2} \end{pmatrix}\] and \[A(\vartheta)=\begin{pmatrix} \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} & \frac{2 \alpha \beta(\alpha +1)}{(\alpha+\beta)^2(\alpha+\beta+1)(\alpha+\beta+2)} \\ \frac{2 \alpha \beta(\alpha +1)}{(\alpha+\beta)^2(\alpha+\beta+1)(\alpha+\beta+2)} & \frac{\alpha(\alpha+1)(2\alpha^3+6\alpha^2\beta+4\alpha\beta^2+14\alpha\beta+4 \alpha^2+4\alpha+6\beta^2+6\beta)}{(\alpha+\beta)^2(\alpha+\beta+1)(\alpha+\beta+2)(\alpha+\beta+3)}\end{pmatrix},\] and consequently the sequence does not reach asymptotic efficiency.
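The corresponding R computation is equally short (a sketch with illustrative parameter values):

```r
# Sketch: closed-form moment estimators for the Beta distribution.
set.seed(1234)
x <- rbeta(10000, shape1 = 0.5, shape2 = 1.5)
m1 <- mean(x); m2 <- mean(x^2)
w <- m1 * (1 - m1) / (m2 - m1^2) - 1   # tilde w_n
alpha_star <- m1 * w
beta_star  <- (1 - m1) * w
```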
Here again, the sequence of Le Cam’s one-step estimators naturally outperforms the initial sequence of ME in terms of asymptotic variance (see Figure 2 and Table 3 for the CvM statistics). It is also computed faster than the sequence of MLE, as shown in Table 3.
Table 3: Beta example: computation time and CvM statistics.

|                      | MLE     | ME    | LCE   |
|----------------------|---------|-------|-------|
| Computation time (s) | 2324.99 | 43.26 | 52.02 |
| CvM statistic alpha  | 0.22    | 0.06  | 0.05  |
| CvM statistic beta   | 0.34    | 0.11  | 0.20  |
The third example is the joint estimation of the location parameter \(m\in\mathbb{R}\) and the scale parameter \(d>0\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. Cauchy random variables whose probability density function is given by \[\label{cauchy} f(x) = \frac{d}{\pi \left(d^2+\left(x-m\right)^2\right)}, \quad x \in \mathbb{R}. \tag{4}\] Let us denote \(\vartheta=(m,d)\). In this statistical experiment, the sequence of maximum likelihood estimators \((\widehat{\vartheta}_n, n \geq 1)\) of \(\vartheta\) is not in a closed-form. The sequence of MLE satisfies \[\sqrt{n}\left( \widehat{\vartheta}_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right),\] where \[{\cal I}(\vartheta)=\begin{pmatrix}\frac{1}{2d^2} & 0 \\ 0& \frac{1}{2d^2}\end{pmatrix}.\]
It is worth mentioning that the Cauchy distribution has no first moment, and consequently its unknown parameter \(\vartheta\) cannot be estimated via the classical method of moments. However, a quantile matching method allows one to define \[m^*_{n} = Q_n \left(\frac{1}{2}\right) \quad \text{and} \quad d^*_n= \frac12 \left( Q_n \left(\frac{3}{4}\right) -Q_n \left(\frac{1}{4}\right) \right),\] where the sample quantile is usually computed for \(p\in(0,1)\) as \[Q_n(p) = X_{(\lfloor np\rfloor)},\] with the order statistics denoted \(X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}\). This sequence of estimators \((\vartheta^*_n=(m^*_n,d^*_n), n \geq1)\) can be shown to be consistent and asymptotically normal, namely \[\sqrt{n}\left(\vartheta^*_n-\vartheta\right) {\longrightarrow} {\cal N} \left( 0, \Gamma_1\right),\] where \[\label{cauchy1} \Gamma_1= \begin{pmatrix} \frac{1}{4 f(m)^2} & 0 \\ 0 & \frac{1}{16 f(q_1)^2} \end{pmatrix}. \tag{5}\] Here \(q_1\) is the theoretical first quartile. Consequently, this estimator does not reach the efficient asymptotic variance.
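These quantile matching estimators are immediate to compute (a sketch; note that quantile uses an interpolated definition that differs slightly from the order-statistic convention above):

```r
# Sketch: quantile matching estimators for the Cauchy distribution.
set.seed(1234)
x <- rcauchy(10000, location = 2, scale = 3)
m_star <- median(x)                                   # location
d_star <- (quantile(x, 3/4) - quantile(x, 1/4)) / 2   # scale (half-IQR)
```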
As mentioned previously, the sequence of Le Cam’s one-step estimators outperforms the initial sequence of quantile matching estimators (QME) in terms of asymptotic variance (see Figure 3 and Table 4 for the CvM statistics). It is also computed faster than the sequence of MLE, as shown in Table 4.
Table 4: Cauchy example: computation time and CvM statistics.

|                      | MLE    | QME  | LCE   |
|----------------------|--------|------|-------|
| Computation time (s) | 225.32 | 7.40 | 27.59 |
| CvM statistic m      | 0.08   | 0.42 | 0.08  |
| CvM statistic d      | 0.04   | 0.10 | 0.04  |
The fourth example is the joint estimation of the size parameter \(r\) and the mean parameter \(\mu\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. negative binomial random variables whose (discrete) probability density function is given by \[f(x)= \frac{\Gamma(r+x)}{\Gamma(r)x!} \left( \frac{r}{\mu+r} \right)^{r} \left(\frac{\mu}{\mu+r} \right)^{x}, \quad x \in \mathbb{N}.\] Let us denote \(\vartheta=(r,\mu)\). In this statistical experiment, the sequence of maximum likelihood estimators \((\widehat{\vartheta}_n, n \geq 1)\) of \(\vartheta\) is not in a closed form, and neither is the Fisher information matrix.
For this distribution, \[m_1(\vartheta)= \mu \quad \text{and} \quad m_2(\vartheta) = \mu^2 +\mu + \frac{\mu^2}{r},\] which gives the closed-form sequence of moment estimators \((\vartheta^*_n=(r^*_n,\mu^*_n), n \geq1)\) defined by \[r_n^*= \frac{\tilde{m}_1^2}{(\tilde{m}_2-\tilde{m}_1^2)-\tilde{m}_1} \quad \text{and} \quad \mu_n^*=\tilde{m}_1.\]
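Again, the computation is direct (a sketch with illustrative parameter values):

```r
# Sketch: closed-form moment estimators for the negative binomial distribution.
set.seed(1234)
x <- rnbinom(10000, size = 1, mu = 2)
m1 <- mean(x); m2 <- mean(x^2)
mu_star <- m1                           # mean
r_star  <- m1^2 / ((m2 - m1^2) - m1)    # size
```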
In this discrete case, the sequence of Le Cam’s one-step estimators (2) defined with the Hessian still outperforms the initial sequence of ME in terms of asymptotic variance (see Figure 4 and Table 5 for the CvM statistics). Let us mention that the CvM statistics are computed with the estimated variances for MLE and LCE. The one-step procedure also outperforms the sequence of MLE in terms of computation time, as shown in Table 5.
Table 5: Negative binomial example: computation time and CvM statistics.

|                      | MLE     | ME    | LCE    |
|----------------------|---------|-------|--------|
| Computation time (s) | 2038.33 | 39.59 | 104.75 |
| CvM statistic r      | 0.18    | 0.71  | 0.19   |
| CvM statistic mu     | 0.05    | 0.05  | 0.05   |
The fifth example is the joint estimation of the shape parameter \(\tau>0\) and the rate parameter \(\beta>0\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. Weibull random variables whose probability density function is given by \[f(x)= \beta \tau (\beta x)^{\tau-1} \exp \left( - (\beta x)^\tau \right), \quad x \in \mathbb{R}^+_*.\] Let us denote \(\vartheta=(\tau,\beta)\). In this example, neither the ME nor the MLE is in closed-form. However, the score and the Fisher information matrix can be explicitly computed. For instance, \[{\cal I}(\vartheta) = \begin{pmatrix} \frac{1}{\tau^2} \left( \psi^{(2)}(1)+ [\psi^{(1)}(2)]^2 \right) & \frac{1}{\beta} \psi^{(1)}(2) \\ \frac{1}{\beta} \psi^{(1)}(2) & \frac{\tau^2}{\beta^2}\end{pmatrix}.\]
Moreover, it is possible to define a closed-form initial sequence of graphical plot estimators (based on the explicit form of the cumulative distribution function). Let us denote \(x_i=\log(-\log(1-i/(n+1)))\) and \(Z_i=\log\left(X_{(i)}\right)\) for \(i=1,\ldots,n\). The sequence of ordinary least squares (OLS) based estimators \((\vartheta^*_n=(\tau^*_n, \beta^*_n), n \geq1)\) is defined by \[\tau^*_n = \left(\frac{\sum\limits_{i=1}^n (x_i -\overline{x}_n)Z_i}{\sum\limits_{i=1}^n (x_i -\overline{x}_n)^2} \right)^{-1} \quad \text{and} \quad \beta^*_n= \exp \left( (\tau^*_n)^{-1} \overline{x}_n - \overline{Z}_n \right),\] where \(\overline{x}_n\) and \(\overline{Z}_n\) stand for the averages of \(x_1,\dots,x_n\) and \(Z_1,\dots,Z_n\), respectively.
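In R, this graphical estimator amounts to an ordinary least squares fit on the Weibull plot (a sketch; recall that the rate \(\beta\) is the reciprocal of the scale used by rweibull):

```r
# Sketch: OLS-based initial estimators for the Weibull distribution.
set.seed(1234)
xs <- sort(rweibull(10000, shape = 2, scale = 1/3))  # rate beta = 3
n  <- length(xs)
xi <- log(-log(1 - seq_len(n) / (n + 1)))            # plotting positions
Zi <- log(xs)                                        # log order statistics
slope     <- sum((xi - mean(xi)) * Zi) / sum((xi - mean(xi))^2)
tau_star  <- 1 / slope                               # shape
beta_star <- exp(mean(xi) / tau_star - mean(Zi))     # rate
```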
The sequence of Le Cam’s one-step estimators outperforms the initial sequence of OLS-based estimators in terms of asymptotic variance (see Figure 5 and Table 6). Since the OLS estimator presents a small bias for samples of finite size, its CvM statistics are larger in Table 6. The one-step procedure also outperforms the sequence of MLE in terms of computation time for large datasets, as shown in Table 6.
Table 6: Weibull example: computation time and CvM statistics.

|                      | MLE    | OLS   | LCE   |
|----------------------|--------|-------|-------|
| Computation time (s) | 568.83 | 32.64 | 82.64 |
| CvM statistic tau    | 0.13   | 17.26 | 0.06  |
| CvM statistic beta   | 0.59   | 2.33  | 0.59  |
As soon as the ME is not in closed form, or no other closed-form estimators can be elicited, the use of a numerically computed ME as the initial sequence of guess estimators in Le Cam’s one-step procedure is difficult to justify. Indeed, no gain is obtained in terms of computation time, and using the MLE directly is ultimately equivalent. We therefore propose to use, in the onestep function, an improvement of the classical Le Cam procedure.
Indeed, it can be shown (see (Kamatani and Uchida 2015) or (Kutoyants and Motrunich 2016)) that, for a \(n^{\delta/2}\)–consistent initial sequence of guess estimators (with \(\frac12 < \delta \leq 1\)) and a Lipschitz Fisher information matrix, the sequence of Le Cam’s one-step estimators is also consistent, asymptotically normal, and efficient (in the Fisher sense). In this setting, for an initial sequence which is neither asymptotically rate efficient nor variance efficient, the new sequence is asymptotically both rate and variance efficient.
This result allows one to use the numerical computation of the MLE on a subsample (of size \(n^\delta\), for \(\frac12 < \delta \leq 1\)) as an initial sequence of guess estimators. Namely, for this initial sequence of guess estimators \((\vartheta^*_n, n\geq 1)\), \[n^{\frac{\delta}{2}} \left( \vartheta^*_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right).\]
Then the sequence of Le Cam’s one-step estimators given by (2) is also consistent, asymptotically normal and efficient (in the Fisher sense) with \[\sqrt{n}\left( \widehat{\vartheta}_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right).\]
The exponent \(\delta\), which measures the size of the subsample (\(\frac12 <\delta \leq 1\)), is set to 0.9 by default and can be chosen by the user with the parameter control, for instance in the call

onestep(data, distr, control = list(delta = 0.7))
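For instance, with \(n=10^6\) observations, delta = 0.7 leads to an initial MLE computed on a subsample of size \(n^{0.7}=10^{4.2}\approx 15849\) only.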
The example of the Pareto II distribution is interesting and shows the gain in terms of variance (with respect to the initial sequence of guess estimators) and in terms of computation time (in comparison with the MLE).
This improved method is also used when the distribution does not belong to the closed-formula family. The initial sequence of maximum likelihood estimators is computed on a subsample with the mledist function of the fitdistrplus package. The score and the Hessian in the Newton step of the Le Cam procedure (2) are numerically computed with the functions grad and hessian of the numDeriv package (see (Gilbert and Varadhan 2019)).
The sixth example is the joint estimation of the shape parameter \(\alpha>0\) and the scale parameter \(\sigma>0\) in the statistical experiment generated by a sample \((X_1,X_2,\ldots,X_n)\) of i.i.d. Pareto II (Lomax) random variables whose probability density function is given by \[f(x)= \frac{\alpha \sigma^\alpha}{(\sigma + x)^{\alpha+1}}, \quad x \in \mathbb{R}^+.\]
In this example, neither the ME (which exists only for \(\alpha>2\)) nor the MLE is in closed form. However, the score and the Fisher information matrix can be computed explicitly.
Let \(\vartheta=(\alpha,\sigma)\). Then, considering the MLE computed on a subsample of size \(n^\delta\), \(\frac12 < \delta \leq 1\), as an initial sequence of guess estimators \((\vartheta_n^*, n \geq 1)\), we get \[n^{\frac{\delta}{2}} \left( \vartheta^*_n-\vartheta \right) \rightarrow {\cal N}\left(0,{\cal I}(\vartheta)^{-1}\right)\] with \[{\cal I}(\vartheta) = \begin{pmatrix} \frac{1}{\alpha^2} & -\frac{1}{\sigma(\alpha+1)} \\ -\frac{1}{\sigma(\alpha+1)} & \frac{\alpha}{\sigma^2(\alpha+2)}\end{pmatrix}.\] The computation of the Fisher information matrix can be found in (Brazauskas 2003).
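A hedged sketch of the corresponding call (using actuar for simulation; parameter values are illustrative):

```r
# Sketch: one-step estimation for the Pareto II (Lomax) distribution.
# The initial MLE is computed internally on a subsample of size n^delta.
library(OneStep)
library(actuar)   # provides rpareto for the Lomax distribution
set.seed(1234)
x <- rpareto(10000, shape = 1.2, scale = 0.5)
onestep(x, "pareto")
```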
Here again, the sequence of Le Cam’s one-step estimators naturally outperforms the initial sequence of MLE computed on a subsample (MLEsub) in terms of asymptotic variance (see Figure 6 and Table 7 for the CvM statistics). It is also computed faster than the sequence of MLE on the full sample, as shown in Table 7.
Table 7: Pareto example: computation time and CvM statistics (MLEsub denotes the MLE computed on a subsample).

|                      | MLE     | MLEsub | LCE    |
|----------------------|---------|--------|--------|
| Computation time (s) | 1013.00 | 393.42 | 414.08 |
| CvM statistic alpha  | 0.25    | 4.72   | 1.44   |
| CvM statistic sigma  | 0.43    | 5.54   | 1.42   |
For the generic example, we study the Weibull distribution again and force it to be handled numerically with the function parameter method="numeric", so that the closed-form processing is not used.
We recall that, in the generic procedure, the score and the Hessian in the Newton step of the Le Cam procedure (2) are numerically computed with the functions grad and hessian of the numDeriv package. By default, the initial sequence of maximum likelihood estimators is computed on a subsample of size \(n^\delta\) with \(\delta=0.9\).
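A sketch of the generic call:

```r
# Sketch: force the generic numerical procedure for the Weibull distribution.
library(OneStep)
set.seed(1234)
x <- rweibull(10000, shape = 0.8, scale = 3)
onestep(x, "weibull", method = "numeric",
        control = list(delta = 0.9))   # subsample exponent (default value)
```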
The numerically computed sequence of Le Cam’s one-step estimators (GLCE) reaches asymptotic efficiency for a simulation of \(M=10000\) Monte Carlo replications of samples of size \(n=10000\), as shown by the CvM statistics summarized in Table 8. It is still computed faster than the sequence of MLE.
Table 8: Weibull example with the generic procedure (GLCE): computation time and CvM statistics.

|                      | MLE    | LCE   | GLCE   |
|----------------------|--------|-------|--------|
| Computation time (s) | 568.83 | 82.64 | 405.10 |
| CvM statistic tau    | 0.13   | 0.06  | 0.25   |
| CvM statistic beta   | 0.59   | 0.59  | 1.08   |
We would like to thank Yury Kutoyants and Marius Soltane for fruitful discussions on the topic. We also thank the referees for their valuable comments, which improved the paper.