LeArEst: Length and Area Estimation from Data Measured with Additive Error

This paper describes an R package LeArEst that can be used for estimating object dimensions from a noisy image. The package is based on a simple parametric model for data that are drawn from uniform distribution contaminated by an additive error. Our package is able to estimate the length of the object of interest on a given straight line that intersects it, as well as to estimate the object area when it is elliptically shaped. The input data may be a numerical vector or an image in JPEG format. In this paper, background statistical models and methods for the package are summarized, and the algorithms and key functions implemented are described. Also, examples that demonstrate its usage are provided.
Availability: LeArEst is available on CRAN.

Mirta Benšić (Department of Mathematics, University of Osijek) , Petar Taler (Department of Mathematics, University of Osijek) , Safet Hamedović (Faculty of Metallurgy and Materials Science) , Emmanuel Karlo Nyarko (Faculty of Electrical Engineering, Computer Science and Information Technology, University of Osijek) , Kristian Sabo (Department of Mathematics, University of Osijek)
2017-10-07

1 Introduction

Image noise may arise by the physical processes of imaging, or it can be caused by the presence of some unwanted structures (e.g., soft tissue captured in X-ray images of bones). Such problems can occur, for example, when the object is observed with a fluorescent microscope (Ruzin et al. 1999), ground penetrating radar, medical equipment (X-ray, ultrasound), etc. With the presence of additive noise, the detection of the object edge as well as determining length or area of the object becomes a non-trivial problem. The well known edge detection methods (Canny 1986; Qiu 2005) generally do not perform well.

Our approach does not use the mentioned edge detection methods, but looks at the problem in a different way. We start with a simple univariate model where the data represent independent realizations of a random variable \(X\), \(X = U + \varepsilon\). In the aforementioned equation \(U\) is supposed to be uniformly distributed over the object image and describes the object image without an error, while \(\varepsilon\) represents measurement error. It is shown that such a simple model can also be very useful in applications itself, not only related to the image analysis. For instance, in Tolić et al. (2017) the sum of uniform and normal distributions is confirmed to be the most representative distribution for modelling transmission loss data.

Different aspects of this model are developed in Benšić and Sabo (2007b), Benšić and Sabo (2007a), Benšić and Sabo (2010), Benšić and Sabo (2016), Sabo and Benšić (2009), and Schneeweiss (2004). The basic one-dimensional model is described in Section 2 together with the results that are used for statistical inference incorporated in the package. Although this model is not universal in all applications, we find it useful in some cases.

With the assumption that the observed object has a circular or elliptical shape, a two-dimensional approach has been developed, dealing with an object area estimation problem (Benšić and Sabo 2007a; Sabo and Benšić 2009). This approach utilizes many border estimations and performs parametric curve fitting on its results.

The package LeArEst (Bensic et al. 2017) uses these methods for length and area estimation of an object captured with noise. It supports numerical inputs, which is useful if a machine that records an object stores numerical data (coordinates of recorded points). However, if an object is captured in a picture file, the package includes a web interface with which one can load a picture, specify a line that intersects the object, adjust the parameters, and perform an edge detection on the drawn line. Another web interface allows the user to draw a rectangle around the object and perform area estimation of the marked object. Description of functions dealing with numerical and graphical estimations and examples of their use are given in Section 3.

2 Basic statistical model

The basic model we deal with in this package is an additive error model \[X=U+\varepsilon.\] Here we suppose that the random variable \(U\) is uniformly distributed on the interval \(\left[-a,a \right]\)\(a>0\), i.e., has a density \[\label{duniform} f_{U}(x;a)=\left\{\begin{array}{ll}\frac{1}{2a},& x\in [-a,a].\\ 0,&\mbox{else}\end{array}\right. \tag{1}\] and \(\varepsilon\) is an absolutely continuous random variable describing measurement error. Further, we assume that \(\varepsilon\) is independent of \(U\) with zero mean. Instead of the sample \(U_{1},U_{2},\ldots,U_{n}\) from \(U\), one can only observe the contaminated i.i.d. sample \(X_{1},X_{2},\ldots,X_{n}\) from \(X\). We are to estimate the unknown parameter \(a\).

Our model is the special case of a general additive error model \(X=Y+\varepsilon\), where \(Y\) and \(\varepsilon\) are assumed to be independent continuous random variables, but only \(X\) is observable. Estimating the unknown density \(f_{Y}\) from an i.i.d. sample \(X_{1},X_{2},\ldots,X_{n},X_{1}\sim X\), is known as the deconvolution problem. Usually, the error part \(\varepsilon\) is assumed to have a known density \(f_{\varepsilon}\). Several nonparametric methods have been developed to estimate \(f_{Y}\) (Meister 2009); the most popular and studied is the deconvolution kernel density estimator (Carroll and Hall 1988; Stefanski and Carroll 1990). The packages decon (Wang and Wang 2013) and deamer (Stirnemann et al. 2012) provide functions for estimating density \(f_{Y}\) in a nonparametric way. Two different approaches in estimating the support of a density from a contaminated sample can be seen in (Meister 2006) and (Delaigle and Gijbels 2006). One is based on the deconvolving kernel density estimator (Delaigle and Gijbels 2006) and the other on the moment estimation (Meister 2006). To our knowledge, none of them is implemented in some R package submitted to the CRAN repository.

For our purpose (e.g., estimating borders of some object from a noisy image), we find that the model with \(Y\sim\mathcal{U}[-a,a]\) is useful in some instances. Namely, in many cases we have a relatively high contrast between an object and its background as it is the case in Figure 1. It seems reasonable to assume that the data extracted from the green line in Figure 1 come from a uniform distribution, but contaminated by an additive error.

graphic without alt text
Figure 1: Line intersecting the object and histogram of recorded points for statistical inferences

Let \(f_{\varepsilon}\) and \(F_{\varepsilon}\) be the density and distribution function of the error part \(\varepsilon\). Then the density of \(X=U+\varepsilon\) is \[\label{fX1} f_{X}(x;a)=\int_{-\infty}^{\infty}f_{U}(t)f_{\varepsilon}(x-t)dt=\frac{1}{2a}\left(F_\varepsilon(x+a)-F_\varepsilon(x-a)\right). \tag{2}\] If we suppose that the distribution of \(\varepsilon\) belongs to a scale family, with scale parameter \(\sigma\), then ((2)) may be rewritten as

\[\label{fX2} f_{X}(x;a,\sigma)=\frac{1}{2a}\left(F\left( \frac{x+a}{\sigma}\right) -F\left( \frac{x-a}{\sigma}\right) \right), \tag{3}\] with \(F(x)\) being the standard (\(\sigma=1\) and zero mean) distribution function.

Let \(\mathbf{x}=\left(x_{1},\ldots,x_{n}\right)\) denote the realization of the i.i.d. sample \(\mathbf{X}=\left(X_{1},\ldots,X_{n}\right)\). The likelihood function has the form \[\label{lh} L\left(a,\sigma;\mathbf{x} \right)=\prod_{i=1}^{n}f_{X}(x_{i};a,\sigma)=\frac{1}{(2a)^{n}}\prod_{i=1}^{n}\left(F\left(\frac{x_{i}+a}{\sigma}\right)-F\left(\frac{x_{i}-a}{\sigma}\right)\right), \tag{4}\] and the log-likelihood function is given by \[\label{llh} l(a,\sigma)=-n\log(2a)+\sum_{i=1}^{n}\log \left(F\left(\frac{x_{i}+a}{\sigma}\right)-F\left(\frac{x_{i}-a}{\sigma}\right)\right). \tag{5}\] If the distribution of \(\varepsilon\) is symmetric around zero, then the Fisher information is \[I(a)=\frac{-1}{a^{2}}+\frac{1}{a\sigma^{2}}\int_{0}^{\infty}\frac{\left( f\left(\frac{x+a}{\sigma}\right)+f\left(\frac{x-a}{\sigma}\right)\right)^{2} }{F\left(\frac{x+a}{\sigma}\right)-F\left(\frac{x-a}{\sigma}\right)}\,dx.\] Supposing that parameter \(\sigma\) is known or consistently estimated then, under regularity, we have \[\label{ad} \sqrt{n}\left(\hat{a}_{ML}-a_{0} \right)\stackrel{D}\longrightarrow\mathcal{N}\left(0,\frac{1}{I(a_{0})}\right), \tag{6}\] where \(a_{0}\) is the true value of \(a\).

Some flexibility of our model is achieved by changing the error distribution. For now, three types of error distributions are available in the package. The normal distribution \(\varepsilon\sim\mathcal{N}\left(0,\sigma^{2}\right)\) is sometimes a natural choice. Properties of maximum likelihood (ML) and method of moments (MM) estimators with known \(\sigma^{2}\) are given in Benšić and Sabo (2007b). The model with \(Y\sim\mathcal{U}[0,a]\) is treated in Schneeweiss (2004). Benšić and Sabo (2010) considered the unknown \(\sigma^{2}\) case. The possibility of this (one-dimensional) model in two-dimensional problems is given in Benšić and Sabo (2007a) and Sabo and Benšić (2009). For the sake of robustness Laplace and scaled Student (with \(5\) degrees of freedom) distributions are also incorporated in the package as a choice of the error distribution. Estimating issues with Laplacian error can be seen in Benšić and Sabo (2016), as well as a discussion connected to robustness.

Two procedures for deriving confidence intervals for \(a\) are described in (Hamedović et al. 2017). The first one is based on the asymptotic distribution of ML estimator in ((6)). For a specified \(0<\alpha<1\), an asymptotic \((1-\alpha)100\%\) confidence interval for \(a\) is1

\[\left(\hat{a}_{ML}-\frac{z_{\alpha /2}}{\sqrt{nI(\hat{a}_{ML}) } },\hat{a}_{ML}+\frac{z_{\alpha /2}}{\sqrt{nI(\hat{a}_{ML}) }} \right).\] The second method is based on the likelihood ratio statistic \[\lambda(\mathbf{X})=\frac{\sup\limits_{H_{0}}L\left(a;\mathbf{X} \right)}{\displaystyle\sup_{(0,\infty)}L\left(a;\mathbf{X} \right)}=\frac{L\left(a_{0};\mathbf{X} \right)}{L\left(\hat{a}_{ML};\mathbf{X} \right)}.\] From the asymptotic distribution of the log-likelihood ratio statistic \[-2\log \lambda(\mathbf{X}) \stackrel{D}\longrightarrow \chi_{1}^{2}\] an approximate \((1-\alpha)100\%\) confidence interval for \(a\) is \[\left\lbrace a\vert\, l(\hat{a}_{ML})-l(a)\leq 0.5\chi^{2}_{1}(1-\alpha)\right\rbrace,\] where \(\chi^{2}_{1}(1-\alpha)\) is the \(1-\alpha\) quantile of \(\chi_{1}^{2}\) distribution.

These two approaches can be used to test hypotheses regarding the parameter \(a\). For example, in the case of a two-sided hypotheses \(H_{0}:a=a_{0}\) versus \(H_{1}:a\neq a_{0}\), the critical regions of asymptotic size \(\alpha\) are \[\left\lbrace \mathbf{x}\vert\, \sqrt{nI(a_{0})}|\hat{a}_{ML}-a_{0}|\geq z_{\alpha /2} \right\rbrace, \text{ and}\]

\[\left\lbrace \mathbf{x}\vert\, -2\log \lambda(\mathbf{x})\geq \chi_{1}^{2}(1-\alpha) \right\rbrace,\] respectively. Note that both methods are asymptotically equivalent.

3 Overview of the package

The package LeArEst depends on the following packages that should installed in addition to the LeArEst package: conicfit (Gama and Chernov 2015), jpeg (Urbanek 2014), and opencpu (Ooms 2014). The stable version of the package is available on the Comprehensive R Archive Network repository (CRAN; https://CRAN.R-project.org/) and can be downloaded and installed by issuing the following command at the R console:

> install.packages("LeArEst")

The package is loaded using the following command:

> library(LeArEst)

An overview of the package’s functions is given in Table 1.

Table 1: Overview of LeArEst functions
Function Description
lengthest() Performs length estimation from a numerical data set.
lengthtest() Performs one-sided and two-sided tests for uniform
distribution half-length.
areaest() Performs area estimation of a numerically described object
in plane.
startweb.esttest() Opens default web browser and loads a web page for length
estimation and testing (the object of interest is shown in
an image).
startweb.area() Opens default web browser and loads a web page for area
estimation of the object shown in an image.

Length estimation — a numerical data set

The function lengthest computes the length of an interval which is the domain of a uniform distribution from data contaminated by an additive error according to the model described in the previous section. The function’s arguments and results are given in Table 2.

Table 2: Summary of arguments and results of lengthest
Arguments Description
x Vector of input data.
error Error distribution.
var Error variance.
var.est Method of error variance estimation.
conf.level Confidence level of the confidence interval. Defaults to \(0.95\).
Results Description
radius Estimated half-length of the uniform support.
var.error Error variance, estimated or explicitly given by argument var.
conf.int Confidence interval for half-length of the uniform support.
method Method used for computing a confidence interval
(asymptotic distribution of ML or likelihood ratio statistic).

In order to perform length estimation, a type of the error distribution must be chosen through the argument error with three possibilities: laplace (Benšić and Sabo 2016), gauss (Benšić and Sabo 2007b, 2010), or student (scaled Student distribution with 5 degrees of freedom).

The variance of the additive error may or may not be known. If the variance is known, argument var should be used and the variance should be assigned to it. In the case of unknown variance, function lengthest implements two methods for its estimation: Method of Moments and Maximum Likelihood. Value MM of the argument var.est instructs the functions to use Method of Moments, while the corresponding value ML triggers Maximum Likelihood Method. There is the possibility, depending on the input data, that the Method of Moments estimate of error variance does not exist. When that is the case, the function stops and outputs the message instructing the user to use Maximum Likelihood estimator or to give an explicit variance. It is important to mention that arguments var and var.est may not be used simultaneously.

The last argument, conf.level, specifies the confidence level of the confidence interval calculated by the function.

The results of this function are the estimated half-length of uniform distribution (i.e., of an object), estimated or explicitly given error variance, confidence interval for half-length (with regard to the given confidence level) and the statistical method for computing a confidence interval.

Usage example. Let us generate a sample of size \(1000\) from \(X=U+\varepsilon\), where \(U\sim\mathcal{U}\left[-1, 1\right]\) and \(\varepsilon\sim\mathcal{N}\left(0,(0.1)^{2}\right)\):

set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2
graphic without alt text
Figure 2: Estimation of the density function

Figure 2 shows density estimation from the generated data obtained with the R function density. A half-length estimation of the uniform support for these data can be done with the following command:

lengthest(x = sample, error = "gauss", var.est = "MM", conf.level = 0.90)

The most important part of its output is:

$radius
MLE for radius (a) of uniform distr.: 0.9916513
$var.error
MM estimate for error variance: 0.01279636
$method
[1] "Asymptotic distribution of LR statistic"
$conf.int
[1] 0.9724316 1.0116479

Testing hypothesis — a numerical data set

Function lengthtest performs one-sided and two-sided tests against hypothesized half-length of the uniform support as it is described in Section 2. Since the actual calculations inside this function are based on the ML approach most input arguments are similar to those in the function lengthest (see Table 3). Argument null.a is a positive number representing hypothesized half-length of the uniform support, while argument alternative defines the usual forms of alternatives (two.sided, greater, or less).

Table 3: Summary of arguments and results of lengthtest
Arguments Description
x Vector of input data.
error Error distribution.
null.a Specified null value being tested.
alternative The form of the alternative hypothesis.
var Error variance.
var.est Method of error variance estimation.
conf.level Confidence level of the confidence interval. Defaults to \(0.95\).
Results Description
p.value p-value of the test.
tstat The value of the test statistic.
radius Estimated half-length of the uniform support.
var.error Error variance, estimated or explicitly given by argument var.
conf.int Confidence interval for half-length.
method Method used for computing a confidence interval
(asymptotic distribution of ML or likelihood ratio statistic).

Function lengthtest also performs length estimation, so all values from its output, except p.value and the calculated value of the test statistic (tstat), are the same as that of the function lengthest.

Usage example. Generate the data in a similar manner as in the lengthest example:

set.seed(12)
sample_1 <- runif(1000, -1, 1)
sample_2 <- rnorm(1000, 0, 0.1)
sample <- sample_1 + sample_2

To test that the uniform support half-length equals \(1\) against that it is less than \(1\) the function lengthest can be used in the following way:

lengthtest(x = sample, error = "gauss", alternative = "less", var.est = "MM",
           null.a = 1, conf.level = 0.95)

Part of the output dealing with a testing procedure is:

$p.value
[1] 0.2418929
$tstat
[1] -0.7002265

Area estimation — a numerical data set

The input for the function areaest is supposed to be a data set of points in the plane representing independent realizations of a two-dimensional random vector \[X=U+\varepsilon.\] It is assumed that \(U\) has a uniform distribution on an ellipsoid and \(\varepsilon\) is a two-dimensional error term independent of \(U\). Arguments and results of this function are listed in Table 4.

Table 4: Summary of arguments and results of areaest
Arguments Description
data Two-column data matrix containing the points that describe the
observed object. The first column represents the \(x\) coordinate of the point,
while the second column represents its \(y\) coordinate.
nrSlices Number of slices applied for plain data cutting. Defaults to 10.
error Error distribution.
var Error variance.
var.est Method of error variance estimation.
plot Logical parameter that determines whether to plot data set, calculated
edge points, and the resulting ellipse. Defaults to FALSE.
Results Description
area Estimated area of the object.
points Set of estimated object’s edge points.
semiaxes Resulting ellipse’s semi-axes.

The algorithm implemented in the function areaest is explained in detail in (Benšić and Sabo 2007a). The main task in area estimation is to estimate edge points of the uniform support. In order to achieve this, the original problem is reduced to several corresponding one-dimensional problems, which can in turn be solved by function lengthest.

Let us denote the data set with \(D = \left\{\left(x_i, y_i \right), i = 1, \dots, n \right\}\). The function areast transforms this data set in two different ways: through the \(y\)-axis and through the \(x\)-axis. The algorithm for transformation through the \(y\)-axis is presented below, while the transformation through the \(x\)-axis is done analogously.

Algorithm 1

(Transformation through the \(y\)-axis (Benšić and Sabo 2007a))

Separating through the \(y\)-axis.

Choose an integer \(m<n\) and real numbers \(\eta_1 < \eta_2 < \dots < \eta_m\) such that

Centering through the \(y\)-axis.

Let us denote \[c_k := \frac{1}{\left|C_k\right| } \sum_{\left(x_i,y_i\right) \in C_k } x_i, \ d_k := \frac{1}{\left|C_k\right|} \sum_{\left(x_i,y_i\right) \in C_k } y_i,\]

\[k=1,\dots,m-1,\] for \(k=1,\dots,m-1\) define \(\overline C_k := \left\{ x_i-c_k : \left( x_i, y_i \right) \in C_k \right\}\).

Using this algorithm the data are transformed in the way that we have sets \(\overline C_k,\ k=1,\dots,m-1\) that represent centered tiny strips. Argument nrSlices corresponds to \(m-1\) and specifies the number of strips. The lengths of these strips (in \(x\)-direction) can be estimated using the function lengthest (the parameters error, var, and var.est are used in a lengthest call in the way described earlier). After doing so, the algorithm needs to be repeated through the \(x\)-axis. Finally, at this point of the procedure, the data that is a noisy version of points from the curve is created – it represents estimated points from the border of the object.

The next task is to choose one of the well-known curve fitting procedures for parameter estimation. Here we are dealing with a nonlinear parameter estimation problem.

Let us suppose that we have an elliptical domain, i.e.,

\[\mathcal{D}(\mathbf{p}) = \left\{ (x,y) \in \mathbb{R}^2 : \frac{(x-p)^2}{\alpha^2} + \frac{(y-q)^2}{\beta^2} \le 1 \right\},\]

\[\mathbf{p} = \left(p, q, \alpha, \beta\right)^T.\]

On the basis of data obtained so far, the vector of unknown parameters \(p\) needs to be estimated, and by doing so, the optimal ellipse that fits into our points is to be defined. For this purpose EllipseDirectFit function from the conicfit package is used. This function implemets the algebraic ellipse fit method by Fitzgibbon-Pilu-Fisher (Fitzgibbon et al. 1999). Having parameters \(p\), it is a trivial task to calculate the area of the ellipse that approximates the observed object.

Usage example. Two internal files are provided with the package: ellipse_3_4_0.1_gauss.txt and ellipse_3_4_0.1_laplace.txt. Both of them represent an ellipsoidal object with center in point \((1,1)\), half-axes \(1.5\) and \(2\), with added measurement error. In the first file, the error distribution is a two-dimensional normal with independent margins and variance 0.01, while in the other it is Laplacian (\(\lambda=0.1\)) in both directions, again with independent margins.

In order to use one of these files, the data needs to be read into a data frame:

inputfile <- system.file("extdata", "ellipse_3_4_0.1_laplace.txt", package = "LeArEst")
inputdata <- read.table(inputfile)

Area estimation of the uniform support can be done with the command:

areaest(inputdata, error = "laplace", var.est = "ML", nrSlices = 5, plot = TRUE)

In the previous example, the parameter plot is set to TRUE, so the function plots the given input data (black dots), estimated border points (red dots), and the resulting ellipse (cyan ellipse); see Figure 3.

graphic without alt text
Figure 3: Data points, estimated border points, and the resulting ellipse obtained by the function areaest

The most important parts of the numerical output are:

$area
[1] 9.938305
$semiaxes
[1] 2.048028 1.544638

Length estimation and testing for an object shown in a picture

In order to apply the described methods to a picture of an object, two web interfaces have been built and embedded into the package.

As far as we know, shiny (Chang et al. 2017) provides the simplest way of building web applications using R. However, limitations of its free version discouraged us from using it, so we decided to use the opencpu package. This package provides a reliable and interoperable HTTP API for data analysis based on R. Basically, it provides an interface between functions in R package and a custom-made web page bundled with the package, using JavaScript and AJAX. Building web interfaces using opencpu is more complex than using shiny, but at the same time, it provides more flexibility in application design. It is assumed that developers are familiar with HTTP protocol, HTML, and the JavaScript language, in order to develop such web applications.

Function startweb.esttest will be described in this section. This function takes no arguments and returns no results, its task is to start a web interface for length estimation and hypothesis testing (Figure 4).

graphic without alt text
Figure 4: Web interface for length estimation and hypothesis testing. Loaded image shows arterial wall of the carotid artery; we are trying to estimate its intima media thickness (darker layer below artery cavity).

To start the analysis using the web interface the picture in JPG format should be loaded (Load Picture button). Then, a line should be drawn that intersects the object of interest by clicking on two points in the picture - length estimation will be performed on that line. Finally, parameters for a data set preparation should be set.

The Levels of gray parameter determines how many levels of grey the algorithm should take into account. It is important to mention that, although color images can be loaded, they are internally converted to grey-scales prior to any calculations. Since JPG format supports \(2^{24}\) different colors, the number of possible colors should be reduced in order to optimize estimation speed and memory consumption.

Line thickness specifies how many picture pixels around the drawn line are taken into account in length estimation. For instance, if Line thickness is set to \(3\), the algorithm takes pixels which are direct left neighbours of the line, pixels on the line itself, and the ones which are direct right neighbours of the line. By doing so, the matrix of (length of the line) \(\times\) Line thickness pixels – PixelMatrix is obtained.

By doing so, we have obtained the matrix of (length of the line) \(\times\) Line thickness pixels – PixelMatrix.

By clicking on the Prepare data button the data set will be prepared for the inference.

The following step deals with data preparation and is a crucial step of the algorithm. Each pixel of PixelMatrix is mapped to a new matrix of Box size \(\times\) Box size booleans – DotMatrix (note that Box size is a parameter). Further, every DotMatrix is filled with uniformly distributed dots (i.e., TRUE values) in a way that total number of dots in each DotMatrix corresponds to the brightness of the pixel it represents. Then, DotMatrices are tiled up with respect to the position of corresponding pixel and, by doing so, a new matrix of (length of the line \(\cdot\) Box size) \(\times\) (Line thickness \(\cdot\) Box size) booleans is obtained – FinalDotMatrix. The last step is to summarize rows of the FinalDotMatrix to obtain a vector of (length of the line) \(\cdot\) Box size integers. The vector’s histogram is shown on a web interface (Figure 4, at the bottom) and the vector itself serves as an input to the functions lengthest or lengthtest.

Parameters in the Estimation section of the web interface are transferred to lengthest, as well. After the user clicks on Estimate button, lengthest is executed, and its results are displayed below the picture. Additionally, the estimated uniform support is marked red on the intersecting line.

Estimated length is expressed in width of a pixel and in percentage of whole image’s width as well. As stated in the info box, it is important to use a proportional screen resolution on user’s display, so the pixels on the screen are square-shaped.

The Testing section of this web interface serves for hypothesis testing. Procedures related to image loading, choosing an intersecting line, and data preparation are the same as described above. For the purpose of testing, values for H0, unit, and alternative (greater, less, or two-sided) need to be specified. The part of the web interface dealing with output of hypothesis testing procedure is shown in Figure 5.

graphic without alt text
Figure 5: Web interface for length estimation and hypothesis testing - hypothesis testing output

Area estimation of an object shown in a picture

The function startweb.area starts a web interface for area estimation (Figure 6). Again, the first step is to load an image. To select an object whose surface needs to be evaluated, a rectangle should be drawn around it. It is done by clicking on its upper-left and lower-right corners, after which a green rectangle is drawn on the picture.

Data parameters are similar to ones in the length estimation web interface, with the exception of number of slices.

The first step in the area estimation algorithm for this function is to roughly isolate the object in the selected rectangle. In order to do that, pixels from the selected rectangle are divided into two clusters by using the kmeans function from base-R stats package (the criterion for clustering is pixel brightness). Further, only pixels from the ‘object cluster’ are observed and divided into horizontal and vertical stripes, as described earlier in Algorithm 1. The number of stripes is dictated by the number of slices parameter. A length estimation procedure is conducted on each stripe, obtaining two estimated edge points of the object for each stripe (red dots in Figure 5). Two parameters in the Estimation section of the web interface are related to the length estimation procedure of the stripes.

Finally, an optimal ellipse that fits into edge points is found using EllipseDirectFit function from the conicfit package, as well as in the areaest function described earlier. The resulting ellipse is drawn in red in Figure 5. Its area is printed below, this is measured in pixels and the percentage of the whole image area.

graphic without alt text
Figure 6: Web interface for area estimation showing an MRI scan detail (taken from Bankman (2008))

4 Concluding remarks

The R package LeArEst provides routines for estimating the support of the random variable \(U,\,U\sim\mathcal{U}[-a,a],\) based on a sample from \(X=U+\varepsilon\). The random variable \(\varepsilon\) represents additive measurement error and is supposed to have either a normal, Laplace, or scaled Student distribution with 5 degrees of freedom. The package also includes functions for estimating either the borders or the area of some object from a noisy image. The package may be useful for this purpose mainly in the case of images with reasonable contrast between the object of interest and background. For greater robustness, we find it convenient to use some error distributions with heavier tails. Sometimes we have different amounts of noise in the tails, so it would be useful to include some asymmetric error distributions as well. These are some features we are going to add in the package in order to improve flexibility of our models.

5 Acknowledgement

This work was supported by the Croatian Science Foundation through research grant IP-2016-06-6545. We would like to thank Krunoslav Buljan from Osijek Clinical Hospital Center for providing an image of the carotid artery.

CRAN packages used

LeArEst, decon, deamer, conicfit, jpeg, opencpu, shiny

CRAN Task Views implied by cited packages

ModelDeployment, NumericalMathematics, WebTechnologies

Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

I. Bankman. Handbook of medical image processing and analysis. Elsevier Science, 2008.
M. Bensic, S. Hamedovic, K. Sabo and P. Taler. LeArEst: Border and area estimation of data measured with additive error. 2017. R package version 0.1.
M. Benšić and K. Sabo. Border estimation of a two-dimensional uniform distribution if data are measured with additive error. Statistics, 41(4): 311–319, 2007a.
M. Benšić and K. Sabo. Estimating a uniform distribution when data are measured with a normal additive error with unknown variance. Statistics, 44(3): 235–246, 2010.
M. Benšić and K. Sabo. Estimating the width of a uniform distribution when data are measured with additive normal errors with known variance. Computational statistics & data analysis, 51(9): 4731–4741, 2007b.
M. Benšić and K. Sabo. Uniform distribution width estimation from data observed with Laplace additive error. Journal of the Korean Statistical Society, 45(4): 505–517, 2016.
J. Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, PAMI-8(6): 679–698, 1986.
R. J. Carroll and P. Hall. Optimal rates of convergence for deconvoluting a density. J. Amer. Statist. Assoc., 83: 1184–1186, 1988.
W. Chang, J. Cheng, J. Allaire, Y. Xie and J. McPherson. Shiny: Web application framework for r. 2017. URL https://CRAN.R-project.org/package=shiny. R package version 1.0.3.
A. Delaigle and I. Gijbels. Estimation of boundary and discontinuity points in deconvolution problems. Statistica Sinica, 16: 773–788, 2006.
A. Fitzgibbon, M. Pilu and R. B. Fisher. Direct least square fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5): 476–480, 1999.
J. Gama and N. Chernov. Conicfit: Algorithms for fitting circles, ellipses and conics based on the work by prof. Nikolai chernov. 2015. URL https://CRAN.R-project.org/package=conicfit. R package version 1.0.4.
S. Hamedović, M. Benšić, K. Sabo and P. Taler. Estimating the size of an object captured with error. Department of Mathematics, University of Osijek. 2017. URL http://www.mathos.unios.hr/images/homepages/ksabo/hambensabtal.pdf. Submitted to Central European Journal of Operations Research.
A. Meister. Deconvolution problems in nonparametric statistics. Springer-Verlag, 2009.
A. Meister. Support estimation via moment estimation in presence of noise. Statistics, 40: 259–275, 2006.
J. Ooms. The OpenCPU system: Towards a universal interface for scientific computing through separation of concerns. arXiv:1406.4806 [stat.CO], 2014. URL http://arxiv.org/abs/1406.4806.
P. Qiu. Image processing and jump regression analysis. John Wiley & Sons, 2005.
S. E. Ruzin et al. Plant microtechnique and microscopy. Oxford University Press New York, 1999.
K. Sabo and M. Benšić. Border estimation of a disc observed with random errors solved in two steps. Journal of computational and applied mathematics, 229(1): 16–26, 2009.
H. Schneeweiss. Estimating the endpoint of a uniform distribution under measurement errors. Central European Journal of Operations Research, 12(2): 2004.
L. Stefanski and R. J. Carroll. Deconvoluting kernel density estimators. Statistics, 21: 169–184, 1990.
J. Stirnemann, A. Samson, F. Comte and C. Lacour. Deamer: Deconvolution density estimation with adaptive methods for a variable prone to measurement error. 2012. URL https://CRAN.R-project.org/package=deamer. R package version 1.0.
I. Tolić, K. Miličević, N. Šuvak and I. Biondić. Non-linear least squares and maximum likelihood estimation of probability density function of cross-border transmission losses. IEEE Transactions on Power Systems, 2017.
S. Urbanek. Jpeg: Read and write JPEG images. 2014. URL https://CRAN.R-project.org/package=jpeg. R package version 0.1-8.
X.-F. Wang and B. Wang. Decon: Deconvolution estimation in measurement error models. 2013. URL https://CRAN.R-project.org/package=decon. R package version 1.2-4.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Benšić, et al., "LeArEst: Length and Area Estimation from Data Measured with Additive Error", The R Journal, 2017

BibTeX citation

@article{RJ-2017-043,
  author = {Benšić, Mirta and Taler, Petar and Hamedović, Safet and Nyarko, Emmanuel Karlo and Sabo, Kristian},
  title = {LeArEst: Length and Area Estimation from Data Measured with Additive Error},
  journal = {The R Journal},
  year = {2017},
  note = {https://rjournal.github.io/},
  volume = {9},
  issue = {2},
  issn = {2073-4859},
  pages = {461-473}
}