Kernel Heaping - Kernel Density Estimation from regional aggregates via measurement error model

The phenomenon of “aggregation” often occurs when regional information is disseminated via choropleth maps. Choropleth maps divide an area into regions and color-code each region in proportion to ordinal or scaled quantitative data. By construction, discontinuities occur at the boundaries of the rigid aggregation areas, which are often of administrative origin, and inadequate choices of reference areas can lead to errors, misinterpretations, and difficulties in identifying local clusters. Such representations therefore do not reflect reality, and a smooth representation of georeferenced data is a common goal. Naive non-parametric kernel density estimators based on aggregates positioned at the centroids of the areas also represent reality inadequately. For this reason, an iterative method based on the Simulated Expectation Maximization algorithm was implemented in the Kernelheaping package. The proposed approach is a partly Bayesian algorithm that treats the true, unknown geocoordinates as additional parameters and yields a corrected kernel density estimate.
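The iterative idea described above can be illustrated with a one-dimensional toy sketch in Python. It is not the Kernelheaping implementation, which is an R package operating on two-dimensional regions. The function names, the fixed bandwidth, the evaluation grid, and the example counts are all assumptions made for illustration. The sketch starts from the naive estimate with every observation placed at its region's centroid, then repeatedly redraws the unknown positions inside each region in proportion to the current density estimate and re-estimates the density.

```python
import math
import random


def gaussian_kde_on_grid(points, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every grid position."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def sem_density(edges, counts, bandwidth, grid_size=200, burnin=5, samples=10, seed=1):
    """Toy simulated-EM style density estimate from interval counts (hypothetical helper).

    edges  : region boundaries, e.g. [0, 2, 4] defines the regions [0, 2) and [2, 4)
    counts : number of observations reported for each region
    Returns the grid, the naive centroid-based estimate, and the averaged iterative estimate.
    """
    rng = random.Random(seed)
    lo, hi = edges[0], edges[-1]
    step = (hi - lo) / grid_size
    grid = [lo + (i + 0.5) * step for i in range(grid_size)]
    regions = list(zip(edges, edges[1:]))
    # Indices of the grid cells whose centers fall into each region.
    cells = [[i for i, g in enumerate(grid) if a <= g < b] for a, b in regions]

    # Naive start: every observation sits at the centroid of its region.
    points = []
    for (a, b), n in zip(regions, counts):
        points += [(a + b) / 2.0] * n
    naive = gaussian_kde_on_grid(points, grid, bandwidth)

    density = naive
    total = [0.0] * grid_size
    for iteration in range(burnin + samples):
        # Simulation step: redraw positions inside each region, with grid cells
        # weighted by the current density estimate.
        points = []
        for idx, n in zip(cells, counts):
            if n == 0:
                continue
            drawn = rng.choices(idx, weights=[density[i] for i in idx], k=n)
            # Jitter uniformly within the drawn grid cell.
            points += [grid[i] + rng.uniform(-0.5, 0.5) * step for i in drawn]
        # Re-estimation step: kernel density estimate from the simulated positions.
        density = gaussian_kde_on_grid(points, grid, bandwidth)
        if iteration >= burnin:
            total = [t + d for t, d in zip(total, density)]
    corrected = [t / samples for t in total]
    return grid, naive, corrected


# Made-up example: five regions of width 2 with invented counts.
grid, naive, corrected = sem_density([0, 2, 4, 6, 8, 10], [5, 40, 10, 30, 5], bandwidth=0.6)
print("peak of naive estimate:    ", max(naive))
print("peak of iterative estimate:", max(corrected))
```

The sketch keeps the bandwidth fixed and applies no boundary correction, so it only conveys the simulate-then-re-estimate loop, not bandwidth selection or the handling of region geometry.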

Lorena Gril (Freie Universität Berlin, FB Wirtschaftswissenschaft), Laura Steinkemper (Freie Universität Berlin, FB Wirtschaftswissenschaft), Marcus Groß (INWT Statistics GmbH), Ulrich Rendtel (Freie Universität Berlin, FB Wirtschaftswissenschaft)
2025-05-20

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-026.zip

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gril, et al., "Kernel Heaping - Kernel Density Estimation from regional aggregates via measurement error model", The R Journal, 2025

BibTeX citation

@article{RJ-2024-026,
  author = {Gril, Lorena and Steinkemper, Laura and Groß, Marcus and Rendtel, Ulrich},
  title = {Kernel Heaping - Kernel Density Estimation from regional aggregates via measurement error model},
  journal = {The R Journal},
  year = {2025},
  note = {https://doi.org/10.32614/RJ-2024-026},
  doi = {10.32614/RJ-2024-026},
  volume = {16},
  issue = {3},
  issn = {2073-4859},
  pages = {115-133}
}