Validating and Extracting Information from National Identification Numbers in R: The Case of Finland and Sweden

National identification numbers (NINs) and similar identification code systems are widely used for uniquely identifying individuals and organizations in Finland, Sweden, and many other countries. To increase the general understanding of such identification techniques, openly available methods and tools for NIN analysis and validation are needed. The hetu and sweidnumbr R packages provide functions for extracting embedded information, checking validity, and generating random but valid numbers in the context of Finnish and Swedish NINs and other identification codes. In this article, we demonstrate these functions from both packages and provide theoretical context and motivation for the subject matter. Our work contributes to the growing toolkit of standardized methods for computational social science research, epidemiology, demographic studies, and other register-based inquiries.

Pyry Kantanen (Department of Computing, University of Turku), Erik Bülow (Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy at University of Gothenburg), Aleksi Lahtinen (Department of Computing, University of Turku), Måns Magnusson (Department of Statistics, Uppsala University, Sweden), Jussi Paananen (Institute of Biomedicine, University of Eastern Finland), Leo Lahti (Department of Computing, University of Turku)
2025-05-20

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2024-023.zip.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kantanen, et al., "Validating and Extracting Information from National Identification Numbers in R: The Case of Finland and Sweden", The R Journal, 2025

BibTeX citation

@article{RJ-2024-023,
  author = {Kantanen, Pyry and Bülow, Erik and Lahtinen, Aleksi and Magnusson, Måns and Paananen, Jussi and Lahti, Leo},
  title = {Validating and Extracting Information from National Identification Numbers in R: The Case of Finland and Sweden},
  journal = {The R Journal},
  year = {2025},
  note = {https://doi.org/10.32614/RJ-2024-023},
  doi = {10.32614/RJ-2024-023},
  volume = {16},
  issue = {3},
  issn = {2073-4859},
  pages = {4-14}
}