openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex

Bibliographic databases are indispensable sources of information on published literature. OpenAlex is an open-source collection of academic metadata that enables comprehensive bibliographic analyses (Priem et al. 2022). In this paper, we detail the implementation of openalexR, an R package that interfaces with the OpenAlex API. We give a general overview of its main functions and several detailed examples of its use. Following best practices for API packages, openalexR offers an intuitive interface for collecting information on different entities, including works, authors, institutions, sources, and concepts. openalexR exposes several API parameters to the user, including filtering, searching, sorting, and grouping. This new open-source package is well documented and available on CRAN.

Massimo Aria (Università degli Studi di Napoli Federico II), Trang Le (Bristol Myers Squibb), Corrado Cuccurullo (Università della Campania Luigi Vanvitelli), Alessandra Belfiore (Università degli Studi di Napoli Federico II), June Choe (University of Pennsylvania)
2024-04-11

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2023-089.zip

References

W. Ammar, D. Groeneveld, C. Bhagavatula, I. Beltagy, M. Crawford, D. Downey, J. Dunkelberger, A. Elgohary, S. Feldman, V. Ha, et al. Construction of the literature graph in Semantic Scholar. arXiv preprint arXiv:1805.02262, 2018.
M. Aria. openalexR: Getting bibliographic records from ’OpenAlex’ database using ’DSL’ API. 2022. URL https://github.com/massimoaria/openalexR, https://massimoaria.github.io/openalexR/.
M. Aria and C. Cuccurullo. bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4): 959–975, 2017.
A. Belfiore, A. Salatino and F. Osborne. Characterising research areas in the field of AI. arXiv preprint arXiv:2205.13471, 2022.
D. S. Chawla. Unpaywall finds free versions of paywalled papers. Nature, 2017.
C. Chen. Science mapping: A systematic review of the literature. Journal of Data and Information Science, 2(2): 1–40, 2017. URL https://doi.org/10.1515/jdis-2017-0006.
G. Csárdi, J. Hester, H. Wickham, W. Chang, M. Morgan and D. Tenenbaum. remotes: R package installation from remote repositories, including ’GitHub’. 2021. URL https://CRAN.R-project.org/package=remotes. R package version 2.4.2.
T. Dallas, A.-L. Gehman and M. J. Farrell. Variable bibliographic database access could limit reproducibility. BioScience, 68(8): 552–553, 2018.
C. Du, J. Cohoon, J. Priem, H. Piwowar, C. Meyer and J. Howison. CiteAs: Better software through sociotechnical change for better software citation. In Companion Publication of the 2021 Conference on Computer Supported Cooperative Work and Social Computing, 2021.
G. Hendricks, D. Tkaczyk, J. Lin and P. Feeney. Crossref: The sustainable source of community-owned scholarly metadata. Quantitative Science Studies, 1(1): 414–427, 2020.
C. Herzog, D. Hook and S. Konkiel. Dimensions: Bringing down barriers between scientometricians and data. Quantitative Science Studies, 1(1): 387–395, 2020.
D. Hicks, P. Wouters, L. Waltman, S. De Rijcke and I. Rafols. Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548): 429–431, 2015.
D. W. Hook, S. J. Porter and C. Herzog. Dimensions: Building context for search and evaluation. Frontiers in Research Metrics and Analytics, 3: 23, 2018.
P. Kulkanjanapiban and T. Silwattananusarn. Comparative analysis of Dimensions and Scopus bibliographic data sources: An approach to university research productivity. International Journal of Electrical & Computer Engineering (2088-8708), 12(1), 2022.
A. Martín-Martín, M. Thelwall, E. Orduna-Malea and E. Delgado López-Cózar. Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: A multidisciplinary comparison of coverage via citations. Scientometrics, 126(1): 871–906, 2021.
S. McWeeny, J. Choe and E. S. Norton. SnowGlobe: An iterative search tool for systematic reviews and meta-analyses. 2021. DOI https://doi.org/10.17605/OSF.IO/U25RN.
S. McWeeny, S. Choi, J. Choe, A. LaTourrette, M. Y. Roberts and E. S. Norton. Rapid automatized naming (RAN) as a kindergarten predictor of future reading in English: A systematic review and meta-analysis. Reading Research Quarterly, 57(4): 1187–1211, 2022. URL https://ila.onlinelibrary.wiley.com/doi/abs/10.1002/rrq.467.
T. L. Pedersen. ggraph: An implementation of grammar of graphics for graphs and networks. 2022a. URL https://CRAN.R-project.org/package=ggraph. R package version 2.1.0.
T. L. Pedersen. tidygraph: A tidy API for graph manipulation. 2022b. URL https://CRAN.R-project.org/package=tidygraph. R package version 1.2.2.
J. Priem, H. Piwowar and R. Orr. OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833, 2022.
J. Priem, D. Taraborelli, P. Groth and C. Neylon. Altmetrics: A manifesto. 2011. URL http://altmetrics.org/manifesto.
A. P. Siddaway, A. M. Wood and L. V. Hedges. How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annual Review of Psychology, 70(1): 747–770, 2019. DOI 10.1146/annurev-psych-010418-102803.
V. K. Singh, P. Singh, M. Karmakar, J. Leta and P. Mayr. The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis. Scientometrics, 126(6): 5113–5142, 2021.
A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. Hsu and K. Wang. An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web, pages 243–246, 2015.
N. J. Van Eck, L. Waltman, V. Larivière and C. Sugimoto. Crossref as a new source of citation data: A comparison with Web of Science and Scopus. CWTS Blog, 17, 2018.
R. Van Noorden. Scientists join journal editors to fight impact-factor abuse. Nature News Blog, 16, 2013.
M. Visser, N. J. van Eck and L. Waltman. Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. Quantitative Science Studies, 2(1): 20–41, 2021.
K. Wais. Gender prediction methods based on first names with genderizeR. The R Journal, 8(1): 17, 2016.
L. Waltman and V. Larivière. Special issue on bibliographic data sources. Quantitative Science Studies, 1: 360–362, 2020.
K. Wang, et al. A review of Microsoft Academic Services for science of science studies. Frontiers in Big Data, 2: 45, 2019.
K. Wang, Z. Shen, C. Huang, C.-H. Wu, Y. Dong and A. Kanakia. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies, 1(1): 396–413, 2020.
S. B. Wanyama, R. W. McQuaid and M. Kittler. Where you search determines what you find: The effects of bibliographic databases on systematic reviews. International Journal of Social Research Methodology, 25(3): 409–422, 2022.
H. Wickham. httr: Tools for working with URLs and HTTP. 2022. URL https://CRAN.R-project.org/package=httr. R package version 1.4.4.
H. Wickham, J. Hester, W. Chang and J. Bryan. devtools: Tools to make developing R packages easier. 2022. URL https://CRAN.R-project.org/package=devtools. R package version 2.4.5.
D. J. Winter. rentrez: An R package for the NCBI eUtils API. PeerJ Preprints. 2017.
C. Wohlin. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, 2014. New York, NY, USA: Association for Computing Machinery. ISBN 9781450324762. URL https://doi.org/10.1145/2601248.2601268.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Aria, et al., "openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex", The R Journal, 2024

BibTeX citation

@article{RJ-2023-089,
  author = {Aria, Massimo and Le, Trang and Cuccurullo, Corrado and Belfiore, Alessandra and Choe, June},
  title = {openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex},
  journal = {The R Journal},
  year = {2024},
  note = {https://doi.org/10.32614/RJ-2023-089},
  doi = {10.32614/RJ-2023-089},
  volume = {15},
  issue = {4},
  issn = {2073-4859},
  pages = {167-180}
}