Variety and Mainstays of the R Developer Community

The thriving developer community has a significant impact on the widespread use of R software. To better understand this community, we conducted a study analyzing all R packages available on CRAN. We identified the most popular topics of R packages by text mining the package descriptions. Additionally, using network centrality measures, we discovered the important packages in the package dependency network and influential developers in the global R community. Our analysis showed that among the 20 topics identified in the topic model, Data Import, Export, and Wrangling, as well as Data Visualization, Result Presentation, and Interactive Web Applications, were particularly popular among influential packages and developers. These findings provide valuable insights into the R community.

Lijin Zhang (Graduate School of Education, Stanford University), Xueyang Li (Department of Computer Science and Engineering, University of Notre Dame), Zhiyong Zhang (Department of Psychology, University of Notre Dame)
2023-12-18

Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2023-060.zip.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Zhang, et al., "Variety and Mainstays of the R Developer Community", The R Journal, 2023

BibTeX citation

@article{RJ-2023-060,
  author = {Zhang, Lijin and Li, Xueyang and Zhang, Zhiyong},
  title = {Variety and Mainstays of the R Developer Community},
  journal = {The R Journal},
  year = {2023},
  note = {https://doi.org/10.32614/RJ-2023-060},
  doi = {10.32614/RJ-2023-060},
  volume = {15},
  issue = {3},
  issn = {2073-4859},
  pages = {5-25}
}