What’s New?

We discuss how news in R and add-on packages can be disseminated. R 2.10.0 has added facilities for computing on news in a common plain text format. In R 2.12.0, the Rd format has been further enhanced so that news can be very conveniently recorded in Rd, allowing for both improved rendering and the development of new news services.

Kurt Hornik (Department of Finance, Accounting and Statistics, Wirtschaftsuniversität Wien), Duncan Murdoch (Department of Statistical and Actuarial Sciences, University of Western Ontario)
2010-12-01

The R development community has repeatedly discussed how to best represent and disseminate the news in R and R add-on packages. The GNU Coding Standards (http://www.gnu.org/prep/standards/html_node/NEWS-File.html) say

In addition to its manual, the package should have a file named NEWS which contains a list of user-visible changes worth mentioning. In each new release, add items to the front of the file and identify the version they pertain to. Don’t discard old items; leave them in the file after the newer items. This way, a user upgrading from any previous version can see what is new.

If the NEWS file gets very long, move some of the older items into a file named ONEWS and put a note at the end referring the user to that file.

For a very long time R itself used a three-layer (series, version, category) plain text NEWS file recording individual news items in itemize-type lists, using ‘o’ as item tag and aligning items to a common column (as quite commonly done in open source projects).

R 2.4.0 added a function readNEWS() (eventually moved to package tools) for reading the R NEWS file into a news hierarchy (a series list of version lists of category lists of news items). This makes it possible to compute on the news (e.g. changes are displayed daily in the RSS feeds available from http://developer.r-project.org/RSSfeeds.html), and to transform the news structure into sectioning markup (e.g. when creating an HTML version for reading via the on-line help system). However, the items were still plain text: clearly, it would be useful to be able to hyperlink to resources provided by the on-line help system (Murdoch and Urbanek 2009) or other URLs, e.g., those providing information on bug reports. In addition, it has always been quite a nuisance to add LaTeX markup to the news items for the release notes in the R Journal (and its predecessor, R News). The R Core team had repeatedly discussed the possibilities of moving to a richer markup system (and generating plain text, HTML and LaTeX from this), but had been unable to identify a solution that would be appropriate in terms of convenience as well as the effort required.
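To make the idea of a news hierarchy concrete, the following is a minimal sketch of a series list of version lists of category lists of news items, written as nested R lists. It does not reproduce the actual return value of readNEWS(); the object name news_tree and all entries are invented for illustration.

```r
# Toy news hierarchy: series -> version -> category -> items.
# All entries are invented for illustration.
news_tree <- list(
  "2.4" = list(
    "2.4.0" = list(
      "NEW FEATURES" = c("Hypothetical feature A.", "Hypothetical feature B."),
      "BUG FIXES"    = c("Hypothetical fix C.")
    ),
    "2.4.1" = list(
      "BUG FIXES"    = c("Hypothetical fix D.", "Hypothetical fix E.")
    )
  )
)

# Compute on the news: number of items per version, summed over categories.
items_per_version <- sapply(news_tree[["2.4"]],
                            function(version) sum(lengths(version)))
print(items_per_version)

# Transform the structure into simple sectioning markup:
# one heading per version, one sub-heading per category.
for (v in names(news_tree[["2.4"]])) {
  cat("CHANGES IN VERSION", v, "\n")
  for (ctg in names(news_tree[["2.4"]][[v]])) {
    cat("  ", ctg, "\n", sep = "")
    cat(paste0("    o ", news_tree[["2.4"]][[v]][[ctg]]), sep = "\n")
  }
}
```

Once the items live in such a structure, both counting and re-rendering reduce to ordinary list traversal, which is what makes computing on the news feasible.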

R add-on packages provide news-type information in a variety of places and plain text formats, with a considerable number of NEWS files in the package top-level source directory or its inst subdirectory, and in the spirit of a one- or two-layer variant of the R news format. Whereas all these files can conveniently be read by “users”, the lack of standardization implies that reliable computation on the news items is barely possible, even though this could be very useful. For example, upgrading packages could optionally display the news between the available and installed versions of packages (similar to the apt-listchanges facility in Debian-based Linux systems). One could also develop repository-level services extracting and disseminating the news in all packages in the repository (or suitable groups of these, such as task views (Zeileis 2005)) on a regular basis. In addition, we note that providing the news in plain text limits the usefulness in package web pages and for possible inclusion in package reference manuals (in PDF format).

R 2.10.0 added a function news() (in package utils) which extracts the news for R or add-on packages, where possible, into a simple (plain text) data base, and uses this for simple queries. The data base is a data frame of character variables Version, Category, Date and Text, where the last contains the entry texts read, and the other variables may be NA if they were missing or could not be determined. These variables can be used in the queries. For example, we build a data base of all currently available R news entries:

> db <- news()

This has 2950 news entries, distributed according to version as follows:

> with(db, table(Version))
Version
         2.0.1  2.0.1 patched          2.1.0 
            56             27            223 
         2.1.1  2.1.1 patched         2.10.0 
            56             31            153 
        2.10.1 2.10.1 patched         2.11.0 
            43             35            132 
        2.11.1          2.2.0          2.2.1 
            28            188             63 
 2.2.1 patched          2.3.0          2.3.1 
            18            250             38 
 2.3.1 patched          2.4.0          2.4.1 
            28            225             60 
 2.4.1 patched          2.5.0          2.5.1 
            15            208             50 
 2.5.1 patched          2.6.0          2.6.1 
            16            177             31 
         2.6.2  2.6.2 patched          2.7.0 
            53             14            186 
         2.7.1          2.7.2  2.7.2 patched 
            66             45              6 
         2.8.0          2.8.1  2.8.1 patched 
           132             59             27 
         2.9.0          2.9.1          2.9.2 
           124             56             26 
 2.9.2 patched 
             5 

(This shows “versions” such as ‘2.10.1 patched’, which correspond to news items in the time interval between the last patchlevel release and the next minor or major release of R (in the above, between 2.10.1 and 2.11.0). These are not valid version numbers, and are treated as the corresponding patchlevel version in queries.) To extract all bug fixes since R 2.11.0 that include a PR number from a formal bug report, we can query the news data base as follows:

> items <- news(Version >= "2.11.0" & 
+               grepl("^BUG", Category) & 
+               grepl("PR#", Text),
+               db = db)

This finds 22 such items, the first one being

> writeLines(strwrap(items[1, "Text"]))
The C function mkCharLenCE now no longer
reads past 'len' bytes (unlikely to be a
problem except in user code). (PR#14246)
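The query above relies on versions being compared as version numbers rather than as strings. The following sketch illustrates why this matters, using a toy data frame with the four variables described earlier. The rows, the PR numbers and the helper as_version() are invented, and the normalisation of "patched" versions shown here is our assumption about one way to obtain the behaviour described in the text, not the code used inside news().

```r
# Toy data base with the four variables named in the text; all rows are invented.
toy_db <- data.frame(
  Version  = c("2.9.0", "2.10.1 patched", "2.11.0", "2.11.1"),
  Category = c("BUG FIXES", "BUG FIXES", "NEW FEATURES", "BUG FIXES"),
  Date     = NA_character_,
  Text     = c("Fix one. (PR#0001)", "Fix two.",
               "Feature three.", "Fix four. (PR#0004)"),
  stringsAsFactors = FALSE
)

# Plain string comparison misorders versions, because "9" sorts after "1".
print("2.9.0" >= "2.11.0")

# Assumed normalisation: drop a trailing " patched", then compare as versions.
as_version <- function(x) numeric_version(sub(" patched$", "", x))
print(as_version("2.9.0") >= numeric_version("2.11.0"))

# The query from the text, applied to the toy data.
hits <- subset(toy_db,
               as_version(Version) >= numeric_version("2.11.0") &
                 grepl("^BUG", Category) &
                 grepl("PR#", Text))
print(hits$Text)
```

Only the last row of the toy data satisfies all three conditions; the 2.9.0 entry would be selected as well if versions were compared as plain strings.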

Trying to extract the news for add-on packages involved a trial-and-error process to develop a reasonably reliable default reader for the NEWS files in a large subset of available packages, and subsequently documenting the common format (requirements). This reader looks for version headers and splits the news file into respective chunks. It then tries to read the news items from the chunks, after possibly splitting according to category. For each chunk found, it records whether (it thinks) it successfully processed the chunk. To assess how well this reader actually works, we apply it to all NEWS files in the CRAN packages. As of 2010-09-11, there are 427 such files for a “news coverage” of about 20%. For 67 files, no version chunks are found. For 40 files, all chunks are marked as bad (indicating that we found some version headers, but the files really use a different format). For 60 files, some chunks are bad, with the following summary of the frequencies of bad chunks:

    Min.  1st Qu.   Median     Mean  3rd Qu. 
0.005747 0.057280 0.126800 0.236500 0.263800 
    Max. 
0.944400 

Finally, for 260 files, no chunks are marked as bad, suggesting a success rate of about 61 percent. Given the lack of formal standardization, this is actually “not bad for a start”, but certainly not good enough. Clearly, package maintainers could check readability for themselves and fix incompatibilities with the default format, or switch to it (or contribute readers for their own formats). But ideally, maintainers could employ a richer format allowing for more reliable computations and better processing.
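The three steps of the reader described above (find version headers, split into chunks, read items from each chunk) can be sketched as follows. This is a deliberately simplified illustration and not the default reader itself: the header pattern, the ‘o’ item tag, the function read_items() and the sample text are assumptions made for the example, and categories and dates are ignored.

```r
# Invented NEWS text in a plain text style with 'o' item tags.
news_lines <- c(
  "Changes in version 1.1-0",
  "",
  "  o Hypothetical new argument 'foo'.",
  "  o Hypothetical speed-up that wraps",
  "    onto a second line.",
  "",
  "Changes in version 1.0-0",
  "",
  "  o First hypothetical release."
)

# Step 1: find version headers (assumed pattern: 'Changes in version x.y-z').
header_re <- "^Changes in version ([0-9]+([.-][0-9]+)*)\\s*$"
is_header <- grepl(header_re, news_lines)
versions  <- sub(header_re, "\\1", news_lines[is_header])

# Step 2: split the remaining lines into one chunk per header.
# (The sample has no text before the first header.)
chunk_id <- cumsum(is_header)
chunks   <- split(news_lines[!is_header], chunk_id[!is_header])
names(chunks) <- versions

# Step 3: within a chunk, an 'o' tag starts a new item; other non-empty
# lines continue the current item. A chunk without any item tag is
# reported as NULL, i.e. "bad".
read_items <- function(chunk) {
  chunk  <- chunk[nzchar(trimws(chunk))]
  starts <- grepl("^\\s*o\\s+", chunk)
  if (!any(starts)) return(NULL)
  groups <- split(trimws(sub("^\\s*o\\s+", "", chunk)), cumsum(starts))
  unname(vapply(groups, paste, character(1), collapse = " "))
}

parsed <- lapply(chunks, read_items)
bad    <- vapply(parsed, is.null, logical(1))
str(parsed)
print(bad)
```

Recording a per-chunk success flag, as bad does here, is what allows the kind of file-level summary reported above: a file can have no chunks, only bad chunks, some bad chunks, or none.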

In R 2.12.0, the new Rd format was enhanced in several important ways. The \subsection and \newcommand macros were added, to allow for subsections within sections, and user-defined macros. One can now convert R objects to fragments of Rd code, and process these. And finally, the rendering (of text in particular) is more customizable. Given the availability of \subsection, one can now conveniently map one- and two-layer hierarchies of item lists into sections (and subsections) of Rd \itemize lists. Altogether, the new Rd format obviously becomes a very attractive (some might claim, the canonical) format for maintaining R and package news information, in particular given that most R developers will be familiar with the Rd format and that Rd can be rendered as text, HTML (for on-line help) and LaTeX (for PDF manuals).
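As an illustration of the mapping from a two-layer hierarchy to sections and subsections of \itemize lists, the following sketch writes a small news file in Rd markup, parses it with tools::parse_Rd() and renders it as text with tools::Rd2txt(). The package name foo, the version number and the news items are invented, and the exact section titles are our choice for the example.

```r
# Write an invented two-layer news file in Rd markup to a temporary file.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{NEWS}",
  "\\title{News for Package 'foo'}",
  "\\section{Changes in foo version 1.1-0}{",
  "  \\subsection{NEW FEATURES}{",
  "    \\itemize{",
  "      \\item Hypothetical new argument \\code{bar}.",
  "    }",
  "  }",
  "  \\subsection{BUG FIXES}{",
  "    \\itemize{",
  "      \\item Hypothetical fix for a crash on empty input.",
  "    }",
  "  }",
  "}"
), rd_file)

# Parse the file and list its top-level Rd tags.
rd   <- tools::parse_Rd(rd_file)
tags <- vapply(rd, function(x) attr(x, "Rd_tag"), character(1))
print(tags)

# Render the same source as plain text.
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd, out = txt_file)
writeLines(readLines(txt_file))
```

The version layer becomes a \section, the category layer a \subsection, and the items an \itemize list, so a one-layer file simply omits the \subsection level.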

R 2.12.0 itself has switched to the Rd format for recording its news. To see the effect, run help.start(), click on the NEWS link and note that e.g. bug report numbers are now hyperlinked to the respective pages in the bug tracking system. The conversion was aided by the (internal) utility function news2Rd() in package tools which converted the legacy plain text NEWS to Rd, but of course required additional effort enriching the markup.

R 2.12.0 is also aware of inst/NEWS.Rd files in add-on packages, which are used by news() in preference to the plain text NEWS and inst/NEWS files. For NEWS files in a format which can successfully be handled by the default reader, package maintainers can use tools:::news2Rd(dir, "NEWS.Rd"), possibly with additional argument codify = TRUE, with dir a character string specifying the path to a package’s root directory. Upon success, the NEWS.Rd file can be further improved and then be moved to the inst subdirectory of the package source directory. For now, the immediate benefits of switching to the new format are that news() will work more reliably, and that the package news web pages on CRAN will look better.

As packages switch to Rd format for their news, additional services taking advantage of the new format can be developed and deployed. For the near future, we are planning to integrate package news into the on-line help system, and to optionally integrate the news when generating the package reference manuals. A longer term goal is to enhance the package management system to include news computations in apt-listchanges style. We hope that repository “news providers” will take advantage of the available infrastructure to enhance their news services (e.g., using news diffs instead of, or in addition to, the currently employed recursive diff summaries). But perhaps most importantly, we hope that the new format will make it more attractive for package maintainers to actually provide valuable news information.
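To indicate what a news computation in apt-listchanges style might look like, the following is a minimal sketch that selects the entries between an installed and an available version from a news data base. It is not part of any existing facility: the helper news_between(), the data frame pkg_news and its contents are hypothetical.

```r
# Invented news data base for a hypothetical package.
pkg_news <- data.frame(
  Version = c("1.0-0", "1.1-0", "1.2-0", "1.10-0"),
  Text    = c("First release.", "Hypothetical fix.",
              "Hypothetical feature.", "Another hypothetical feature."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: entries newer than 'installed',
# up to and including 'available'.
news_between <- function(db, installed, available) {
  v <- package_version(db$Version)
  keep <- v > package_version(installed) & v <= package_version(available)
  db[keep, , drop = FALSE]
}

pending <- news_between(pkg_news, installed = "1.1-0", available = "1.10-0")
print(pending$Text)
```

Comparing via package_version() rather than as strings ensures that 1.10-0 is correctly treated as newer than 1.2-0.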



References

D. Murdoch and S. Urbanek. The new R help system. The R Journal, 1(2): 60–65, 2009.
A. Zeileis. CRAN task views. R News, 5(1): 39–40, 2005. URL http.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hornik & Murdoch, "What's New?", The R Journal, 2010

BibTeX citation

@article{RJ-2010-2-whats-new,
  author = {Hornik, Kurt and Murdoch, Duncan},
  title = {What's New?},
  journal = {The R Journal},
  year = {2010},
  note = {https://rjournal.github.io/},
  volume = {2},
  issue = {2},
  issn = {2073-4859},
  pages = {74-76}
}