Version 2.10.0 of R includes new code for processing .Rd
help files. There are some changes to what is allowed, and some new capabilities and opportunities.
One of the reasons for the success of R is that package authors are
required to write documentation in a structured format for all of the
functions and datasets that they make visible in their packages. This
means users can count on asking for help on a topic "foo"
using either
the function call help("foo")
, or one of the shortcuts: entering
?foo
in the console, or finding help through the menu system in one of
the graphical user interfaces. The quality of the help is improved by
the structured format: the quality assurance tools such as R CMD check
can report inconsistencies between the documentation and the code,
undocumented objects, and other errors, and it is possible to build
indices automatically and do other computations on the text.
The original help system was motivated by the help system for S (Becker, J. M. Chambers, and A. R. Wilks 1988), and the look of the input files was loosely based on LaTeX (Lamport 1986). Perl (Wall, T. Christiansen, and J. Orwant 2000) scripts transformed the input files into formatted text to display in the R console, LaTeX input files to be processed into Postscript or PDF documents, and HTML files to be viewed in a web browser.
These Perl scripts were hard to maintain, and inconsistencies crept into the system: different output formats could inadvertently contain different content. Moreover, since the scripts could only look for fairly simple patterns, the quality control software had trouble detecting many errors which would slip through and result in rendering errors in the help pages, unknown to the author.
At the useR! 2008 meeting in Dortmund, one of us (Murdoch) was convinced
to write a full-fledged parser for .Rd
files. This made it into the
2.9.0 release in April, 2009, but only as part of the quality control
system: and many package authors started receiving warning and error
messages. Over the summer since then several members of the R Core team
(including especially Brian Ripley and Kurt Hornik, as well as the
authors of this article) have refined that parser, and written renderers
to replace the Perl scripts, so that now all help processing is done in
R code. We have also added an HTTP server to R to construct and deliver
the help pages to a web browser on demand, rather than relying on static
copies of the pages.
This article describes the components of the new help system.
In the new help system, the transformation from an .Rd
file to a
.tex
, .html
or .txt
file is a two-step process. The parse_Rd()
function in the tools package converts the .Rd
file to an R object
with class "Rd"
representing its structure, and the renderers convert
that to the target form.
The syntax of .Rd
files handled by the parser is in some ways more
complicated than the syntax of R itself. It contains LaTeX-like markup,
R-like sections of code, and occasionally code from other languages as
well. Figure 1 shows a near-minimal example.
Like LaTeX, a comment is introduced by a percent sign (%
), and markup
is prefixed with a backslash (\
). Braces ({}
) are used to delimit
the arguments to markup macros. However, notice that in the
\examples{}
section, we follow R syntax: braces are used to mark the
body of the for
loop.
In fact, there are three different modes that the parser uses when
parsing the .Rd
file. It starts out in a LaTeX-like mode, where
backslashes and braces are used to indicate macros and their arguments.
Some macros (e.g. \usage{}
or \examples{}
) switch into R-like mode:
macros are still recognized, but braces are used in the code, and don’t
necessarily indicate the arguments to macros. The third mode is mostly
verbatim: it doesn’t recognize any macros at all. It is used in a few
macros like \alias{}
, where we might want to set an alias involving
backslashes, for instance. There are further complications in that
strings in quotes in R-like mode (e.g. "\n"
) are handled slightly
differently again, so the newline is not treated as a macro, and
R comments in R-like mode have their own special rules.
A good question to ask is why the syntax is so complicated. Largely this
is a legacy of the earlier versions of the help system. Because there
was no parsing, just a single-step transformation to the output format,
special characters were handled on a case-by-case basis, and not always
consistently. When writing the new parser, we wanted to minimize the
number of previously correct .Rd
files which triggered errors or were
rendered incorrectly. The three-mode syntax described above is the
result.
The syntax rules are fully described in Murdoch (2009), but most users should not need to know all the details. In almost all cases, the syntax is designed so that typing what you mean will give the result you want. A quick summary is as follows:
The characters \
, %
, {
and }
have special meaning almost
everywhere in an .Rd
file.
The backslash \
is used both to introduce macros and to escape the
special meaning of the other characters.
Unless escaped, the percent %
character starts a comment. The
comment is included in the parsed object, but the renderers will not
display it to the user.
Unless escaped, the braces {
and }
delimit the arguments to
macros. In R-like or verbatim text, they need not be escaped if they
balance.
End of line (newline) characters mark the end of pieces of text,
even when following a %
comment.
Other whitespace (spaces, tabs, etc.) is included as part of the text, though the renderers may remove or change it.
In R-like text, quoted strings follow R rules: the delimiters must
balance, and braces within need not. Only a few macros are
recognized in R strings: \var
and those related to links. Other
uses of a backslash, e.g. "\n"
are taken to be part of the string.
In R-like text, R comments using #
are taken to be part of the
text.
The directives #ifdef ... #endif
and #ifndef ... #endif
are
treated as markup, so other macros must be completely nested within
them or completely contain them.
There are more than 60 macros recognized by the parser, from \acronym
to \verb
. Each of them takes from 0 to 3 arguments in braces, and some
of them take optional arguments in brackets ([]
). A complete list is
given in Murdoch (2009), and their interpretation is described in the
Writing R Extensions manual (R Development Core Team 2009).
The output of the parse_Rd()
function is a list with class "Rd"
.
Currently this is mostly intended for internal use, and the only methods
defined for that class are as.character.Rd()
and print.Rd()
. The
elements of the list are the top level components of the .Rd
file.
Each element has an "RdTag"
attribute labelling it as one of the three
kinds of text, or a comment, or as a macro. There is a (currently
internal) function in the tools package to display all of the tags.
For example, using the help file from Figure 1, we get
the following results.
> library(tools)
> parsed <- parse_Rd("foo.Rd")
> tools:::RdTags(parsed)
1] "COMMENT" "TEXT"
[3] "\\name" "TEXT"
[5] "\\alias" "TEXT"
[7] "\\title" "TEXT"
[9] "\\description" "TEXT"
[11] "\\usage" "TEXT"
[13] "\\arguments" "TEXT"
[15] "\\seealso" "TEXT"
[17] "\\examples" "TEXT"
[19] "\\keyword" "TEXT" [
The TEXT
components between each section are simply the newlines that
separate them, and can be ignored. The components labelled with macro
names are themselves lists, with the same structure. For example,
component 15 is the \seealso
section, and it has this structure
> tools:::RdTags(parsed[[15]])
1] "TEXT" "TEXT" "\\code" "TEXT" [
and the \code
macro within it is a list, etc. Most users will have no
need to look at "Rd"
objects at this level, but this is crucial for
the internal quality assurance code, and for the renderers described in
the next section. As R 2.10.0 is the first release where these objects
are being fully used, there have been a number of changes since they
were introduced in R 2.9.0, and there may still be further changes in
future releases: so users of the internal structure should pay close
attention to changes taking place on the R development trunk.
One of the design goals in writing the new parser was to accept valid
.Rd
files from previous versions. This was mostly achieved, but there
are some incompatibilities arising from the new syntax rules given
above. A full description is included in Murdoch (2009); here we point out
some highlights.
In previous versions, the \code{}
macro was used for both R code and
code in other languages. However, the R code needs R-like parsing rules,
and other languages often need verbatim parsing. Since R code is more
commonly used, it was decided to restrict \code{}
to handle R code and
introduce \verb{}
for the rest. At present, this mainly affects text
containing quote marks, which must balance in a \code{}
block as in
R code. At some point in the future legal R syntax might be more
strictly enforced.
The handling of #ifdef
/#ifndef
and the rules for escaping characters
are now more regular.
One final feature of parse_Rd()
is worth mentioning. We make a great
effort to report errors in .Rd
files at the point they occur, and
because we now fully parse the file, we have detected a large number of
errors that previously went unnoticed. In most cases these errors
resulted in help pages that were not being displayed as the author
intended, but we don’t want to suddenly make hundreds of packages on
CRAN unusable. The compromise we reached is as follows: when the parser
detects an error in a .Rd
file, it reports an error or warning, and
attempts to recover, possibly skipping some text. The package
installation code will report these messages, but will not abort an
installation because of them. Authors should not ignore these messages:
in most cases, they indicate that something will not be displayed as
intended.
As of version 2.10.0, R will present help in three different formats:
text, HTML, and PDF manuals. Previous versions also included compiled
HTML on the Windows platform; that format is no longer supported, due
partly to security concerns (Microsoft Support 2009), partly to the fact that it
will not support dynamic help, and partly to reduce the support load on
the R Core team. The displayed type is now controlled by a single option
rather than a collection of options: for example, to see HTML help, use
options(help_type="html")
rather than options(htmlhelp=TRUE)
.
The three formats are produced by the functions Rd2txt()
, Rd2HTML()
and Rd2latex()
in the tools package. The main purpose of these
functions is to convert the parsed "Rd"
objects to the output format;
they also accept .Rd
files as input, calling parse_Rd()
if
necessary.
There are two other related functions in tools. The Rd2ex()
function
extracts example code from a help page, and checkRd()
performs checks
on the contents of an "Rd"
object. The latter checks for the presence
of all necessary sections (at most once in some cases, e.g. for the
\title{}
). It is used to report errors and warnings during
R CMD check
processing of a package.
One big change from earlier versions of R is that these functions do
their rendering on request. In earlier versions, all help processing was
done when the package was installed, and the text files, HTML files, and
PDF manuals were installed with the package. Now the "Rd"
objects are
produced at install time, but the human-readable displays are produced
at the time the user asks to see them. As described below, this means
that help pages can include information computed just before the page is
displayed. Not much use is made of this feature in 2.10.0, but the
capability is there, and we expect to make more use of it in later
releases of base R, and to see it being used in user-contributed
packages even sooner.
Another advantage of “just-in-time” rendering is that cross-package links between help pages are handled better. In previous versions of R, the syntax
\link[stats]{weighted.mean}
meant that the HTML renderer should link to the file named
weighted.mean.html
in the stats package, whereas
\link{weighted.mean}
meant to link to whatever file corresponded to the alias
weighted.mean
in the current package. This difference was necessary
because the installer needed to build links at install time, but it
could not necessarily look up topics outside of the current package (the
target package might not be installed yet). Unfortunately, the
inconsistency was a frequent source of errors, even in the base
packages. Now that calculation can be delayed until rendering time,
R 2.10.0 supports both kinds of linking. For back-compatibility, it
gives priority to linking by filename, but if that fails, it will try to
link by topic.
In order to produce the help display in a web browser when the user requests, it was necessary to add an HTTP server to R.
At a high level the goal of the server is to accept connections from browsers, convert each HTTP request on a TCP/IP port into a call to an R function and deliver the result back to the browser. Since the main use is to provide a dynamic help system based on the current state of the R session (such as packages loaded, methods defined, …), it is important that the function is evaluated in the current R session. This makes the implementation more tricky as we could not use regular socket connections and write the server entirely in R.
Instead, a general asynchronous server infrastructure had to be created
which handles multiple simultaneous connections and is yet able to
synchronize them into synchronous calls of the R evaluator. The
processing of each connection is dispatched to an asynchronous worker
which keeps track of the state of the connection, collects the entire
HTTP request and issues a message to R to process the request when
complete. The request is processed by calling the httpd
function with
two arguments: path from the URL and query parameters. The query
parameters are parsed from the query string into a named string vector
and decoded, so for example text=foo%3f&n=10
is passed as
c(text="foo?", n="10")
.
The httpd()
function is expected to produce output in the form of a
list that is meaningful to the browser. The list’s elements are
interpreted in sequence as follows: payload, content-type, headers
and status code. Only the payload is mandatory so the returned list
can have one to four elements. The content type specifies the MIME-type
of the payload (default is “text/html”), headers specify optional HTTP
response headers as a named character vector (without CR/LF) and the
error code is an integer specifying the HTTP status code. The payload
must be either a string vector of length one or a raw vector. If the
payload element is a string vector named “file” then the string is
interpreted as an absolute path to a file to be sent as the payload
(useful for static content). Otherwise the payload is sent to the
browser in verbatim. The httpd
function is evaluated in a tryCatch
block such that errors are returned as a string from the function,
resulting in a 500 status code (internal server error). An example of a
simple httpd
function is listed below:
<- function(path, query, ...) {
httpd if (is.null(query)) query <- character(0)
<- query['package']
pkg if (is.na(pkg)) pkg <- 'base'
<- system.file(path, package = pkg)
fn if (file.exists(fn)) return(list(file = fn))
list(paste("Cannot find", path),
"text/html", NULL, 404L)
}
The signature of the httpd
function should include … for future
extensions. The above function checks for the package
query parameter
(defaulting to “base”) then attempts to find a file given by path
in
that package. If successful the content of the file is served (as
default text/html), otherwise a simple error page is created with a 404
HTTP status code “not found”.
Although the HTTP server is quite general and could be used for many
purposes, R 2.10.0 has the limitation of one server per R instance which
makes it currently dedicated to the dynamic help. The user may set
options("help.ports")
to control which IP port is used by the server.
If not set, the port is assigned randomly to avoid collisions between
different R instances.
.Rd
FilesAs mentioned previously, help output is now being produced just before
display, and it is possible for the author to customize the display at
that time. This is supported by the new macro \``Sexpr{}
, which is
modelled after the macro of the same name in current versions of Sweave
(Leisch 2002).
Unlike Sweave, in .Rd
files the \``Sexpr{}
macro takes optional
arguments to control how it is displayed, and when it is calculated. For
example, \``Sexpr[results=verbatim,echo=TRUE]{x<-10;x^2}
will result
in a display similar to
> x<-10;x^2
1] 100 [
when the help page is displayed.
Some of the options are similar to Sweave, but not always with the same
defaults. For example the options (with defaults) eval=TRUE
,
echo=FALSE
, keep.source=TRUE
, and strip.white=TRUE
do more or less
the same things as in Sweave. That is, they control whether the code is
evaluated, echoed, reformatted, and stripped of white space before
display, respectively.
There are also options that are different from Sweave. The results
option has the following possible values:
(the default) The result should be inserted into the text at the current point.
Print the result as if it was executed at the console.
No result should be shown.
The result should be parsed as if it was .Rd
file code.
The stage
option controls when the code is run. It has the following
values:
The code should be run when building the package. This option is not currently implemented, but is allowed; the code will not be executed in R 2.10.0.
(the default) The code is run when installing the package from source, or when building a binary version of the package.
The code is run just before the page is rendered.
The \``Sexpr{}
macro enables a lot of improvements to the help system.
It will allow help pages to include examples with results inline, so
that they can be mixed with descriptions, making explanations clearer.
The just-in-time rendering will also allow help pages to produce
information customized to a particular session: for example, it would be
helpful to link from a page about a class to methods which mention it in
their signatures. It will also be possible to prototype new types of
summary pages or indices for the whole help system, without requiring
changes to base R.
The help files have supported #ifdef/#ifndef
conditionals for years,
to allow selective inclusion of text depending on the computing platform
where R is run. With version 2.10.0 more general conditional selection
has been added: the \if{}{}
and \ifelse{}{}{}
macros. The first
argument to these describes the condition, the second is code to be
included at render time if that condition holds, and the third if it
does not.
The most common use of the conditionals is foreseen to be to select
displays depending on the output format, to extend the idea of the
\eqn
and \deqn
macros from previous versions. To support this use,
the condition is expressed as a comma-separated list of formats, chosen
from example
, html
, latex
, text
, TRUE
and FALSE
. If the
current output format or TRUE
is in the list, the condition is
satisfied. The logicals would normally be the result of evaluating an
\``Sexpr{}
expression, so general run-time conditions are possible.
Two new macros have been added to support output of literal text. The
\verb{}
macro parses its argument as verbatim text, and the renderers
output it as parsed, inserting whatever escapes and conversions are
necessary so that it renders the same in all formats. The \out{}
macro
works at a lower level. It also parses its argument as verbatim text,
but the renderers output it in raw form, without any processing. It
would almost certainly be used in combination with \if
or \ifelse
.
For example, to output a Greek letter alpha, the markup
α}}{
\ifelse{html}{\out{ \eqn{\alpha}{alpha}}
could be used: this will output α
when producing HTML and
\alpha
when producing LaTeX, both of which will render as \(\alpha\); it
will output alpha
when producing text.
One of the development guidelines for R is that we don’t introduce many new features in patch releases, so R 2.10.1 is not likely to differ much from what is described above. However, we would expect small changes to the rendering of help pages as the renderers become more polished.
For version 2.11.0 or later, we would expect some of the following to be implemented. The order of presentation is not necessarily the order in which they will appear, and some may never appear.
The \``Sexpr{}
macro will likely develop in several ways:
We will likely add in the processing of stage=build
macros
when the R CMD build
code is rewritten in R.
We will likely change the prompt()
, package.skeleton()
and
related functions to make use of \``Sexpr{}
macros, e.g. to
display the version of a package, or to list the classes
supporting a method, etc.
We will make information about the run-time environment
available to the \``Sexpr{}
code, so that it can tailor its
behaviour to the context in which it is being run.
We would expect users to develop contributed packages to provide
convenient functions to use in \``Sexpr{}
macros. Some of
these may need changes to base R.
We would like to allow figures to be inserted into help displays.
The quality control checks will continue to evolve, as we notice
common errors and add code to checkRd()
to detect them. There is
already support for spell checking of .Rd
files in the aspell()
function.
The .Rd
format is very idiosyncratic, and not many tools exist
outside of R for working with it. There may be an advantage to
switching to a different input format (e.g. one based on XML) in
order to make use of more standard tools. Such a change would
require conversion of the thousands of existing help pages already
in .Rd
format; the new parser may enable automatic translation if
anyone wants to explore this possibility.
The HTTP server used in the help system is quite general, but is currently limited to serving R help pages. It may be extended to allow more general use in the future.
Changing to the new help system has already helped to diagnose hundreds
of errors in help pages that slipped by in the previous version, and
diagnostic messages are more informative than they were. As the
\``Sexpr{}
macro is used in core R and by package writers, we will see
better help than ever before, and the HTTP server will open up many
other possibilities for R developers.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Murdoch & Urbanek, "The New R Help System", The R Journal, 2009
BibTeX citation
@article{RJ-2009-2-r-help-system, author = {Murdoch, Duncan and Urbanek, Simon}, title = {The New R Help System}, journal = {The R Journal}, year = {2009}, note = {https://journal.r-project.org/news/RJ-2009-2-r-help-system}, volume = {1}, issue = {2}, issn = {2073-4859}, pages = {60-65} }