We describe the Rocker project, which provides a widely used suite of Docker images with customized R environments for particular tasks. We discuss how this suite is organized, and how these tools can increase portability, scaling, reproducibility, and convenience for R users and developers.
The Rocker project was launched in October 2014 as a collaboration between the authors to provide high-quality Docker images containing the R environment (Boettiger and Eddelbuettel 2014). Since that time, the project has seen both considerable uptake in the community and substantial development and evolution. Here we seek to document the project’s objectives and uses.
Docker is a popular open-source tool to create, distribute, deploy, and run software applications using containers. Containers provide a virtual environment (see Clark et al. (2014) for an overview of common virtual environments) containing all operating-system components an application needs to run. Docker containers are lightweight as they share the operating system kernel, start instantly using a layered filesystem which minimizes disk footprint and download time, are built on open standards that run on all major platforms (Linux, Mac, Windows), and provide an added layer of security by running an application in an isolated environment (Docker 2015). Familiarity with a few key terms is helpful in understanding this paper. The term “container” refers to an isolated software environment on a computer. R users can think of running a container as analogous to loading an R package; a container is an active instance of a static Docker image. A Docker “image” is a binary archive of that software, analogous to an R binary package: a given version is downloaded only once, and can then be “run” to create a container whenever it is needed. A “Dockerfile” is a recipe, the source code, to create a Docker image. Pre-built Docker images are publicly available through Docker Hub, which plays a role for central distribution similar to CRAN in our analogy. Development and contributions to the Rocker project focus on the construction, organization and maintenance of these Dockerfiles.
Docker gives users very convenient access to pre-configured and pre-built binary images that “just work”. This allows R users to access a wider variety of ready-to-use environments than is provided by either the R Project itself or, say, their distribution, which will generally focus on one (current) release. For example, R users on Windows may run RStudio Server or Shiny Server locally just by running a single command (once Docker itself is installed). Another common use case is access to R-devel without affecting the local system. Here, we detail some of the principal use cases motivating these containerized versions of R environments, and the design principles that help make them work.
One common use case for Rocker containers is to provide a fast and reliable mechanism to deploy a custom R environment to a remote server, such as Amazon Web Services Elastic Compute (AWS EC2), DigitalOcean, NSF’s Jetstream servers (Stewart et al. 2015), or private or institutional server hardware. Rocker containers are also easy to run locally on most modern laptops using Windows, MacOS, or Linux-based operating systems. By sharing volumes with the local host, users can still manipulate files with familiar, native tools while performing computation through a reproducible, containerized environment (Boettiger 2015). Being able to test code in a predictable, pre-configured R environment on a local machine and to then run the same code in an identical environment on a remote server (e.g., for access to greater RAM, more processors, or merely to free up the local machine from a long-running computation) is essential for low-friction scaling of analysis. Without such containerization, getting code to run appropriately in a remote environment can be a major undertaking, requiring both time and knowledge many would-be users may not have.
For instance, on any platform with Docker installed, the docker run command below will launch a Rocker container providing the RStudio server environment over a web interface (the first line installs Docker on a Linux machine that does not yet have it).
wget -qO- https://get.docker.com/ | sh
sudo docker run -p 8787:8787 -e PASSWORD=<PICK-A-PASSWORD> rocker/rstudio
The docker run option -p sets the port on which RStudio will appear, for which 8787 is the default. Adding your user to the docker group (sudo usermod -aG docker $USER) avoids the need for sudo when calling docker. Many academic and commercial cloud
providers make it possible to execute such code snippets when a
container is launched, without ever needing to ssh into the machine.
The user may log into the server merely by pasting its IP address or DNS
name (followed by the chosen port, e.g., :8787
) into a browser and
entering the appropriate password. This provides the user with a
familiar, interactive environment running on a remote machine while
requiring a minimum of expertise.
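The same launch can be wrapped in a small shell function so that the host port, the password, and a shared project directory are explicit. This is only a sketch: the function name launch_rstudio and its argument order are made up for illustration, and mounting the directory at /home/rstudio assumes the image's default rstudio user.

```shell
# Hypothetical helper (not part of Rocker): start RStudio Server in the
# background on a chosen host port, sharing a host directory with it.
launch_rstudio() {
  host_port="$1"    # port on the host, e.g. 8787
  password="$2"     # password for the RStudio login page
  project_dir="$3"  # host directory to share with the container

  # -d runs detached; -p maps host_port to port 8787 inside the container;
  # -v mounts the host directory (assumed target: the default user's home).
  docker run -d \
    -p "${host_port}:8787" \
    -e PASSWORD="${password}" \
    -v "${project_dir}:/home/rstudio" \
    rocker/rstudio
}
```

Calling launch_rstudio 8787 <PICK-A-PASSWORD> "$(pwd)" would then expose the current directory inside the RStudio session, so files edited there remain on the host after the container is removed.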
This portability is also valuable in an instructional context. Requiring students to install all necessary software on personal laptops can be particularly challenging for short workshops, where download and installation time and troubleshooting across heterogeneous machines can prove time consuming and frustrating for students and instructors alike. By deploying a Rocker image or Rocker-derived image (see Extensibility) on a cloud machine, an instructor can easily provide all students access to the pre-configured software environment using only the browser on their laptops. This strategy has proven effective in our own experience in both workshops and semester-length courses. Similar Docker-based cloud deployments have been scaled to courses of 100s of students, e.g., at Duke (Cetinkaya-Rundel and Rundel 2017) and UC Berkeley (UC Berkeley 2017).
The portability of Rocker images can be particularly valuable in High Performance Computing contexts. Setting up a specific R environment on High Performance Computing platforms and other centrally administrated multi-user machines or clusters has traditionally been challenging due to restrictions on root access that may be needed to install certain libraries. Versions of R and packages installed by the system administrator may also lag behind the most recent releases. Deploying Docker containers on HPC systems has previously been problematic since most system administrators do not want to allow the elevated user permissions the Docker runtime environment requires. To work around this problem, Lawrence Berkeley National Labs (LBNL) has developed ‘Singularity’ (Lawrence Berkeley National Laboratories 2017): a container runtime environment that users can both install and use to run most Docker containers without requiring root privileges. Singularity has seen rapid adoption in the HPC community (http://singularity.lbl.gov/install-request). Rocker containers can be run through Singularity with a single command much like the native Docker commands, e.g.
singularity exec docker://rocker/tidyverse:latest R
More details can be found in the Singularity documentation.
An important aspect of the Rocker project design is the ability for
users to interact with the software on the container through either an
interactive shell session (such as the R shell or a bash shell), or
through a web browser accessing the RStudio® Server integrated
development environment (IDE). Traditional remote and high-performance
computing workflows for R users have usually required the use of ssh
and a terminal-only interface, posing a challenge for interactive
graphics and a barrier to users unfamiliar with these tools and
environments. Accessing an RStudio container through the browser removes
these barriers. Rocker images include the RStudio-server software
pre-installed and configured with the explicit permission of RStudio
Inc.
Users can access a bash
shell running as root within a Rocker
container using
docker exec -ti <container-id> bash
which can be useful for administrative tasks such as installing system dependencies. All Rocker images can also be run with an interactive R or bash shell, or with Rscript, without running RStudio, which can be useful for batch jobs or for anyone who prefers that environment.
As with any interactive Docker container, users should specify the
terminal (-t) and interactive (-i) flags (here combined as -ti), and
specify the desired executable (e.g., R, though other common options
could be Rscript or bash):
docker run --rm -ti rocker/tidyverse R
This example shows the use of the --rm
flag to indicate that the
container should be removed when the interactive session is finished.
Details on sharing volumes, managing user permissions, and more can be
found on the Rocker website, https://rocker-project.org.
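For non-interactive work, the same pattern can run a script in batch mode. The sketch below assumes the script lives in the current directory and mounts that directory at /work inside the container; the function name run_rscript and the /work mount point are illustrative choices, not Rocker conventions.

```shell
# Hypothetical helper: run an R script from the current directory inside a
# version-tagged Rocker container, removing the container afterwards.
run_rscript() {
  # --rm removes the container on exit; -v shares the current directory;
  # -w makes the shared directory the working directory in the container.
  docker run --rm \
    -v "$(pwd):/work" \
    -w /work \
    rocker/r-ver:3.4.2 \
    Rscript "$@"
}
```

A call such as run_rscript analysis.R (a placeholder file name) leaves any output files the script writes in the host directory, since that directory is shared rather than copied.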
Another feature of Rocker containers is the ability to provide a
sandboxed environment, isolated from software and potentially from other
data on the machine. Many users are reluctant to upgrade their suite of
installed packages, which may break their existing code or even their R
environment if the installation goes poorly. However, upgrading packages
and/or the R environment is often necessary to run analyses from a
colleague, or access more recent methods. Rocker offers an easy
solution. For instance, a user can run R code requiring the most recent
versions of R and related packages inside a Rocker container without
having to upgrade their local installations first. Conversely, one could
use Rocker to run code on an older R release with prior versions of R
packages, again without having to make any alteration to one’s local R
install. Another common use case is to access a container with support
for particular options such as using gcc or clang compiler sanitizers
(Eddelbuettel 2014). These require R itself be built with specialized
settings that may not be available or familiar to many R users on
their native system, but can be easily deployed by pulling the Rocker
images rocker/r-devel-san or rocker/r-devel-ubsan-clang.
This sandboxing feature is also valuable in the remote computing context, allowing system administrators to grant users freedom to install software which requires root privileges inside a container, while not granting them root access on the host machine. Root access is required to launch Docker containers, though not to access containers already running and providing some service such as RStudio. Users logging into a container through the RStudio interface do not by default have root privileges, though they are able to install R packages. Granting these users root privileges in the container still leaves them sandboxed from the host machine. Sandboxing also serves an important function in reproducible research by making it easier to test a specified environment in isolation from the host machine. Unlike traditional virtual machines, containers do not impose a large footprint of reserved resources, so a typical host can easily support hundreds of containers (Docker 2015).
Users can easily determine the software stack installed on any Rocker image by examining the associated Dockerfile recipe, which provides a concise, human-readable record of the installation. All Rocker images use automated builds through Docker Hub, which also acts as the central, default repository distributing the images. Using automated builds rather than uploading pre-built image binaries to Docker Hub avoids the potential for the build not to match the recipe. The corresponding Dockerfile is visible both on the Docker Hub and in the linked GitHub repository, which provides a transparent versioned history of all changes made to these recipes, as well as documentation, a community wiki, and issue trackers for discussing proposed changes, bugs, and improvements to the Dockerfiles, and for troubleshooting any issues users may encounter. Having these public source files built automatically by a trusted provider (Docker Hub), rather than built locally and uploaded as binaries, is also useful from a security perspective in avoiding malware.
Having a shared, transparent computational environment created by a
publicly hosted, reproducible recipe facilitates community input into
configuration details. R and many of its packages and related software
can be configured with a wide range of options, compilers, different
linear-algebra libraries and so forth. While this flexibility reflects
varying needs, many users rely on default settings, which are most often
optimized for simplicity of installation rather than
performance. The Rocker recipes reflect significant community input on
these choices. This helps create a more finely tuned, optimized
reference implementation of the R environment as well as a platform for
comparing and discussing these concerns which are often overlooked
elsewhere. Issues and Pull Requests on the Rocker repositories on GitHub
attest to some of these discussions and improvements. In particular,
input from Docker Inc. employees through the official approval
process for the r-base image, expertise from the Debian R maintainer
and other Debian developers, and both direct and indirect feedback from
the experience and user-generated documentation of many early adopters
in the R community have helped shape and strengthen the project over the
past few years. Widespread use of the Rocker images helps promote both
testing of these choices and contributions from many members of the R
community that further tune the configuration.
Access to specific versions of software can be important for users who
need computational reproducibility more than having the latest release
of any piece of software, since subsequent releases can alter the
behavior of code, introduce errors or otherwise alter previous results.
The versioned stack (r-ver
, rstudio
, tidyverse
, verse
, and
geospatial
) provides images which are intended to build an identical
software stack every time, regardless of the release of new libraries
and packages. Users should specify an R version tag in the Docker image
name to request a version-stable image, e.g., rocker/verse:3.4.0. If
no tag is explicitly requested, Docker will provide the image with the
tag :latest, which will always have the latest available versions of
the software (built nightly).
Users building on the version-tagged images will by default use the MRAN
snapshot mirror (Revolution Analytics 2017) associated with the most recent date for which
that image was current. This ensures that a Dockerfile building
FROM rocker/verse:3.4.0 will only install R package versions that were
available on CRAN on 2017-06-30, i.e., the day R 3.4.1 was released.
This default can of course be overridden in the standard R manner,
e.g., by specifying a different CRAN mirror explicitly in any command
to install packages, such as install.packages(), or by adjusting the
default CRAN mirror via options(repos=<CRAN-MIRROR>) in an .Rprofile.
Note that the MRAN date associated with the current release (e.g.,
3.4.2 at the time of writing) will continue to advance on the
Docker Hub image until the next R release. Software installed from
apt-get in these images will come from the stable Debian release
(stretch or jessie) and thus not change versions (though it will
receive security patches). Packages installed from Bioconductor using
the biocLite() utility will also install the version appropriate to
the version of R found on the system (the Bioconductor semi-annual
release model avoids the need for an MRAN mirror). Users installing
packages from GitHub or other sources can request a specific git release
tag or hash for a more reproducible build, or adopt an alternative
approach such as packrat (Ushey et al. 2016). A more general discussion
of the use and limitations of Docker for computational reproducibility
can be found in Boettiger (2015).
Any portable computational environment faces an inevitable tension between the “kitchen sink problem” at one extreme, and the “discovery problem” at the other. A kitchen sink image seeks to accommodate too many use cases in a single image. Such images are inevitably very large and thus slow or difficult to deploy, maintain and optimize. At the other extreme, providing too many specialized images makes it more difficult for a user to discover the one they need. The Rocker project seeks to avoid both of these problems by providing a carefully curated suite of images that can be easily extended by individuals and communities.
To make extensions transparent and persistent, Rocker images can be
extended by any user by writing their own Dockerfiles based on an
appropriate Rocker image. The Dockerfiles in the Rocker stack should
themselves provide a simple example of this, (as described in the
following section). A user begins by selecting an appropriate base image
for their needs: if the RStudio interface is desired, a user might start
with FROM rocker/rstudio
; an image for testing an R package with
compiled code might use FROM rocker/r-devel-san
, and an image for
reproducing a data analysis will probably select a stable version tag in
addition to an appropriate base library, e.g.,
FROM rocker/tidyverse:3.4.1. Users can easily add additional software
to any running Rocker container using the standard R and Debian mechanisms.
Details on how to extend Rocker images can be found at
https://rocker-project.org.
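As a rough illustration of such an extension, the shell function below writes a minimal Dockerfile that starts from a version-tagged Rocker image and adds one system library and one R package. The function name write_dockerfile, the libxml2-dev library, and the data.table package are placeholders chosen for this sketch; it also assumes the littler script install2.r is available in the base image.

```shell
# Hypothetical helper: write an example Dockerfile into the given directory.
write_dockerfile() {
  dir="$1"
  mkdir -p "$dir"
  # The quoted heredoc delimiter keeps the Dockerfile text literal.
  cat > "$dir/Dockerfile" <<'EOF'
# Start from a version-tagged Rocker image so rebuilds stay stable
FROM rocker/tidyverse:3.4.1

# System library from the Debian repositories (placeholder example)
RUN apt-get update \
  && apt-get install -y --no-install-recommends libxml2-dev \
  && rm -rf /var/lib/apt/lists/*

# R package installed via littler (placeholder example)
RUN install2.r --error data.table
EOF
}
```

After write_dockerfile myimage, the image could be built with docker build -t my-analysis myimage, where both names are again arbitrary. Keeping the recipe in a Dockerfile, rather than installing software by hand in a running container, is what makes the extension persistent and shareable.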
Sharing these Dockerfiles can also facilitate the emergence of
extensions tuned to particular communities. For instance, the
rocker/geospatial
image emerged from the input of a number of Rocker
users all adding common geospatial libraries and packages on top of the
existing Rocker images. This coalescence helped create a more fine-tuned
image with broad support for a wide range of commonly-used data formats
and libraries. Other community images are developed and maintained
independently of the Rocker project, such as the popgen
image of
population-genetics-oriented software developed by the National
Evolutionary Synthesis Center (NESCent). Rocker images are also being
used as base Docker images in the NSF sponsored Whole Tale project for
reproducible computing (Ludaescher et al. 2017), and are heavily used by the
rhub project in automated
package testing (Csárdi 2017).
The Rocker project consists of a suite of images built automatically by
and hosted on the Docker Hub, https://hub.docker.com/r/rocker. Source
Dockerfiles, supporting scripts and documentation are hosted on GitHub
under the organization rocker-org
, https://github.com/rocker-org.
The issue tracker and pull requests are used for community input,
discussions, and contributions to these images. The Rocker project wiki,
https://github.com/rocker-org/rocker/wiki, provides a place to
synthesize community-contributed documentation, use-cases, and other
knowledge about using the Rocker images.
The Rocker project aims to provide a small core of Docker images that serve as convenient ‘base’ images on which other users can build custom R environments by writing their own Dockerfiles, while also providing a ‘batteries included’ approach to images that can be used out of the box. Balancing the diverse needs of very different use cases against the overarching goals of creating images that are still sufficiently light-weight, easy to use, and easy to maintain is a difficult art. The implementation in both individual Rocker images and image stacks can never perfect that balance for everyone, but today reflects the considerable community input and testing over the past few years.
All Rocker images are based on the Debian Linux distribution. It
provides a small base image, the well-known apt
package management
system, and a rich ecosystem of software libraries, making it the base
image of choice for Docker images, including many of the “official”
images maintained by Docker’s own development team. The Debian platform
is also perhaps the best-supported Linux platform within the R
community, including an active r-sig-debian mailing list. The relatively
long period between stable Debian releases (roughly two years recently)
means that software in the Debian stable (e.g., debian:jessie
,
debian:stretch
) releases can lag significantly behind current releases
of popular software, including R. More recent versions of packages can
be found in the pre-release distribution, debian:testing
, while the
very latest binary builds can be found on debian:unstable
. The Rocker
project can be largely divided into two stacks which address different
needs, reflected in which Debian distribution they are based on. The
first stack is based on debian:testing
. The second, more
recently-introduced stack, is based only on Debian stable releases.
Rocker images always point to specific stable releases (jessie
,
stretch
), and do not use the tag debian:stable
, which is a rolling
tag that always points to the most recent stable version. The different
Rocker stacks have different aims and thus provide different images, as
shown in Tables 1 & 2 below.
debian:testing-based images

The debian:testing stack aims to make the most efficient use of
upstream builds: the pre-compiled .deb
binaries provided by the Debian
repositories. It is both quicker and easier to install software from
binaries, since the package manager (apt
) manages the necessary
(binary) dependencies and bypasses the time-consuming process of
compiling from source. Basing this stack on debian:testing
means that
much more recent versions of commonly-used libraries and compilers are
available as binaries than would be found in a Debian stable release. In
order to provide optional access to the most recent available
binaries, this stack uses apt-pinning (Debian Project 2017) to allow the apt
package manager to selectively install binaries from
debian:unstable
, which represents the most recent set of packages
built for Debian. Similarly, recent versions of many popular R packages
can also be installed pre-built through the package manager, e.g.,
apt-get install r-cran-xml
. This can be particularly helpful for
packages with external system dependencies (such as libxml2-dev
in
this example) which cannot be installed from the R console as they are
system dependencies rather than R packages installed from within R. We
should note, however, that only about 500 of the over 11,000 CRAN
packages are available as Debian packages.
As the names testing and unstable imply, particular versions of
packages can change as packages move from unstable into testing. New
versions are sent to unstable
during the normal course of Debian
development. This can occasionally break a previously-working
installation command in a Dockerfile until the maintainer redirects the
package manager to install a package from the unstable
sources that
could previously be installed from testing
, or vice versa (using the
-t option in apt). That said, packages only migrate from unstable
to testing after a period of several days, and only if the migration and
installation of the particular version is free of interactions with
other packages in their dependency graph. That way, unstable serves as
a validation lab which leaves testing reasonably stable yet current.
Relative to stable
, the testing
stack thus offers some advantages as
almost all software can be installed through the package manager.
Installation of binary packages from testing
generally provides the
most recent available software, and installs it quickly as a binary. On
the other hand, these Dockerfiles may require occasional maintenance
when packages migrate and/or versions change. The resulting images are
also inherently dynamic: rebuilding the same Dockerfile months or years
apart will generate images with significantly different versions of
software installed as the pool of underlying packages changes through
time.
The debian:testing
-based stack currently includes seven images
actively maintained by the Rocker development team (Table 1). r-base
builds on debian:testing
, and the other six in the stack each build
directly from r-base
. The r-base
image is unique in that it is
designated as the official image for the R language by the Docker
organization itself. This official image is reviewed and then built by
employees of Docker Inc. based on a Dockerfile maintained by the Rocker
team. Consequently, users should refer to this image in Docker commands
without an organization namespace, e.g., docker run -ti r-base
to
access the official image. All other images in the Rocker project are
not individually reviewed and built by Docker Inc. and must be
referenced using the rocker
namespace, e.g.,
docker run -ti rocker/r-devel
.
Several of the images in this stack are oriented towards the R
development community: r-devel
, drd
, r-devel-san
, and
r-devel-ubsan-clang
which all add a copy of the development version of
R side-by-side to the current release of R provided by r-base
. On
these images, the development version is aliased to RD
to distinguish
from the current release, R. As the names suggest, each provides
a slightly different configuration. Of particular interest are the images
providing development R built with support for C/C++ address and
undefined-behavior sanitizers, which are somewhat difficult to configure
(Eddelbuettel 2014).
As these images focus on developers and/or as base images for custom
uses, this stack does not include many specific R packages. Additional
dependencies and packages can easily be installed from apt
. R packages
not available in the apt
repositories can be installed directly from
CRAN using either R
or the littler
scripts, as described in
https://rocker-project.org/use.
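The two installation routes can be combined in a small helper that prefers the pre-built Debian binary and falls back to CRAN. This is a sketch under stated assumptions: the function names deb_name and install_r_pkg are invented here, the code is meant to run as root inside a debian:testing-based container after apt-get update, and it only covers CRAN packages that follow Debian's r-cran-<lowercased name> convention.

```shell
# Map an R package name to the Debian binary package name that would hold it,
# following the r-cran-<lowercased name> convention (e.g. XML -> r-cran-xml).
deb_name() {
  printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Hypothetical helper: try the Debian binary first, then fall back to a
# source install from CRAN using the littler script install2.r.
install_r_pkg() {
  if apt-get install -y --no-install-recommends "$(deb_name "$1")"; then
    echo "installed $1 from apt"
  else
    install2.r --error "$1" && echo "installed $1 from CRAN"
  fi
}
```

The binary route also pulls in system dependencies automatically, which is why it is tried first; the fallback covers the majority of CRAN packages that have no Debian package.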
This stack also includes the images shiny
and rstudio:testing
that
provide Shiny server and RStudio server IDE from RStudio Inc, built on
the r-base
image. RStudio and Shiny are registered trademarks of
RStudio Inc, and their use and the distribution of their software in
binary form on Docker Hub has been granted to the Rocker project by
explicit permission from RStudio. Users should review RStudio’s
trademark use policy (http://www.rstudio.com/about/trademark/) and
address inquiries about further distribution or other questions to
permissions@rstudio.com
. The Rocker
project also provides images with RStudio server and Shiny server in the
stable versioned stack.
Build schedule: The official r-base
image is rebuilt by Docker
following any updates to the official debian
images (roughly every few
weeks). The rest of the stack uses build triggers that rebuild the
images whenever r-base
is updated or the Dockerfile sources are
updated on the corresponding GitHub repository. The only exception in
this stack is the drd
image, which is rebuilt each week by a cron
trigger.
Table 1: Images in the debian:testing-based stack.

| image | description | size | downloads |
|---|---|---|---|
| r-base | official image with current version of R | 254 MB | 632,000 |
| r-devel | R-devel added side-by-side to r-base (using alias RD) | 1 GB | 4,000 |
| drd | lightweight r-devel, built weekly | 571 MB | 4,000 |
| r-devel-san | as r-devel, but built with compiler sanitizers | 1.1 GB | 1,000 |
| r-devel-ubsan-clang | sanitizers, clang C compiler (instead of gcc) | 1.1 GB | 525 |
| rstudio:testing | rstudio on debian:testing | 1.1 GB | 1,000 |
| shiny | shiny-server on r-base | 409 MB | 123,000 |

Table 2: Images in the debian:stable-based (versioned) stack.

| image | description | size | downloads |
|---|---|---|---|
| r-ver | version-stable base R & src build tools | 219 MB | 6,000 |
| rstudio | adds rstudio | 334 MB | 314,000 |
| tidyverse | adds tidyverse & devtools | 656 MB | 83,000¹ |
| verse | adds java, tex & publishing-related packages | 947 MB | 9,000 |
| geospatial | adds geospatial libraries | 1.3 GB | 4,000 |
debian:stable-based stack

This stack emphasizes stability and reproducibility of the Docker build.
This stack was introduced much more recently (November 2016) in response
to considerable user input and requests. The key feature of this stack
is the ability to run older versions of R along with the
then-contemporaneous versions of R packages. A user specifies the
version desired using an image tag, e.g., rocker/r-ver:3.3.1
will
refer to an image with R version 3.3.1 installed. Omitting the tag is
equivalent to using the tag latest
, which, as the name implies, will
always point to an image using the current R release. Thus, users who
want to create downstream Dockerfiles, which are based on the current
release at the time (but will continue to reconstruct the same
environment in the future after newer R versions are released), should
explicitly include the corresponding version tag, e.g.,
rocker/r-ver:3.4.2
at the time of writing, and not the latest
tag.
Users can also run the current development version of R using the tag
devel
, which is built nightly from R-devel sources from subversion
.
MRAN archives: To facilitate installation of only contemporaneous
versions of R packages on these images, the default CRAN mirror from
which to install R packages is fixed to a snapshot of CRAN corresponding
to the last date for which that version of R was current (e.g.,
3.4.2
was released on 2017-09-28, thus 3.4.1
is pinned to the MRAN
snapshot for that date). These snapshots are provided by the MRAN
archive created by Revolution Analytics (now part of Microsoft). It
archives daily snapshots of all of CRAN from which a user can install
packages with the usual install.packages()
function (Revolution Analytics 2017). Users can
always override this default by passing any current CRAN repository
explicitly. Unlike CRAN, Bioconductor only updates its repositories
through semi-annual releases aligned to R’s spring release schedule. Thus,
Bioconductor packages can be installed in the usual way using
biocLite(), which automatically selects the Bioconductor release
corresponding to the version of R in use.
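To make the snapshot idea concrete, the helper below builds the repository URL for a given snapshot date. The URL pattern is an assumption of this sketch rather than something stated in the text, the function name mran_snapshot_url is invented, and the archive may no longer be reachable at that address.

```shell
# Hypothetical helper: print the MRAN snapshot repository URL for a date.
# The URL layout is assumed, not taken from the paper.
mran_snapshot_url() {
  # Accept only dates written as YYYY-MM-DD.
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "expected a date as YYYY-MM-DD" >&2; return 1 ;;
  esac
  printf 'https://mran.microsoft.com/snapshot/%s\n' "$1"
}
```

The resulting string is what would be passed as the repos argument of install.packages() to install package versions as they stood on that date, for example the 2017-09-28 snapshot mentioned above.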
Version tags: The version tags are propagated throughout this stack:
e.g., rocker/tidyverse:devel
will provide the currently-released
versions of the R packages in the
tidyverse
(Wickham 2017) installed on the nightly build of R-devel.
Developers building packages on this stack are encouraged to tag their
images accordingly as well. Table 3 indicates which versions of R are
currently available in the stack, going back to 3.1.0
. While older
versions may be added to the stack at a later date, we note that the
MRAN snapshots began on 2014-09-17 and thus go back only to the R 3.1
era. Each tag must be built from a separate Dockerfile, enabling minor
differences in the build instructions to accommodate changing
dependencies. Dockerfiles for past versions (e.g., prior to 3.4.2
currently) are intended to remain static over the long term, while the
tag for the current version, latest
, and devel
may be tweaked to
accommodate new features or dependencies. Version tags also follow
semantic conventions, so that omitting the second or third position of
the tag is identical to asking for the most recent matching version:
i.e., rocker/verse:3.3 is the same as rocker/verse:3.3.3, and
rocker/verse:3 is (at the time of writing) rocker/verse:3.4.2. This
is accomplished using post-build hooks in Docker Hub; see examples at
https://github.com/rocker-org/rocker-versioned/ for details.
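The tag semantics can be illustrated with a small resolver that maps a partial tag to the highest matching full version. This is not how the Docker Hub hooks are implemented; it is only a sketch of the rule itself, with an invented function name (resolve_tag) and a caller-supplied list of available versions.

```shell
# Hypothetical helper: resolve a partial version tag (e.g. "3.3" or "3")
# to the highest full version among the remaining arguments.
resolve_tag() {
  prefix="$1"
  shift
  # Keep versions equal to the prefix or extending it by further positions,
  # sort numerically by major, minor, and patch, and take the last one.
  printf '%s\n' "$@" \
    | awk -v p="$prefix" 'index($0 ".", p ".") == 1' \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}
```

With the versions 3.3.1, 3.3.2, 3.3.3, 3.4.0 and 3.4.2 as input, the prefix 3.3 resolves to 3.3.3 and the prefix 3 resolves to 3.4.2, matching the examples in the text.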
Installation: In this stack, the desired version of R is always
built directly from source rather than the apt
repositories. Compilers
and dependencies are still installed from the stable apt
repositories,
and thus lag behind the more recent versions found in the testing
stack. Version tags 3.3.3
and older are based on the Debian 8.0
release, code-named jessie
, while 3.4.0
- 3.4.2
, devel
, and
latest
are based on Debian 9.0, stretch
, (released 2017-06-17, while
R was at 3.4.0
), and thus have access to much newer versions of common
system dependencies and compilers. Dependencies needed to compile R that
are not required at runtime are removed once R is installed, keeping the
base images light-weight for faster download times. While most system
dependencies required by common R packages can still be installed from
the apt repositories, occasionally a more recent version must be
compiled from source (e.g., the Gibbs sampling program JAGS (Plummer 2017)
and the geospatial toolkit GDAL must both be compiled from source on
debian:jessie images). In this stack, users should avoid installing R
packages using apt without careful consideration, as this will install
a second (probably different) version of R from the Debian repositories,
along with a dated version of the R package, since any r-cran-pkgname
package in the Debian repositories depends on r-base in apt as well.
Build schedule: All images are built automatically from their
corresponding Dockerfiles (found in the GitHub repositories
rocker-org/rocker-versioned and rocker-org/geospatial). A cron job
sends nightly build triggers to Docker Hub to rebuild the latest and
devel tagged images throughout the stack. To decrease load on the hub,
build triggers for the numeric version tags are sent monthly. Although
the Dockerfiles for older R versions install an almost-identical
software environment every time, the monthly rebuilding of these images
on Docker Hub ensures they continue to receive Debian security updates
from upstream, and proves the build recipe still executes successfully.
Note that rebuilding images with software from external repositories
never produces a bit-wise identical image, and thus the image identifier
hash will change at each build.
In this stack, each image builds on the previous image, rather than all
other images building directly on the base image, as in the testing
stack. Table 2 lists the names and descriptions of the five images in
this stack, along with image size and approximate download counts from
Docker Hub. Sizes reflect (compressed) cumulative size: a user who has
already downloaded the most recent version of r-ver and then pulls a
copy of the rstudio image will only need to download the additional
115 MB in the rstudio layers and not the full 334 MB listed. This linear
design limits flexibility (no option for tidyverse without rstudio)
but simplifies use and maintenance. While no single environment will be
optimal for everyone, both the packages selected in this stack and the
stack ordering reflect considerable community input and tuning.
The rstudio image includes a lightweight, easy-to-use and
docker-friendly init system, s6 (Bercot 2017), for running persistent
services, including the RStudio server. This system provides a
convenient way for downstream Dockerfile developers to add additional
persistent services (such as an ssh server) to a single container, or
additional start-up or shutdown scripts that should be run when a
container starts up or shuts down. The rstudio image uses such a
start-up script to configure user settings such as login password and
permissions through environment variables at run time.
The tidyverse image contains all required and suggested dependencies
of the commonly-used tidyverse and devtools R packages, including
external database libraries (e.g., MariaDB and PostgreSQL). Users
should consult the image Dockerfiles or the installed.packages() list
directly for a complete list of installed packages. The verse image
adds commonly-used dependencies, notably a large but not comprehensive
LaTeX environment and Java development libraries. Previously, the Rocker
project provided the image hadleyverse, which has since been divided
into tidyverse and verse based on community input.
tag | apt repos | MRAN date | Build frequency | images with tag |
devel | stretch | current date | nightly | r-ver, rstudio, tidyverse, verse, geospatial |
latest | stretch | current date | nightly | r-ver, rstudio, tidyverse, verse, geospatial |
3.4.2 | stretch | current date | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.4.1 | stretch | 2017-09-28 | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.4.0 | stretch | 2017-06-30 | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.3.3 | jessie | 2017-04-21 | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.3.2 | jessie | 2017-03-06 | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.3.1 | jessie | 2016-10-31 | monthly | r-ver, rstudio, tidyverse, verse, geospatial |
3.3.0 | jessie | 2016-06-21 | monthly | r-ver |
3.2.0 | jessie | 2015-06-18 | monthly | r-ver |
3.1.0 | jessie | 2014-09-17 | monthly | r-ver |
Several images in the rocker-versioned stack can be customized at
build time when built locally (rather than pulling prebuilt images from
Docker Hub) by using the --build-arg option of docker build. In the
r-ver image, users can set R_VERSION and BUILD_DATE (the date of the
default MRAN snapshot). In the rstudio image, users can set
RSTUDIO_VERSION (which otherwise defaults to the most recent version)
and PANDOC_TEMPLATES_VERSION.
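A local build that overrides these arguments might look like the sketch below. The argument names R_VERSION and BUILD_DATE come from the text; the wrapper function build_r_ver, the output tag my-r-ver, and the assumption that the build context directory contains the r-ver Dockerfile are hypothetical. The DOCKER variable is an illustrative convenience so that the command can be previewed with DOCKER=echo before it is actually run.

```shell
# Illustrative wrapper around `docker build`; every name other than the
# two build arguments is hypothetical.
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to preview without building

build_r_ver() {
  r_version="$1"      # e.g. 3.4.1
  build_date="$2"     # MRAN snapshot date, e.g. 2017-09-28
  context="${3:-.}"   # directory assumed to contain the r-ver Dockerfile
  "$DOCKER" build \
    --build-arg "R_VERSION=$r_version" \
    --build-arg "BUILD_DATE=$build_date" \
    -t "my-r-ver:$r_version" \
    "$context"
}
```

For example, calling build_r_ver 3.4.1 2017-09-28 from the directory holding the Dockerfile would request R 3.4.1 together with the MRAN date that Table 3 pairs with that version.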
This stack also makes use of Docker metadata labels defined by
http://label-schema.org, indicating image license (GPL-2.0), vcs-url
(GitHub repository), and vendor (Rocker Project). These metadata can be
altered or extended in downstream images.
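These labels can be read back from a locally available image with docker inspect. The helper below is a minimal sketch: the function name image_label is hypothetical, and the full key names (for instance org.label-schema.license) are assumed to follow the Label Schema namespace rather than taken from the text, so they should be checked against the image itself.

```shell
# Illustrative helper: print one metadata label from a local image.
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to preview the command

image_label() {
  image="$1"   # e.g. rocker/r-ver
  key="$2"     # e.g. org.label-schema.license (assumed key name)
  "$DOCKER" inspect --format "{{ index .Config.Labels \"$key\" }}" "$image"
}
```

For example, image_label rocker/r-ver org.label-schema.vendor would print the vendor label if the image carries it under that key.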
Over the past several years, Docker has seen immense adoption across industry and academia. The Open Container Initiative (The Linux Foundation: Projects 2017) now provides an open standard that has further extended this container approach to research environments through projects such as Singularity (Lawrence Berkeley National Laboratories 2017), allowing users to deploy containerized environments such as Rocker on machines where they do not have root access, such as clusters or private servers. Containerization promises to solve numerous challenges such as portability and replicability in research computing, which often relies on complex and heterogeneous software stacks (Boettiger 2015). Yet implementing such environments in containers is not a trivial task, and not all implementations provide the same usability, portability or reproducibility. Here we have detailed the approach taken by the Rocker project in creating and maintaining these environments through an open and community-driven process. The structure of the Rocker project has evolved over three years of operation while drawing in an ever-widening base of academic researchers, university instructors and industry users. We believe this overview will be instructive not only to users and developers interested in the Rocker project, but also as a model for similar efforts around other environments or domains.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Boettiger & Eddelbuettel, "An Introduction to Rocker: Docker Containers for R", The R Journal, 2017
BibTeX citation
@article{RJ-2017-065, author = {Boettiger, Carl and Eddelbuettel, Dirk}, title = {An Introduction to Rocker: Docker Containers for R}, journal = {The R Journal}, year = {2017}, note = {https://rjournal.github.io/}, volume = {9}, issue = {2}, issn = {2073-4859}, pages = {527-536} }