---
title: "From renv to containers: why recording your R packages may not be enough"
output:
  rmarkdown::html_vignette:
    css: styles.css
vignette: >
  %\VignetteIndexEntry{From renv to containers: why recording your R packages may not be enough}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
date: "Created 2026-04-30 | Last updated `r Sys.Date()`"
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
#library(containr)
```

## You already care about reproducibility

If you are reading this vignette, you probably already know why reproducibility matters in research. You track your code with git. You use `renv` to record which R packages your project depends on. You may even share your code with collaborators or post it alongside a manuscript. You have, in other words, already done more than most.

This vignette is for researchers who want to go one step further. Maybe you are collaborating with someone whose machine behaves differently from yours. Maybe you want to move an analysis from your laptop to a computing cluster and you are not sure how to make sure it still runs. Maybe you have had the experience of returning to an old project months later and finding that something no longer works: a package updated, a dependency changed, and the analysis quietly broke.

The question this vignette tries to answer is: if `renv` already records your R packages, what else could go wrong? And why might a container be the answer?

## What renv does and does not do

`renv` is excellent at what it does. It records the R packages your project uses (their names, versions, and sources) in a `renv.lock` file. When a collaborator runs `renv::restore()`, they get the same package versions you used. That is a genuine reproducibility guarantee, and it covers the most common source of "it worked on my machine" problems.

But `renv` records the R package layer.
Below that layer sits everything else the analysis depends on: the R version itself, the operating system, the system libraries that R packages compile against, and any external tools the analysis calls. `renv` does not manage any of those. The lockfile notes the R version it was created with, but `renv::restore()` will not install that version of R, and nothing in the lockfile describes the operating system or its libraries. That is not what `renv` was designed to do.

In practice this means that `renv::restore()` is not always sufficient. Consider a few scenarios that R users encounter regularly.

A collaborator tries to restore your environment on a different operating system. A package that compiled cleanly on macOS requires a system library that is not installed by default on Linux. The restore fails, and the error message points to a C compiler or a missing header file rather than anything obviously R-related.

You install a newer version of R and try to rerun an analysis from six months ago. Most packages restore fine, but one package was compiled against a system library that has since been updated. The behavior changes subtly, or the package simply fails to load.

You want to run your analysis on a computing cluster. The cluster runs Linux. Your laptop runs macOS. Your `renv.lock` is intact, but the runtime environment is fundamentally different. The analysis that runs cleanly on your machine may not run at all on the cluster without additional setup.

None of these failures are caused by careless coding. They are caused by the gap between what `renv` captures and what a running R analysis actually depends on.

## What a container adds

A container is a lightweight, self-contained unit that packages an application together with the environment it needs to run. For an R analysis, that means not just the R packages but also the R version, the operating system libraries, the system tools, and the configuration that ties everything together.

When you run a containerized analysis, you are not running it on your operating system directly.
You are running it inside a controlled environment that is defined by a recipe, a `Dockerfile`, and that recipe can be shared, versioned, archived, and run on any machine that can execute containers.

The key difference from `renv` is the level of the stack being captured. `renv` records the R package layer. A container captures the entire runtime environment, from the base operating system up through the R installation and package library.

It is worth being precise about what this means in practice. A container does not guarantee that your analysis produces the same numerical results on every machine: floating-point arithmetic and hardware differences can still introduce variation at the margins. What it does guarantee is that everyone who runs the same image gets an identical software environment: the same R version, the same package versions, the same system libraries. That is a much stronger reproducibility guarantee than `renv` alone can provide.

## The Dockerfile as a reproducibility artifact

The recipe for a container is a plain text file called a `Dockerfile`. It specifies a base image (typically a Linux distribution with R pre-installed) and then a sequence of instructions: install system libraries, install R packages, copy files, set the working directory.

A `Dockerfile` is a reproducibility artifact in the same way that `renv.lock` is. It can be committed to version control, shared with collaborators, archived with a publication, and used to reconstruct the analysis environment at any point in the future. Unlike `renv.lock`, which only records R packages, the `Dockerfile` records the full stack.

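
To make "the full stack" a little more concrete, the sketch below uses only base R to collect the kind of information that sits below the package layer on the machine where it runs. The object name `env_snapshot` is made up for this illustration and is not part of `renv` or `containr`; the output will differ from machine to machine, which is exactly the point.

```r
# Collect a few facts about the runtime environment that sit below the
# R package layer. None of these are installed or enforced by renv::restore().
env_snapshot <- list(
  r_version   = R.version.string,        # the R build that is running
  os_name     = Sys.info()[["sysname"]], # e.g. "Linux" or "Darwin"
  os_version  = utils::osVersion,        # distribution or OS release string
  system_libs = extSoftVersion()         # versions of zlib, PCRE, ICU, ...
)

# Print a compact overview of what was found on this machine
str(env_snapshot)
```

Running this on a laptop and again on a cluster node will typically show different values for several entries. A container image fixes those values for everyone who runs it.
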
Here is what a minimal `Dockerfile` for an R project might look like:

```dockerfile
# Base image with R 4.4.0 pre-installed
FROM rocker/r-ver:4.4.0

# Avoid interactive prompts while installing system packages
ENV DEBIAN_FRONTEND=noninteractive

# System libraries that the project's R packages compile against
RUN apt-get update && apt-get install -y \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /home
COPY renv.lock /home/renv.lock

# Install renv, then restore the package versions recorded in renv.lock
RUN R -e "install.packages('renv', repos='https://packagemanager.posit.co/cran/latest')"
RUN R -e "renv::restore()"
```

Writing this by hand is tedious and error-prone. You have to know which system libraries your R packages need, which base image to use, and how to structure the installation steps. `containr` automates this: it reads your `renv.lock`, infers the system library requirements, and writes the `Dockerfile` for you.

## Where containr fits

`containr` is not a containerization framework. It is a practical tool that connects the R workflow researchers already use to the container workflow they need when a project is ready to move beyond a single machine.

The connection point is `renv.lock`. If you already use `renv` (and if you are reading this, you probably do), then `containr` needs almost nothing else from you. It reads the lockfile, works out what the project needs at the system level, and produces a `Dockerfile` you can build and push without leaving R.

In other words, `renv` and containers are not alternatives. They are complementary layers of the same reproducibility stack. `renv` handles the R package layer. A container handles everything below it. Using both together gives you a reproducibility guarantee that neither can provide alone.

## What comes next

The companion vignette, *A first containerization workflow with containr*, walks through the complete workflow: generating a `Dockerfile` from your `renv.lock`, building the container image, listing local images, and pushing the image to a registry. If you are ready to containerize a project, that is the right place to start.

If you are not yet sure whether containerization is the right step for your project, the framing in this vignette may be enough for now. You can return to `containr` when the moment arrives — when you are preparing to share an analysis, archive a workflow, or move an analysis to a computing cluster. The `renv.lock` you already have is the starting point.
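
As a quick look at that starting point, the sketch below reads a lockfile with the `jsonlite` package and lists what it records. The lockfile contents here are a small made-up example written to a temporary file so that the code is self-contained; in a real project you would point `lock_path` at your own `renv.lock`. This is only an illustration of the lockfile's general shape, not a description of how `containr` reads it.

```r
library(jsonlite)

# A tiny, made-up lockfile written to a temporary file (assumption: your
# real renv.lock follows the same overall layout, with more packages).
lock_path <- tempfile(fileext = ".lock")
writeLines('{
  "R": {
    "Version": "4.4.0",
    "Repositories": [{"Name": "CRAN", "URL": "https://cloud.r-project.org"}]
  },
  "Packages": {
    "jsonlite": {"Package": "jsonlite", "Version": "1.8.8", "Source": "Repository", "Repository": "CRAN"},
    "xml2": {"Package": "xml2", "Version": "1.3.6", "Source": "Repository", "Repository": "CRAN"}
  }
}', lock_path)

# Parse the lockfile as nested lists
lock <- fromJSON(lock_path, simplifyVector = FALSE)

# The R version the lockfile was created with (noted, but not installed, by renv)
r_version <- lock$R$Version

# Named character vector: package name -> recorded version
pkg_versions <- vapply(lock$Packages, function(p) p$Version, character(1))

r_version
pkg_versions
```

Nothing in the file mentions the system libraries those packages compile against, such as the `libxml2` headers that `xml2` needs. That missing layer is what a `Dockerfile` adds.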