---
title: "From renv to containers: why recording your R packages may not be enough"
output:
  rmarkdown::html_vignette:
    css: styles.css
vignette: >
  %\VignetteIndexEntry{From renv to containers: why recording your R packages may not be enough}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
date: "Created 2026-04-30 | Last updated `r Sys.Date()`"
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
#library(containr)
```

<img src="figures/logo.png" align="right" height="139" alt="containr hex sticker"/>

## You already care about reproducibility

If you are reading this vignette, you probably already know why reproducibility
matters in research. You track your code with git. You use `renv` to record
which R packages your project depends on. You may even share your code with
collaborators or post it alongside a manuscript. You have, in other words,
already done more than most.

This vignette is for researchers who want to go one step further. Maybe you are
collaborating with someone whose machine behaves differently from yours. Maybe
you want to move an analysis from your laptop to a computing cluster and you
are not sure how to make sure it still runs. Maybe you have had the experience
of returning to an old project months later and finding that something no longer
works — a package updated, a dependency changed, and the analysis quietly broke.

The question this vignette tries to answer is: if `renv` already records your R
packages, what else could go wrong? And why might a container be the answer?

## What renv does — and does not do

`renv` is excellent at what it does. It records the R packages your project
uses — their names, versions, and sources — in a `renv.lock` file. When a
collaborator runs `renv::restore()`, they get the same package versions you
used. That is a genuine reproducibility guarantee, and it covers the most common
source of "it worked on my machine" problems.

But `renv` records only the R package layer. Below that layer sits everything
else the analysis depends on: the R installation itself, the operating system,
the system libraries that R packages compile against, and any external tools
the analysis calls. The lockfile notes which R version you used, but `renv`
cannot install that version for a collaborator, and it records nothing about
the rest. That is not what it was designed to do.
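
To make that boundary concrete, the sketch below parses a small, made-up
lockfile and lists what it holds. It assumes the `jsonlite` package is
installed; the lockfile contents are illustrative and not taken from a real
project.

```r
# A trimmed-down, hypothetical renv.lock written to a temporary file
lock_json <- '{
  "R": {
    "Version": "4.4.0",
    "Repositories": [
      {"Name": "CRAN", "URL": "https://packagemanager.posit.co/cran/latest"}
    ]
  },
  "Packages": {
    "curl": {"Package": "curl", "Version": "5.2.1", "Source": "Repository"},
    "xml2": {"Package": "xml2", "Version": "1.3.6", "Source": "Repository"}
  }
}'
lock_path <- tempfile(fileext = ".lock")
writeLines(lock_json, lock_path)

lock <- jsonlite::fromJSON(lock_path)

# Top-level sections of the lockfile
names(lock)

# The R version is noted here, but renv::restore() does not install R itself
lock$R$Version

# Package name -> recorded version
vapply(lock$Packages, function(p) p$Version, character(1))
```

Nothing in the file mentions the operating system, `libcurl`, or `libxml2`,
even though `curl` and `xml2` cannot be built from source without them.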

In practice this means that `renv::restore()` is not always sufficient. Consider
a few scenarios that R users encounter regularly.

A collaborator tries to restore your environment on a different operating
system. A package that compiled cleanly on macOS requires a system library that
is not installed by default on Linux. The restore fails, and the error message
points to a C compiler or a missing header file rather than anything obviously
R-related.

You install a newer version of R and try to rerun an analysis from six months
ago. Most packages restore fine, but one has to be rebuilt from source, and it
now compiles against a system library that has since been updated. The behavior
changes subtly, or the package simply fails to load.

You want to run your analysis on a computing cluster. The cluster runs Linux.
Your laptop runs macOS. Your `renv.lock` is intact, but the runtime environment
is fundamentally different. The analysis that runs cleanly on your machine may
not run at all on the cluster without additional setup.

None of these failures are caused by careless coding. They are caused by the
gap between what `renv` captures and what a running R analysis actually depends
on.
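
A quick way to see that gap on your own machine is to ask R what it is running
on. The sketch below uses only base R functions; the name `env_report` is just
a label for this example. The output differs from machine to machine, which is
exactly the point.

```r
# Collect the layers that sit below the R package library
env_report <- list(
  r_version   = R.version.string,         # the R release in use
  platform    = R.version$platform,       # CPU architecture and OS triplet
  os          = Sys.info()[["sysname"]],  # operating system name
  system_libs = extSoftVersion(),         # zlib, PCRE, ICU, ... as seen by R
  lapack      = La_library()              # LAPACK library R is using
)

# Print a compact overview of everything collected above
str(env_report)
```

Running the same lines on a collaborator's machine, or on a cluster node, and
comparing the two reports shows which of these layers `renv.lock` leaves open.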

## What a container adds

A container is a lightweight, self-contained unit that packages an application
together with the environment it needs to run. For an R analysis, that means
not just the R packages but also the R version, the operating system libraries,
the system tools, and the configuration that ties everything together.

When you run a containerized analysis, you are not running it on your operating
system directly. You are running it inside a controlled environment that is
defined by a recipe — a `Dockerfile` — and that recipe can be shared, versioned,
archived, and run on any machine that can execute containers.

The key difference from `renv` is the level of the stack being captured. `renv`
records the R package layer. A container captures the entire runtime environment,
from the base operating system up through the R installation and package library.

It is worth being precise about what this means in practice. A container does
not guarantee that your analysis produces the same numerical results on every
machine: floating-point arithmetic and hardware differences can still introduce
variation at the margins. What it does guarantee is that everyone who runs the
same built image gets an identical software environment: the same R version,
the same package versions, the same system libraries. That is a much stronger
reproducibility guarantee than `renv` alone can provide.

## The Dockerfile as a reproducibility artifact

The recipe for a container is a plain text file called a `Dockerfile`. It
specifies a base image — typically a Linux distribution with R pre-installed —
and then a sequence of instructions: install system libraries, install R
packages, copy files, set the working directory.

A `Dockerfile` is a reproducibility artifact in the same way that `renv.lock`
is. It can be committed to version control, shared with collaborators, archived
with a publication, and used to rebuild the analysis environment later,
provided the base image and package repositories it points to are still
available. Where `renv.lock` describes the R package layer, the `Dockerfile`
describes the full stack.

Here is what a minimal `Dockerfile` for an R project might look like:

```dockerfile
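# Base image: Linux with R 4.4.0 pre-installed
FROM rocker/r-ver:4.4.0
# Keep apt from prompting during the build
ENV DEBIAN_FRONTEND=noninteractive

# System libraries that R packages such as curl, openssl, and xml2 compile against
RUN apt-get update && apt-get install -y \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Copy the lockfile into the image
WORKDIR /home
COPY renv.lock /home/renv.lock

# Install renv, then restore the package versions recorded in renv.lock
RUN R -e "install.packages('renv', repos='https://packagemanager.posit.co/cran/latest')"
RUN R -e "renv::restore()"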
```

Writing this by hand is tedious and error-prone. You have to know which system
libraries your R packages need, which base image to use, and how to structure
the installation steps. `containr` automates this: it reads your `renv.lock`,
infers the system library requirements, and writes the `Dockerfile` for you.
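
The inference step can be pictured as a lookup from R packages to system
packages. The sketch below illustrates that idea only and is not `containr`'s
implementation: the lookup table `sysreqs_table` and the helper
`apt_install_line()` are hypothetical, and the table covers just the three
libraries used in the `Dockerfile` above.

```r
# Hypothetical lookup: R package -> Debian/Ubuntu development library
sysreqs_table <- c(
  curl    = "libcurl4-openssl-dev",
  openssl = "libssl-dev",
  xml2    = "libxml2-dev"
)

# Build an apt-get RUN instruction for the R packages that have a known
# system requirement; packages missing from the table are ignored
apt_install_line <- function(r_packages, table = sysreqs_table) {
  known <- intersect(r_packages, names(table))
  libs  <- sort(unique(unname(table[known])))
  if (length(libs) == 0) {
    return(character(0))
  }
  paste0(
    "RUN apt-get update && apt-get install -y \\\n    ",
    paste(libs, collapse = " \\\n    "),
    " \\\n    && rm -rf /var/lib/apt/lists/*"
  )
}

# Package names as they might come from the Packages section of a lockfile
writeLines(apt_install_line(c("curl", "dplyr", "xml2")))
```

A hand-written table like this stops being practical after a handful of
packages, which is why this step is worth delegating to a tool.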

## Where containr fits

`containr` is not a containerization framework. It is a practical tool that
connects the R workflow researchers already use to the container workflow they
need when a project is ready to move beyond a single machine.

The connection point is `renv.lock`. If you already use `renv` — and if you
are reading this, you probably do — then `containr` needs almost nothing else
from you. It reads the lockfile, works out what the project needs at the system
level, and produces a `Dockerfile` you can build and push without leaving R.

In other words, `renv` and containers are not alternatives. They are
complementary layers of the same reproducibility stack. `renv` handles the R
package layer. A container handles everything below it. Using both together
gives you a reproducibility guarantee that neither can provide alone.
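
For orientation, this is roughly what "build and run" means underneath any
tooling. The sketch calls the Docker command line from R with `system2()`. It
is not `containr`'s API, and the image tag `my-analysis:0.1` is a placeholder.
It assumes Docker is installed and that a `Dockerfile` sits in the working
directory; if either is missing, it only prints a message.

```r
image_tag <- "my-analysis:0.1"  # placeholder name:tag for the image

# Arguments for `docker build` and `docker run`, kept as character vectors
build_args <- c("build", "--tag", image_tag, ".")
run_args   <- c("run", "--rm", image_tag,
                "Rscript", "-e", shQuote("cat(R.version.string)"))

# Only call Docker when both the CLI and a Dockerfile are present
docker_ready <- nzchar(Sys.which("docker")) && file.exists("Dockerfile")

if (docker_ready) {
  # Build the image from the Dockerfile in the current directory
  system2("docker", build_args)
  # Start a throwaway container and print the R version inside it
  system2("docker", run_args)
} else {
  message("Docker or a Dockerfile was not found; skipping build and run.")
}
```

The R version printed by the second command comes from the image, not from the
machine that runs it.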

## What comes next

The companion vignette, *A first containerization workflow with containr*,
walks through the complete workflow: generating a `Dockerfile` from your
`renv.lock`, building the container image, listing local images, and pushing
the image to a registry. If you are ready to containerize a project, that is
the right place to start.

If you are not yet sure whether containerization is the right step for your
project, the framing in this vignette may be enough for now. You can return to
`containr` when the moment arrives — when you are preparing to share an
analysis, archive a workflow, or move an analysis to a computing cluster. The
`renv.lock` you already have is the starting point.