How we use renv to be in two places at once
At the Data Lab, our science team has a practice where an individual team member shares something that they recently figured out (or haven’t totally figured out yet) on a biweekly basis. We call this short 5-10 minute presentation How I Solved This, and it’s a great way to formally share (often hard-won) knowledge with each other. In this post, we thought we’d share with you how we solved something with the [.inline-snippet]renv[.inline-snippet] package.
The problem: managing software dependencies for training workshops
Our training workshops are among our major projects at the Data Lab. These workshops are short courses that introduce pediatric cancer researchers to things like the R programming language and quantifying RNA-seq data. A major challenge to leading these workshops, where time is of the essence, is dealing with software dependencies that need to be installed and running correctly before we get into the material. If training participants’ computing environments are not set up, it could easily take hours to get everyone ready to start. We also try to prioritize reproducibility, and therefore consistent computing environments, in most of our work.
There are lots of ways to approach managing dependencies for short courses, and at least as many philosophies about why one way is the “right” approach, but I’ll detail our approach here.
A tale of two solutions for the same problem
When we pivoted to virtual workshops in early 2020, we started using RStudio Server. With our RStudio Server, we could set up all of the software required before anyone even logged into a training session! After participants log in via their browser, we’re ready for hands-on instruction in a matter of minutes.
But, sometimes, for whatever reason, a participant needs to work outside of our RStudio Server. And, sometimes, our workshop developers need a solution outside the server, too, like when we run GitHub Actions to render the R Notebooks we use to teach. To address these cases, we maintain Docker images – based on images from the Rocker Project – that have the same set of packages and dependencies installed as the server.
And that’s when the headache emerged: we had a list of R packages that had to be installed in different environments, with different methods (a shell script for installation on the server and a Dockerfile, respectively). We want to make sure we’re using the exact same versions of packages (sometimes down to the commit hash depending on where we’re installing from!), and the environments are just different enough to make using a single shell script in both places infeasible. Keeping things consistent in two places can be more than twice as hard (and we aim to work smarter, not harder)!
How we solved it
This is where [.inline-snippet]renv[.inline-snippet] comes in! All the features and foibles of [.inline-snippet]renv[.inline-snippet] are beyond the scope of this post, but we’ll cover the basics as needed to describe our solution.
[.inline-snippet]renv[.inline-snippet], pronounced “R env” for R environment, is an R package that allows users to capture the state of the packages needed for a project in something called a lockfile. (Something we have longed for in the past!) You can see our lockfile for our training repository here. [.inline-snippet]renv[.inline-snippet] also includes a function called [.inline-snippet]renv::restore()[.inline-snippet] that allows you to use that lockfile to install packages when they are not currently installed. This is exactly the functionality we leverage to keep things consistent across our Docker image and RStudio Server.
If we want to use the lockfile and [.inline-snippet]renv::restore()[.inline-snippet] for our training Docker image – largely by following the [.inline-snippet]renv[.inline-snippet] instructions for using it with Docker – we can include the following instructions in our Dockerfile:
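The listing here is a minimal sketch of such instructions, not the exact contents of the training Dockerfile. It is written as a shell snippet that writes the Dockerfile to a scratch directory so it can be inspected. The base image tag, the [.inline-snippet]renv[.inline-snippet] version, the decision to disable the [.inline-snippet]renv[.inline-snippet] cache, and the directory that gets removed are all assumptions chosen for illustration.

```shell
# Scratch directory so the sketch does not overwrite a real Dockerfile
BUILD_DIR="$(mktemp -d)"

# Quoted 'EOF' keeps ${RENV_VERSION} literal, so it is expanded during the
# Docker build rather than by this shell
cat > "${BUILD_DIR}/Dockerfile" <<'EOF'
# Assumption: any Rocker Project image that ships R would work here
FROM rocker/tidyverse:4.1.0

# Assumption: placeholder renv version; pin whichever release you rely on
ENV RENV_VERSION=0.15.5
# Assumption: turn off renv's package cache so packages are installed directly
# into the library and deleting renv's own directory later does not remove them
ENV RENV_CONFIG_CACHE_ENABLED=FALSE

RUN R -e "install.packages('remotes', repos = 'https://cloud.r-project.org')"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"

# Copy the lockfile from the repository into the image
WORKDIR /usr/local/renv
COPY renv.lock renv.lock

# Let renv write files non-interactively, install from the lockfile, then
# delete renv's directory (assumed path) to trim the image
RUN R -e "renv::consent(provided = TRUE)" \
 && R -e "renv::restore()" \
 && rm -rf ~/.local/share/renv
EOF

echo "Wrote ${BUILD_DIR}/Dockerfile"
```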
This copies over our lockfile from our repository after installing [.inline-snippet]renv[.inline-snippet], uses the lockfile to install packages on the image, and removes the [.inline-snippet]renv[.inline-snippet] directory for some savings on the image size. [.inline-snippet]renv::consent(provided = TRUE)[.inline-snippet] lets [.inline-snippet]renv[.inline-snippet] write and update files in a non-interactive session like when you build a Docker image.
To install these packages on the server, we follow similar steps in the shell script we use, but we snag the lockfile from GitHub instead.
First, we need to specify the version of R because the server can support multiple versions.
Then, we set the CRAN mirror; here we’re using RStudio Package Manager.
Just like in the Dockerfile, we have to specify the version of the [.inline-snippet]renv[.inline-snippet] package we want to use.
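These three settings could sit at the top of the install script roughly as follows. Every value is a placeholder: the R version, the [.inline-snippet]/opt/R/<version>[.inline-snippet] install layout, the Package Manager URL, and the [.inline-snippet]renv[.inline-snippet] version are assumptions, not the values from the actual server script.

```shell
# Assumption: the R version to install packages for (placeholder value)
R_VERSION="4.1.0"
# Assumption: each R version lives under /opt/R/<version> on the server
RSCRIPT="/opt/R/${R_VERSION}/bin/Rscript"

# Assumption: RStudio Package Manager's CRAN repository (placeholder URL)
CRAN_MIRROR="https://packagemanager.rstudio.com/cran/latest"

# Assumption: placeholder renv version, kept in step with the Dockerfile
RENV_VERSION="0.15.5"

echo "R ${R_VERSION} (${RSCRIPT}), renv ${RENV_VERSION}, CRAN: ${CRAN_MIRROR}"
```

Keeping the versions in variables at the top means a version bump touches one line of the script instead of several.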
And we’ll want to grab the most recent version of the lockfile from GitHub.
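One way to fetch the lockfile is with [.inline-snippet]curl[.inline-snippet] against GitHub's raw-file host. This is a sketch under assumptions: the URL uses OWNER, REPO, and BRANCH as placeholders instead of the real repository path, and [.inline-snippet]fetch_lockfile[.inline-snippet] is a hypothetical helper name.

```shell
# Hypothetical raw-file URL: OWNER, REPO, and BRANCH are placeholders
LOCKFILE_URL="${LOCKFILE_URL:-https://raw.githubusercontent.com/OWNER/REPO/BRANCH/renv.lock}"

# Scratch directory to hold the download
WORK_DIR="$(mktemp -d)"
LOCKFILE="${WORK_DIR}/renv.lock"

fetch_lockfile() {
  # $1: URL to download, $2: destination path
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL "$1" -o "$2"
}

# With a real URL in LOCKFILE_URL, the download would be:
#   fetch_lockfile "$LOCKFILE_URL" "$LOCKFILE"
echo "Lockfile would be saved to ${LOCKFILE}"
```

Pointing the URL at a branch picks up the most recent lockfile on every run; pointing it at a tag or commit would freeze it instead.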
To install the R packages, we use a heredoc to pass all the required steps for package installation to Rscript.
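A heredoc-based install step might look like the sketch here. The variable defaults and the [.inline-snippet]install_from_lockfile[.inline-snippet] function name are assumptions for illustration. The R calls mirror the steps described for the Dockerfile, but this is not the actual server script.

```shell
# Placeholders (assumptions): override any of these from the environment
RSCRIPT="${RSCRIPT:-Rscript}"
CRAN_MIRROR="${CRAN_MIRROR:-https://packagemanager.rstudio.com/cran/latest}"
RENV_VERSION="${RENV_VERSION:-0.15.5}"
LOCKFILE="${LOCKFILE:-./renv.lock}"

install_from_lockfile() {
  # Unquoted EOF: the shell substitutes the variables before R sees the code.
  # The lone "-" tells Rscript to read the script from standard input.
  "$RSCRIPT" - <<EOF
options(repos = c(CRAN = "${CRAN_MIRROR}"))
install.packages("remotes")
remotes::install_github("rstudio/renv@${RENV_VERSION}")
renv::consent(provided = TRUE)
renv::restore(lockfile = "${LOCKFILE}", prompt = FALSE)
EOF
}

# Only attempt the install when both Rscript and the lockfile are present
if command -v "$RSCRIPT" >/dev/null 2>&1 && [ -f "$LOCKFILE" ]; then
  install_from_lockfile
else
  echo "Skipping: need ${RSCRIPT} on PATH and a lockfile at ${LOCKFILE}" >&2
fi
```

Because the heredoc delimiter is unquoted, the shell fills in the mirror, version, and lockfile path before R runs, so one set of variables drives both the shell and R halves of the script.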
And finally, a bit of cleaning up!
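What gets cleaned up depends on what the script left behind. As a sketch, assuming the lockfile was downloaded into a scratch directory, cleanup could be as small as this. Here [.inline-snippet]WORK_DIR[.inline-snippet] and [.inline-snippet]cleanup[.inline-snippet] are hypothetical names, and the directory is created in the snippet only to stand in for the real one.

```shell
# Stand-in for the scratch directory used earlier in the script (hypothetical)
WORK_DIR="$(mktemp -d)"
touch "${WORK_DIR}/renv.lock"

cleanup() {
  # Refuse to run a recursive delete on an empty or missing path
  if [ -n "${WORK_DIR}" ] && [ -d "${WORK_DIR}" ]; then
    rm -rf "${WORK_DIR}"
  fi
}

cleanup
```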
Including [.inline-snippet]renv[.inline-snippet] as part of this project has certainly made things easier and saves us valuable time when preparing our training workshops! If you’re interested, you can read more about how we develop with [.inline-snippet]renv[.inline-snippet] for training material over in our contributing guidelines.
Are you a childhood cancer researcher who wants to learn more about the processes we use at the Data Lab to make our work robust and reproducible? Fill out this form to let us know!