Note that biocthis is not a Bioconductor-core package and as such it is not a Bioconductor official package. It was made by and for Leonardo Collado-Torres so he could more easily maintain and create Bioconductor packages as listed at lcolladotor.github.io/pkgs/. Hopefully biocthis will be helpful for you too.
For the basics, please check the Introduction to biocthis vignette.
In 2019, I was able to take the “Building Tidy Tools” workshop taught by Charlotte and Hadley Wickham during rstudio::conf(2019) thanks to a diversity scholarship. During this workshop, I learned about usethis (Wickham and Bryan, 2021), devtools (Wickham, Hester, and Chang, 2021), and testthat (Wickham, 2011), among other R packages, and how to use RStudio Desktop to create R packages more efficiently. I got to revise this material and practice it more for the CDSB Workshop 2019: How to Build and Create Tidy Tools, where we re-used the materials (with their permission) and translated them to Spanish. Over the years I have made several Bioconductor R packages that I maintain. Yet I learned a lot thanks to Charlotte and Hadley and have been relying more and more on usethis and related packages.
Earlier this year (2020) one of my Bioconductor packages (regionReport) was presenting some errors on some operating systems but not on others. I first spent quite a bit of time setting up the corresponding R installation on my non-work Windows computer. I still struggled to reproduce the error, so I finally learned how to use the Bioconductor docker images. Running the following command gives you an environment with all the system dependencies for Bioconductor packages already installed. In this system you can then install your package dependencies and get very close to the Linux machine used for testing Bioconductor packages.
docker run \
    -e PASSWORD=bioc \
    -p 8787:8787 \
    bioconductor/bioconductor_docker:devel
Using this docker image, I was finally able to reproduce the error, which involved other Bioconductor packages. However, there was a second hard-to-reproduce error. Using GitHub Actions, which I’ll talk about more soon, I was then able to find the root cause of this second issue and resolve it.
biocthis (Collado-Torres, 2021) was born from my interest in continuing to use usethis and related tools, but in a Bioconductor-friendly way. That is, this is a package that will help me (and maybe others too). This package was born from these 5 issues:
BiocCheck is run on all new Bioconductor package submissions and by default it checks whether the new package adheres to the Bioconductor coding style guide. For a long time, it has suggested formatR as a solution for automatically styling code in an R package. While formatR mostly works and I’ve used it before, I recently discovered styler which can be used for styling code to fit the tidyverse coding style guide. On my own packages, I have found styler to be superior to formatR because it:
Several of the issues I made are related to using styler to automatically re-format your code to match the Bioconductor coding style guide more closely. That is how bioc_style() was born, and it was the suggested approach as discussed at Bioc-friendly style feature suggestion r-lib/styler#636. The maintainer of styler, Lorenz Walthert, has a great reply on that issue linking to a more detailed discussion on how to expand styler if the job requires doing so.
bioc_style() does not fully replicate the Bioconductor coding style, but it gets close enough. As Martin Morgan said at Recommend formatR suggestion Bioconductor/BiocCheck#57, a solution that gets 90% of the way is good enough.
bioc_style() is a very short function, mostly because the Bioconductor and Tidyverse coding style guides are overall very similar. This function won’t solve all the formatting issues detected by BiocCheck, but if you really want to, you can disable the formatting checks with:
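One way to do this, sketched under the assumption that your installed BiocCheck accepts the `no-check-formatting` option, is to call it through `Rscript -e` in the same way these notes do elsewhere. The wrapper function name below is hypothetical and only serves to keep the call in one place.

```shell
# Hypothetical helper: run BiocCheck on the package in the current directory
# while skipping its source-formatting checks.
# Assumption: the installed BiocCheck version supports `no-check-formatting`.
run_bioccheck_without_formatting() {
    Rscript -e 'BiocCheck::BiocCheck(".", `no-check-formatting` = TRUE)'
}
```

Calling `run_bioccheck_without_formatting` from the package root requires R and BiocCheck to be installed; nothing runs until the function is invoked.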
I have been using Travis CI for several years now to help me run R CMD check every time I make a commit and push it to
Travis CI has mostly worked well for me, though I frequently had to maneuver around the 50 minute limit. I also recently ran into a problem where Hadley Wickham replied “We now recommend using the github actions workflow instead; which avoids all this configuration pain”. I also ran into a problem that didn’t always happen in Travis CI but that was potentially related to the computational resources provided (memory). I heard the term GitHub Actions at rstudio::conf(2020) but I ended up missing Jim Hester’s talk, which you can watch online: I highly recommend it and wish I had started my adventure into GitHub Actions with it. Briefly, GitHub Actions allows you to run checks on Windows, macOS or Linux for up to 6 hours on machines with 7 GB of RAM. That’s two more operating systems than what I was using with Travis CI, significantly more time, and a decent chunk of memory.
These 3 operating systems matter to me because Bioconductor runs nightly checks on those 3 platforms. It’s a great way to know if your Bioconductor R package will work for most users. However, you only get one report per day. If, like me, you are not the most organized person and have to fix your code before a release, then you don’t have many days left to check your R package(s) and you need more frequent feedback. So I’ve been looking for a way to run checks on all three platforms on demand. Bioconductor has a Single Package Builder which does this, but it is restricted to new package submissions.
I know that there’s AppVeyor for running checks on Windows, but I never used it. Travis CI does support macOS and Linux. In the past, I have used rhub and I was able to run tests on a package using a combination of Travis CI and rhub as detailed at r-hub/rhub/issues#52. The rhub maintainers have also taken steps to support Bioconductor’s release cycle as described at r-hub/rhub/issues#38. Regardless of the platform, it would ultimately be nice to have a single configuration file that you (the package developer) don’t need to update for every Bioconductor release cycle.
I saw on Twitter the announcement about GitHub Actions in usethis and that is when I started to look more into usethis and actions by Jim Hester, particularly r-lib/actions/examples. As usual, I tried to just get it to work and then had to look more closely at the documentation and the code. Naively, I thought that I could make r-lib/actions/examples/check-standard.yaml Bioconductor-friendly, which Jim Hester immediately recognized as a complicated task. As you can see at Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84, this took a while. When working on this, I also looked at several other resources and real world examples:
rOpenSci book on GitHub Actions: https://ropenscilabs.github.io/actions_sandbox/
Most of the development of the Bioconductor-friendly GitHub Actions workflow provided by biocthis was done with leekgroup/derfinderPlot/.github/workflows/check-bioc.yml and LieberInstitute/recount3/.github/workflows/check-bioc.yml as detailed at: Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84. It was then further improved by a pull request with tests carried out at lcolladotor/testmatrix.
This work eventually led to use_bioc_github_action() as it is today. The features of this GHA workflow are described in the Introduction to biocthis vignette. Going back to the story of its development: while working on this GHA workflow, I ran into several issues, and I wouldn’t be surprised if we run into more of them later on.
devel, so just before R 4.0.0 was released it was called release pointed to 3.6.3 and devel to 4.1.0. At r-lib/actions/pull/68 the decision was made that this was a transient issue. In the meantime I wanted to get the GHA to work, which led me to many issues about installing package dependencies on R 4.1.0 to test Bioconductor 3.11, which is NOT the thing you should do! Bioconductor 3.11 is meant to run on R 4.0.x, not 4.1.x. Hervé Pagès helped me with some of these tricks, particularly with Windows. Also on Windows, I learned more about r-lib/actions from the update to support Rtools 4.0 on Windows by Jeroen Ooms, which Constantin AE and I were discussing on the RStudio Community website. On macOS I ran into compiling XML from source and its system dependencies. I also ran into xml2/issues/296 which is now officially resolved thanks to xml2/issues/302, though looking at r-lib/usethis/.github/workflows/R-CMD-check.yaml and r-lib/usethis/commits/.github/workflows/R-CMD-check.yaml was very helpful.
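The version mismatch described above (Bioconductor 3.11 on R 4.1.0) can be caught with a small guard before installing dependencies. The following is a minimal sketch, not part of the biocthis workflow: the function names are hypothetical, the 3.11 to R 4.0 pairing comes from the text, and the remaining rows are my own addition based on the corresponding Bioconductor releases.

```shell
# r_for_bioc BIOC_VERSION: print the R series a Bioconductor release targets.
# Only the 3.11 row is stated in these notes; the other rows are added for illustration.
r_for_bioc() {
    case "$1" in
        3.10)      echo "3.6" ;;
        3.11|3.12) echo "4.0" ;;
        3.13|3.14) echo "4.1" ;;
        *)         echo "unknown"; return 1 ;;
    esac
}

# check_pairing BIOC_VERSION R_VERSION: succeed only when R_VERSION (x.y.z)
# belongs to the R series expected for BIOC_VERSION.
check_pairing() {
    expected="$(r_for_bioc "$1")" || return 1
    case "$2" in
        "$expected".*) return 0 ;;
        *)             return 1 ;;
    esac
}

# The combination warned against in the text:
check_pairing 3.11 4.1.0 || echo "Bioconductor 3.11 should not be paired with R 4.1.0"
```

A lookup table like this has to be extended by hand for every release, which is the maintenance burden that a single, release-independent configuration file tries to avoid.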
R CMD BiocCheck on both Windows and the Bioconductor docker image (different issues) that I can avoid using code like this: Rscript -e "BiocCheck::BiocCheck()".
git to then run pkgdown, which involved the GITHUB_TOKEN environment variable on the Bioconductor docker step and using git config --local a couple of times. You will also need to run pkgdown::deploy_to_branch() once locally to set up the gh-pages branch properly for pkgdown to work from GitHub Actions.
/nocache on your commit message.
git from source by modifying these instructions in order to have a git version equal to or newer than 2.18 on Ubuntu 18.04, so that I can then use actions/checkout@v2 and avoid issues with running pkgdown. I later learned how to use a ppa for Ubuntu for installing the latest git version. If you use actions/checkout@v1 you can end up at r-lib/actions/issues/50. If you use the default git 2.17.1 on Ubuntu 18.04 then you run into actions/checkout/issues/238 and other related issues. I could have avoided this by running pkgdown on macOS instead of the Bioconductor docker image that is based on rockerdev/rstudio:R.4.0.0_ubuntu18.04. Nowadays, the Bioconductor docker devel images are based on Ubuntu 20.04, known as focal. The RStudio Package Manager (RSPM) greatly improves the speed at which R packages are installed in Linux and thus on the Bioconductor docker images.
.Renviron as the one used in the Bioconductor machines by downloading files like 3.11/Renviron.bioc and locating them correctly.
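The git requirement mentioned above (2.18 or newer for actions/checkout@v2, versus the 2.17.1 that ships with Ubuntu 18.04) can be checked before deciding whether to upgrade. This is a minimal sketch rather than a step taken from the biocthis workflow; the function names are hypothetical and it assumes a `sort` that supports version sorting via `-V`, as GNU coreutils does.

```shell
# version_ge A B: succeed when version A is equal to or newer than version B.
# Uses `sort -V` so that 2.17.1 sorts before 2.18.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_git_version: "git version 2.17.1" -> "2.17.1"
installed_git_version() {
    git --version | awk '{print $3}'
}

required="2.18"   # minimum version quoted in the notes above
if command -v git >/dev/null 2>&1; then
    current="$(installed_git_version)"
    if version_ge "$current" "$required"; then
        echo "git $current is new enough for actions/checkout@v2"
    else
        echo "git $current is older than $required; consider upgrading (source build or ppa)"
    fi
fi
```

The check only reports the situation; how to upgrade (building from source or using a ppa, as described above) is left to the workflow author.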
The resulting Bioconductor-friendly GitHub Actions workflow that you can add to your package with biocthis::use_bioc_github_action() has many comments which you might find helpful for understanding why some steps are done the way they are. I have tried to simplify the workflow when possible, but it depends on the latest version of many tools and thus will expose you to issues you might not have dealt with before, particularly compilation issues of R packages with R-devel (six months of the year with the current Bioconductor release cycle). If you need help, start by going through the steps listed at r-lib/actions#where-to-find-help. biocthis-exclusive issues are always welcome, though please include the information that will enable others to help you faster. Thank you!
biocthis also provides other usethis-like functions. To make these functions, I looked at the code inside usethis and learned how to make templates, how the data is passed to the templates and some other steps. Some of the functions are practically identical to the ones from usethis but point to a custom template provided by biocthis. These functions have simplified for me the task of having uniform README.Rmd/md and vignette files, for instance, as well as having GitHub issue & support templates that include some Bioconductor-specific information and some of my own personal preferences for asking for help. I also included template R scripts through use_bioc_pkg_templates(), an idea I first learned about at rstudio::conf(2020) from the golem package. Those scripts are useful to keep track of code that you had to run to make the R package or to update it later. These scripts can greatly jump-start your R/Bioconductor package creation process. So maybe you’ll see more packages by me and others soon =) In particular, I really hope that we can get more CDSB members to submit R/Bioconductor packages to the world as explained in this story, which is something I care about quite a bit.
I just want to thank everyone for helping me understand different pieces of code, for producing the tools I used, for interacting with me across many GitHub issues, as well as answering questions on multiple mailing lists. The names below are in the order in which they appear in this vignette:
as well as several organizations and members:
Thank you very much! 🙌🏽😊
The biocthis package (Collado-Torres, 2021) was made possible thanks to:
This package was developed using biocthis.
Code for creating the vignette
## Create the vignette
library("rmarkdown")
system.time(render("biocthis_dev_notes.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("biocthis_dev_notes.Rmd", tangle = TRUE)
Date the vignette was generated.
#>  "2021-08-10 05:59:45 UTC"
Wallclock time spent generating the vignette.
#> Time difference of 1.022 secs
R session information.
#> ─ Session info ───────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.0 (2021-05-18)
#>  os       Ubuntu 20.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2021-08-10
#>
#> ─ Packages ───────────────────────────────────────────────────
#>  package     * version          date       lib source
#>  BiocManager   1.30.16          2021-06-15     CRAN (R 4.1.0)
#>  BiocStyle   * 2.21.3           2021-06-16     Bioconductor
#>  bookdown      0.22             2021-04-22     RSPM (R 4.1.0)
#>  bslib         0.2.5.1          2021-05-18     RSPM (R 4.1.0)
#>  cachem        1.0.5            2021-05-15     RSPM (R 4.1.0)
#>  cli           3.0.1            2021-07-17     RSPM (R 4.1.0)
#>  crayon        1.4.1            2021-02-08     RSPM (R 4.1.0)
#>  desc          1.3.0            2021-03-05     RSPM (R 4.1.0)
#>  digest        0.6.27           2020-10-24     RSPM (R 4.1.0)
#>  evaluate      0.14             2019-05-28     RSPM (R 4.1.0)
#>  fastmap       1.1.0            2021-01-25     RSPM (R 4.1.0)
#>  fs            1.5.0            2020-07-31     RSPM (R 4.1.0)
#>  generics      0.1.0            2020-10-31     RSPM (R 4.1.0)
#>  htmltools     0.5.1.1          2021-01-22     RSPM (R 4.1.0)
#>  httr          1.4.2            2020-07-20     RSPM (R 4.1.0)
#>  jquerylib     0.1.4            2021-04-26     RSPM (R 4.1.0)
#>  jsonlite      1.7.2            2020-12-09     RSPM (R 4.1.0)
#>  knitr         1.33             2021-04-24     RSPM (R 4.1.0)
#>  lubridate     1.7.10           2021-02-26     RSPM (R 4.1.0)
#>  magrittr      2.0.1            2020-11-17     RSPM (R 4.1.0)
#>  memoise       2.0.0            2021-01-26     RSPM (R 4.1.0)
#>  pkgdown       22.214.171.12401 2021-08-05     Github (r-lib/pkgdown@ce9781a)
#>  plyr          1.8.6            2020-03-03     RSPM (R 4.1.0)
#>  R6            2.5.0            2020-10-28     RSPM (R 4.1.0)
#>  ragg          1.1.3            2021-06-09     RSPM (R 4.1.0)
#>  Rcpp          1.0.7            2021-07-07     RSPM (R 4.1.0)
#>  RefManageR  * 1.3.0            2020-11-13     RSPM (R 4.1.0)
#>  rlang         0.4.11           2021-04-30     RSPM (R 4.1.0)
#>  rmarkdown     2.10             2021-08-06     RSPM (R 4.1.0)
#>  rprojroot     2.0.2            2020-11-15     RSPM (R 4.1.0)
#>  sass          0.4.0            2021-05-12     RSPM (R 4.1.0)
#>  sessioninfo * 1.1.1            2018-11-05     RSPM (R 4.1.0)
#>  stringi       1.7.3            2021-07-16     RSPM (R 4.1.0)
#>  stringr       1.4.0            2019-02-10     RSPM (R 4.1.0)
#>  systemfonts   1.0.2            2021-05-11     RSPM (R 4.1.0)
#>  textshaping   0.3.5            2021-06-09     RSPM (R 4.1.0)
#>  withr         2.4.2            2021-04-18     RSPM (R 4.1.0)
#>  xfun          0.25             2021-08-06     RSPM (R 4.1.0)
#>  xml2          1.3.2            2020-04-23     RSPM (R 4.1.0)
#>  yaml          2.2.1            2020-02-01     RSPM (R 4.1.0)
#>
#>  /__w/_temp/Library
#>  /usr/local/lib/R/site-library
#>  /usr/local/lib/R/library
Citations made with RefManageR (McLean, 2017).
 L. Collado-Torres. Automate package and project setup for Bioconductor packages. https://github.com/lcolladotor/biocthis - R package version 1.3.8. 2021. DOI: 10.18129/B9.bioc.biocthis. URL: http://www.bioconductor.org/packages/biocthis.
 H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.