Note that biocthis is not a Bioconductor-core package and as such it is not a Bioconductor official package. It was made by and for Leonardo Collado-Torres so he could more easily maintain and create Bioconductor packages as listed at lcolladotor.github.io/pkgs/. Hopefully biocthis will be helpful for you too.
For the basics, please check the Introduction to biocthis vignette.
biocthis developer notes
In 2019, I was able to take the “Building Tidy Tools” workshop taught by Charlotte and Hadley Wickham during rstudio::conf(2019) thanks to a diversity scholarship. During this workshop, I learned about usethis (Wickham, Bryan, Barrett, and Teucher, 2024), devtools (Wickham, Hester, Chang, and Bryan, 2022), and testthat (Wickham, 2011), among other R packages, and how to use RStudio Desktop to create R packages more efficiently. I got to revise this material and practice it more for the CDSB Workshop 2019: How to Build and Create Tidy Tools, where we re-used the materials (with their permission) and translated them to Spanish.
Over the years I have made several Bioconductor R packages that I maintain. Yet I learned a lot thanks to Charlotte and Hadley and have been relying more and more on usethis and related packages.
Earlier this year (2020) one of my Bioconductor packages (regionReport) was presenting some errors on some operating systems but not on others. I first spent quite a bit of time setting up the corresponding R installation on my non-work Windows computer. I still struggled to reproduce the error, so I finally learned how to use the Bioconductor docker images. Running one of these images gives you an environment with all the system dependencies for Bioconductor packages already installed. Inside it you can install your package dependencies and get very close to the Linux machine used for testing Bioconductor packages.
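A minimal sketch of launching such a container from R, assuming the devel tag of the bioconductor/bioconductor_docker image. The password value and the 8787 port mapping are illustrative choices, not settings taken from this vignette. The command is only assembled and printed; set run_container to TRUE to actually start it.

```r
## Assumed image tag; pick the one matching the Bioconductor version you need
image <- "bioconductor/bioconductor_docker:devel"

## Arguments for `docker run` (password and port are illustrative)
docker_args <- c(
    "run",
    "-e", "PASSWORD=bioc", # password for the RStudio Server login
    "-p", "8787:8787", # expose RStudio Server on localhost:8787
    image
)

## Show the equivalent shell command for inspection
docker_cmd <- paste(c("docker", docker_args), collapse = " ")
message(docker_cmd)

## Start the container only when you opt in and docker is on the PATH
run_container <- FALSE
if (run_container && nzchar(Sys.which("docker"))) {
    system2("docker", docker_args)
}
```

Once the container is running, the package dependencies can be installed from the R session inside it, for example with BiocManager::install().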
Using this docker image, I was finally able to reproduce the error, which involved other Bioconductor packages. However, there was a second hard-to-reproduce error. Using GitHub Actions, which I’ll talk about more soon, I was then able to find the root cause of this second issue and resolve it.
biocthis (Collado-Torres, 2024) was born from my interest to keep using usethis and related tools, but in a Bioconductor-friendly way. That is, this is a package that will help me (and maybe others too). This package was born from these 5 issues:
Recommend styler over formatR suggestion Bioconductor/BiocCheck#57
BiocCheck is run on all new Bioconductor package submissions and by default it checks whether the new package adheres to the Bioconductor coding style guide. For a long time, it has suggested formatR as a solution for automatically styling code in an R package. While formatR mostly works and I’ve used it before, I recently discovered styler which can be used for styling code to fit the tidyverse coding style guide. On my own packages, I have found styler to be superior to formatR because it:
Several of the issues I made are related to using styler to automatically re-format your code to match more closely the Bioconductor coding style guide. That is how bioc_style() was born and it was the suggested approach as discussed at Bioc-friendly style feature suggestion r-lib/styler#636. The maintainer of styler, Lorenz Walthert, has a great reply on that issue linking to a more detailed discussion on how to expand styler if the job requires doing so.
Currently, bioc_style() does not fully replicate the Bioconductor coding style, but it gets close enough. As Martin Morgan said at Recommend styler over formatR suggestion Bioconductor/BiocCheck#57, a solution that gets 90% of the way is good enough.
bioc_style() is a very short function, mostly because the Bioconductor and Tidyverse coding style guides are overall very similar. This function won’t solve all the formatting issues detected by BiocCheck, but if you really want to, you can disable the formatting checks with:
## Use the following for the latest options
BiocCheck::usage()
## Disable formatting checks
BiocCheck::BiocCheck(`--no-check-formatting` = TRUE)
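To apply the style itself, something along these lines should work, assuming both styler and biocthis are installed. The messy snippet stored in code is made up for illustration. style_text() only prints the restyled text, while the commented style_pkg() call would rewrite the files of the package in the working directory.

```r
## Made-up snippet with assignment and spacing issues
code <- c("f = function(x){", "  x+1", "}")

## Only style when both packages are available
if (requireNamespace("styler", quietly = TRUE) &&
    requireNamespace("biocthis", quietly = TRUE)) {
    styled <- styler::style_text(code, transformers = biocthis::bioc_style())
    print(styled)
}

## Restyle a whole package in place (run from the package root):
## styler::style_pkg(transformers = biocthis::bioc_style())
```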
I have been using Travis CI for several years now to help me run R CMD check every time I make a commit and push it to GitHub. Travis CI has mostly worked well for me, though I frequently had to maneuver around the 50 minute limit. I also recently ran into a problem where Hadley Wickham replied “We now recommend using the github actions workflow instead; which avoids all this configuration pain”. I also ran into a problem that didn’t always happen in Travis CI but that was potentially related to the computational resources provided (memory). I heard the term GitHub Actions at rstudio::conf(2020) but I ended up missing Jim Hester’s talk, which you can watch online: I highly recommend it and wish I had started my adventure into GitHub Actions with it. Briefly, GitHub Actions allows you to run checks on Windows, macOS or Linux for up to 6 hours on machines with 7 GB of RAM. That’s two more operating systems than what I was using with Travis CI, significantly more time, and a decent chunk of memory.
These 3 operating systems matter to me because Bioconductor runs nightly checks on those 3 platforms. It’s a great way to know if your Bioconductor R package will work for most users. However, you only get one report per day. If, like me, you are not the most organized person and have to fix your code before a release, then you don’t have as many days to check your R package(s) and need more frequent feedback. So I’ve been looking for a way to run checks on all three platforms on demand. Bioconductor has a Single Package Builder which does this, but it is restricted to new package submissions.
I know that there’s AppVeyor for running checks on Windows, but I never used it. Travis CI does support macOS and Linux. In the past, I have used rhub and I was able to run tests on a package using a combination of Travis CI and rhub as detailed at r-hub/rhub/issues#52. The rhub maintainers have also taken steps to support Bioconductor’s release cycle as described at r-hub/rhub/issues#38. Regardless of the platform, it would ultimately be nice to have a single configuration file that you (the package developer) don’t need to update for every Bioconductor release cycle.
I saw on Twitter the announcement about GitHub Actions in usethis and that is when I started to look more into usethis and actions by Jim Hester, particularly r-lib/actions/examples. As usual, I tried to just get it to work and then had to look more closely at the documentation and the code. Naively, I thought that I could make r-lib/actions/examples/check-standard.yaml Bioconductor-friendly, which Jim Hester immediately recognized as a complicated task. As you can see at Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84 this took a while. When working on this, I also looked at several other resources and real-world examples:
rOpenSci book on GitHub Actions: https://ropenscilabs.github.io/actions_sandbox/
Most of the development of the Bioconductor-friendly GitHub Actions workflow provided by biocthis was done with leekgroup/derfinderPlot/.github/workflows/check-bioc.yml and LieberInstitute/recount3/.github/workflows/check-bioc.yml as detailed at: Bioconductor-friendly R CMD check action feature suggestion r-lib/actions#84. It was then further improved by a pull request with tests carried out at lcolladotor/testmatrix.
This work eventually led to use_bioc_github_action() as it is today. The features of this GHA workflow are described in the Introduction to biocthis vignette. Going back to the story: while developing this GHA workflow, I ran into several issues, and I wouldn’t be surprised if we run into more of them later on.
[…] release and devel, so just before R 4.0.0 was released it was called alpha, while release pointed to 3.6.3 and devel to 4.1.0. At r-lib/actions/pull/68 the decision was made that this was a transient issue. In the meantime I wanted to get the GHA to work, which led me to many issues about installing package dependencies on R 4.1.0 to test Bioconductor 3.11, which is NOT the thing you should do! Bioconductor 3.11 is meant to run on R 4.0.x, not 4.1.x. Hervé Pagès helped me with some of these tricks, particularly with Windows. Also on Windows, I learned more about r-lib/actions from the update to support Rtools 4.0 on Windows by Jeroen Ooms, which Constantin AE and I were discussing on the RStudio Community website. On macOS I ran into compiling XML from source and its system dependencies. I also ran into xml2/issues/296 which is now officially resolved thanks to xml2/issues/302, though looking at r-lib/usethis/.github/workflows/R-CMD-check.yaml and r-lib/usethis/commits/.github/workflows/R-CMD-check.yaml was very helpful.
[…] R CMD BiocCheck on both Windows and the Bioconductor docker image (different issues) that I can avoid using code like this: Rscript -e "BiocCheck::BiocCheck()".
[…] .git to then run pkgdown, which involved the GITHUB_TOKEN environment variable on the Bioconductor docker step and using git config --local a couple of times. You will also need to run pkgdown::deploy_to_branch() once locally to set up the gh-pages branch properly for pkgdown to work from GitHub Actions.
[…] /nocache on your commit message.
[…] git from source by modifying these instructions in order to have a git version equal to or newer than 2.18 on Ubuntu 18.04 such that I can then use actions/checkout@v2 and avoid issues with running pkgdown. I later learned how to use a ppa for Ubuntu for installing the latest git version. If you use actions/checkout@v1 you can end up at r-lib/actions/issues/50. If you use the default git 2.17.1 on Ubuntu 18.04 then you run into actions/checkout/issues/238 and other related issues. I could have avoided this by running pkgdown on macOS instead of the Bioconductor docker image that is based on rockerdev/rstudio:R.4.0.0_ubuntu18.04. Nowadays, the Bioconductor docker devel images are based on Ubuntu 20.04, known as focal. The RStudio Package Manager (RSPM) greatly improves the speed at which R packages are installed in Linux and thus on the Bioconductor docker images. Since August 2022, ubuntu-latest changed to Ubuntu 22.04, also known as jammy.
[…] .Renviron as the one used in the Bioconductor machines by downloading files like 3.11/Renviron.bioc and locating them correctly.
[…] .Renviron files.
The resulting Bioconductor-friendly GitHub Actions workflow that you
can add to your package with
biocthis::use_bioc_github_action()
has many comments which
you might find helpful for understanding why some steps are done the way
they are. I have tried to simplify the workflow when possible, but it
depends on the latest version of many tools and thus will expose you to
issues you might have not dealt with, particularly compilation issues of
R packages with R-devel (six months of the year with the current
Bioconductor release cycle). If you need help, start by going through
the steps listed at r-lib/actions#where-to-find-help.
biocthis
exclusive issues are always welcome, though please include the
information that will enable others to help you faster. Thank you!
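As a rough sketch of how the workflow gets added, the hypothetical helper below (add_bioc_gha() is not part of biocthis) runs biocthis::use_bioc_github_action() inside a package directory and reports whether a workflow file is present afterwards. The check-bioc.yml path matches the file name used in the repositories mentioned above; treating it as the output location is an assumption.

```r
## Path where the workflow is assumed to be written
bioc_gha_path <- function(pkg_dir = ".") {
    file.path(pkg_dir, ".github", "workflows", "check-bioc.yml")
}

## Hypothetical helper: add the workflow to an existing package
add_bioc_gha <- function(pkg_dir = ".") {
    if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
        stop("No DESCRIPTION file found in ", pkg_dir)
    }
    ## Temporarily make pkg_dir the active usethis project
    usethis::with_project(pkg_dir, biocthis::use_bioc_github_action())
    file.exists(bioc_gha_path(pkg_dir))
}

## Usage from the root of your package:
## add_bioc_gha(".")
```

After committing and pushing the new file, the checks should show up under the Actions tab of the GitHub repository.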
usethis-like functions
biocthis also provides other usethis-like functions. To make these functions, I looked at the code inside usethis and learned how to make templates, how the data is passed to the templates and some other steps. Some of the functions are nearly identical to the ones from usethis but point to a custom template provided by biocthis. These functions have simplified for me the task of having uniform README.Rmd/md and vignette files, for instance, as well as having GitHub issue & support templates that include some Bioconductor-specific information and some of my own personal preferences for asking for help. I also included template R scripts through use_bioc_pkg_templates(), an idea I first learned about at rstudio::conf(2020) from the golem package. Those scripts are useful to keep track of code that you had to run to make the R package or to update it later. These scripts can greatly jump-start your R/Bioconductor package creation process. So maybe you’ll see more packages by me and others soon =) In particular, I really hope that we can get more CDSB members to submit R/Bioconductor packages to the world as explained in this story, which is something I care about quite a bit.
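For illustration, a typical sequence of these functions might look like the sketch below. setup_bioc_templates() is a hypothetical wrapper and the vignette name is a placeholder; the biocthis calls inside it are meant to be run against an existing package, and some of them may ask before overwriting files.

```r
## Hypothetical wrapper around several biocthis template functions
setup_bioc_templates <- function(pkg_dir = ".", vignette_name = "intro") {
    if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
        stop("No DESCRIPTION file found in ", pkg_dir)
    }
    ## Temporarily make pkg_dir the active usethis project
    usethis::with_project(pkg_dir, {
        biocthis::use_bioc_pkg_templates() # template R scripts
        biocthis::use_bioc_readme_rmd() # README.Rmd
        biocthis::use_bioc_vignette(vignette_name) # vignette skeleton
        biocthis::use_bioc_issue_template() # GitHub issue template
        biocthis::use_bioc_support() # support template
    })
    invisible(pkg_dir)
}

## Usage from the root of your package:
## setup_bioc_templates(".", vignette_name = "intro")
```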
I just want to thank everyone for helping me understand different pieces of code, for producing the tools I used, for interacting with me across many GitHub issues, as well as answering questions on multiple mailing lists. The names below are in the order they appear in this vignette:
as well as several organizations and members:
Thank you very much! 🙌🏽😊
The biocthis package (Collado-Torres, 2024) was made possible thanks to:
This package was developed using biocthis.
Code for creating the vignette
## Create the vignette
library("rmarkdown")
system.time(render("biocthis_dev_notes.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("biocthis_dev_notes.Rmd", tangle = TRUE)
Date the vignette was generated.
#> [1] "2024-12-10 21:43:03 UTC"
Wallclock time spent generating the vignette.
#> Time difference of 0.719 secs
R session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz UTC
#> date 2024-12-10
#> pandoc 3.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [1] RSPM (R 4.4.0)
#> bibtex 0.5.1 2023-01-26 [1] RSPM (R 4.4.0)
#> BiocManager 1.30.25 2024-08-28 [2] CRAN (R 4.4.2)
#> BiocStyle * 2.34.0 2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
#> bookdown 0.41 2024-10-16 [1] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> desc 1.4.3 2023-12-10 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> fs 1.6.5 2024-10-30 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [1] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httr 1.4.7 2023-08-15 [1] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> knitr 1.49 2024-11-08 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> lubridate 1.9.4 2024-12-08 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> pkgdown 2.1.1 2024-09-17 [2] RSPM (R 4.4.0)
#> plyr 1.8.9 2023-10-02 [1] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> ragg 1.3.3 2024-09-11 [2] RSPM (R 4.4.0)
#> Rcpp 1.0.13-1 2024-11-02 [2] RSPM (R 4.4.0)
#> RefManageR * 1.4.0 2022-09-30 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown 2.29 2024-11-04 [2] RSPM (R 4.4.0)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> systemfonts 1.1.0 2024-05-15 [2] RSPM (R 4.4.0)
#> textshaping 0.4.1 2024-12-06 [2] RSPM (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [1] RSPM (R 4.4.0)
#> xfun 0.49 2024-10-31 [2] RSPM (R 4.4.0)
#> xml2 1.3.6 2023-12-04 [2] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#>
#> [1] /__w/_temp/Library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. URL: https://github.com/rstudio/rmarkdown.
[2] L. Collado-Torres. Automate package and project setup for Bioconductor packages. https://github.com/lcolladotor/biocthis - R package version 1.17.0. 2024. DOI: 10.18129/B9.bioc.biocthis. URL: http://www.bioconductor.org/packages/biocthis.
[3] L. Henry and H. Wickham. rlang: Functions for Base Types and Core R and ‘Tidyverse’ Features. R package version 1.1.4, https://github.com/r-lib/rlang. 2024. URL: https://rlang.r-lib.org.
[4] J. Hester. covr: Test Coverage for Packages. R package version 3.6.4, https://github.com/r-lib/covr. 2023. URL: https://covr.r-lib.org.
[5] J. Hester and J. Bryan. glue: Interpreted String Literals. R package version 1.8.0, https://github.com/tidyverse/glue. 2024. URL: https://glue.tidyverse.org/.
[6] J. Hester, H. Wickham, and G. Csárdi. fs: Cross-Platform File System Operations Based on ‘libuv’. R package version 1.6.5, https://github.com/r-lib/fs. 2024. URL: https://fs.r-lib.org.
[7] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[8] K. Müller and L. Walthert. styler: Non-Invasive Pretty Printing of R Code. R package version 1.10.3, https://styler.r-lib.org. 2024. URL: https://github.com/r-lib/styler.
[9] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.34.0. 2024. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.
[10] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. URL: https://www.R-project.org/.
[11] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[12] H. Wickham, J. Bryan, M. Barrett, et al. usethis: Automate Package and Project Setup. R package version 3.1.0, https://github.com/r-lib/usethis. 2024. URL: https://usethis.r-lib.org.
[13] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2, https://r-lib.github.io/sessioninfo/. 2021. URL: https://github.com/r-lib/sessioninfo#readme.
[14] H. Wickham, J. Hesselberth, M. Salmon, et al. pkgdown: Make Static HTML Documentation for a Package. R package version 2.1.1, https://github.com/r-lib/pkgdown. 2024. URL: https://pkgdown.r-lib.org/.
[15] H. Wickham, J. Hester, W. Chang, et al. devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5, https://github.com/r-lib/devtools. 2022. URL: https://devtools.r-lib.org/.
[16] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.49. 2024. URL: https://yihui.org/knitr/.