4 R/Bioconductor Data Science bootcamps
To have a common set of external references and R knowledge for the Data Science guidance sessions and for our own work, we run a series of R and Bioconductor bootcamps. Most of the material these bootcamps are based on is freely available or can be purchased for a small fee [7]. Thus, we can feel confident that our LIBD and JHU collaborators, as well as anyone else in the world, will be able to access the material.
You can also find the videos of all our bootcamps on YouTube at lcolladotor/playlists. They are also described in the following Twitter thread.
It's been ~4 weeks. We spent the first 3 learning more about #rstats #bioinformatics & @Bioconductor. If you are interested in our bootcamp sessions, you can find the videos at https://t.co/uBVqZMfgkP and in general at https://t.co/JjZMQZsY1S @LieberInstitute @LIBDrstats ✌️🏽 https://t.co/uujaBguHMi
🇲🇽 Leonardo Collado-Torres (@lcolladotor), October 16, 2020
Here are the videos from YouTube.
4.1 Overview videos
Here are some highlighted overview videos that you might find useful as you get started.
4.1.1 R and RStudio
If you are new to R and RStudio, you might like this overview video and companion notes.
4.1.2 Bioconductor
Similarly, this overview video and companion notes on Bioconductor might be useful too.
4.1.3 ggplot2 R graphics
This video and companion notes on ggplot2 R graphics could also be useful.
4.1.4 Submitting jobs at JHPCE
This video on how to use sgejobs for submitting jobs at JHPCE could be useful too.
For a more advanced video on setting up bash loops and nested qsubs, check this second sgejobs video.
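As a rough illustration of the bash-loop idea (this is not taken from the video and does not use sgejobs), the sketch below loops over hypothetical region and sample names and builds one SGE-style qsub command per pair. The script name analyze.sh, the region and sample names, and the memory request are all placeholders, and the qsub options assume an SGE scheduler.

```bash
#!/usr/bin/env bash
set -eu

# Build the qsub command for one region/sample pair.
# $1 = region, $2 = sample. "analyze.sh" is a placeholder job script.
build_qsub_cmd() {
    echo "qsub -cwd -N ${1}_${2} -l mem_free=10G,h_vmem=10G analyze.sh ${1} ${2}"
}

# Dry run by default: print each command instead of submitting it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical region and sample names, used only for illustration.
for region in DLPFC HIPPO; do
    for sample in sample1 sample2; do
        cmd="$(build_qsub_cmd "${region}" "${sample}")"
        if [ "${DRY_RUN}" = "1" ]; then
            echo "${cmd}"
        else
            # Unquoted on purpose so the shell splits the string into words.
            ${cmd}
        fi
    done
done
```

Running the script with DRY_RUN=0 would submit the jobs instead of printing them. Keeping the dry run as the default makes it easy to review the generated commands before anything reaches the queue.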
If you need to use the GPU queue at JHPCE, then check this video and companion slides.
4.2 LIBD bootcamps
We have organized accessible bootcamps designed with biologists in mind. If you spend most of your time coding, check the Team bootcamps further below. For all of these bootcamps, you should have the latest versions of R and RStudio Desktop installed. You will typically need a computer with at least 8 GB of RAM.
Session | Date and time | Prerequisites | Topic |
---|---|---|---|
1 | 2020-10-05 3-5 pm | R + RStudio | Differential expression analysis (LIBD-style) |
2 | 2020-10-06 3-5 pm | R + RStudio | Differential expression analysis (LIBD-style) |
3 | 2020-10-07 1-3 pm | R + RStudio | Differential expression analysis (LIBD-style) |
4.3 Team bootcamps
The team bootcamps are aimed at our team members and at other collaborators with more advanced R/programming experience. However, the material we cover is within reach of everyone as long as you practice using R/Bioconductor here and there. The concepts covered are of use to all of us who work with R, but we understand that you might not have as much time to learn these materials. If that’s the case, please feel free to sign up for our Data Science guidance sessions and we’ll help you learn these concepts at your own pace.
The first iteration of these bootcamps was run in September 2020 with the following schedule. For all of them, you should have the latest R and RStudio versions installed on your computer and be familiar with the R programming language. For a more structured working environment, we might use JHPCE’s computational resources while running RStudio on our computers and running code through a Linux terminal [8]. You will probably need to spend time self-learning and practicing some of the material beyond these videos. If you just started learning about R, then these bootcamps will be quite challenging.
Session | Date and time | Prerequisites | Topic |
---|---|---|---|
1 | 2020-09-21 3-5 pm | NA | How to be a modern scientist |
2 | 2020-09-22 3-5 pm | R + RStudio | What they forgot to teach you about R |
3 | 2020-09-23 1-3 pm | R + RStudio | What they forgot to teach you about R |
4 | 2020-09-24 3-5 pm | R + RStudio | The Elements of Data Analytic Style + CBDS |
5 | 2020-09-28 3-5 pm | R + RStudio | The Elements of Data Analytic Style + CBDS |
6 | 2020-09-29 3-5 pm | Be a part of the DSgs-guides team | DSgs-guide training |
7 | 2020-09-30 1-3 pm | RStudio + R functions | Building Tidy Tools |
8 | 2020-10-01 3-5 pm | RStudio + R functions | Building Tidy Tools |
4.4 Bootcamp source materials
There are tons of resources that are useful for learning about R, Bioconductor, and data science in general. We have selected some of those resources because:
- we are familiar with them
- they are freely available or can be purchased for a small fee
- we have heard good things about them
4.4.1 Main materials
There are more materials out there than those that we’ve had a chance to learn about, and with time, this list will change. Here is our latest list.
- Introductory level
  - course Cloud Based Data Science by Jeff Leek, Aboozar Hadavand, Shannon Ellis, Leslie Myint, Sarah McClymont, Leah Jager, and others.
- Advanced
  - book R for Data Science by Garrett Grolemund and Hadley Wickham. Useful for learning about the tidyverse, which is supported by RStudio (now Posit).
  - workshop What they forgot to teach you about R 2020 version by Kara Woo, Jenny Bryan, and Jim Hester.
  - workshop Building Tidy Tools 2020 version by Charlotte Wickham and Hadley Wickham.
- General
  - book The Elements of Data Analytic Style by Jeff Leek.
  - book How to be a modern scientist by Jeff Leek.
  - book The Art of Data Science by Roger D. Peng and Elizabeth Matsui.
- Statistics
- Bioconductor
  - book Orchestrating Single-Cell Analysis with Bioconductor by Aaron Lun, Robert Amezquita, Stephanie Hicks, and Raphael Gottardo.
  - workshop WEHI scRNA-seq course (2020) by Peter Hickey based on the OSCA book.
  - workflows Bioconductor Common Workflows. There are over 25 of them from many authors.
We also highly recommend keeping an eye open for any new work from:
- Alison Presmanes Hill. Just look at her awesome projects website!
- Desirée De Leon. Note that Desirée started as an intern working with Alison, so it’s no surprise that her projects website is excellent.
- Allison Horst and in particular her stats-illustrations which are widely used and are very helpful when teaching statistical and R concepts.
4.4.2 Courses
Our colleagues at Hopkins have also created Coursera courses (MOOCs) which you might be interested in, though they might involve other fees. These are:
- Genomic Data Science Specialization by Steven Salzberg, Jeff Leek, James Taylor (1979-2020), Mihaela Pertea, Ben Langmead, Jacob Pritt, Liliana Florea, Kasper Daniel Hansen. Actually, LIBD is an “industry partner” for this specialization.
- Data Science Specialization by Jeff Leek, Roger D. Peng, and Brian Caffo.
- Data Science: Foundations using R Specialization by Jeff Leek, Roger D. Peng, and Brian Caffo.
- Executive Data Science Specialization by Jeff Leek, Roger D. Peng, and Brian Caffo.
- The Unix Workbench by Sean Kross, Jeff Leek, Roger D. Peng, and Brian Caffo.
Rafael Irizarry and his lab have also generated free teaching materials. Michael I. Love, a former postdoc with Rafael, also has teaching materials. Between them, they have courses on:
- Introduction to Data Science (at least 8 college-level courses)
- Data Analysis for the Life Sciences (at least 4 graduate level courses)
- Genomics Data Analysis (at least 3 graduate level courses)
They have lots of YouTube links organized on their old harvardx website, whose layout is based on Kasper Daniel Hansen’s Bioconductor for Genomic Data Science course.
[7] A lot of the resources listed are available through Leanpub, which is a publishing platform for books and courses. Authors get to set a recommended price, but also a minimum price that can be as low as $0, thus making their materials free to use.
[8] Check our LIBD rstats club videos on how to configure your macOS or Windows computer to work with JHPCE.