Where do I start using Bioconductor?

I was recently asked "Where do I get started with Bioconductor?" and thought the answer would make a good short post.

What is BioC?

Briefly, Bioconductor (Gentleman, Carey, Bates, and others, 2004) is an open source project that hosts a wide range of tools for analyzing biological data with R (R Core Team, 2014). These analysis tools are bundled into packages which are designed to answer specific questions or to provide key infrastructure. If this sounds like something you are interested in, visit bioconductor.org.

Naturally, you need to know the basics of R in order to use Bioconductor.


Getting started

The front page of bioconductor.org has a section titled "Get started with Bioconductor". There you will find links that explain how to install Bioconductor and how to explore the available packages.
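For reference, here is a minimal sketch of the installation step. It assumes a current R and Bioconductor release, where packages are installed through the BiocManager package from CRAN. Around the time this post was written, installation went through a `biocLite()` script instead, so check the install instructions on bioconductor.org for the release you are using.

```r
# Install the BiocManager helper from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install a specific Bioconductor package (DEXSeq is used here only as an example)
BiocManager::install("DEXSeq")

# Print the Bioconductor release that matches your R installation
BiocManager::version()
```

Using `BiocManager::install()` rather than `install.packages()` for Bioconductor packages keeps the package versions consistent with the Bioconductor release that matches your version of R.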

You have a use case

If you have a particular use case in mind, I recommend browsing the software packages and searching for some keywords. For example, you might be interested in high-throughput sequencing of RNA; if you search for "RNAseq" or "RNA-seq", you will find a good set of packages to start with. Alternatively, use the biocViews tree menu to explore specific categories of packages.

Once you find a set of packages whose descriptions appeal to you, explore their vignettes. These are PDF or HTML documents that explain to new users what the package does. They also show how to tie together the different functions in the package, which is a key piece of information. For example, in the RNA-seq case you will find the DEXSeq package. DEXSeq (Anders, Reyes, and Huber, 2012) has a vignette called "Analyzing RNA-seq data for differential exon usage with the “DEXSeq” package", and you can access the PDF vignette from the package's page.
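Once a package is installed, its vignettes can also be opened from within R. A minimal sketch using DEXSeq as the example, assuming the package is already installed:

```r
# Open an HTML index of DEXSeq's vignettes in your web browser
browseVignettes("DEXSeq")

# Alternatively, list the vignettes that ship with DEXSeq in the R console
vignette(package = "DEXSeq")
```

`browseVignettes()` and `vignette()` come from R's base `utils` package, so they work for any installed package, not only Bioconductor ones.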

Then it’s just a matter of exploring other packages, checking the vignettes and learning as you go.

You don’t have a use case

If you don’t have a specific use case in mind, it might pay off to start by exploring the Bioconductor workflows. These documents explain how to use different packages to accomplish specific types of analyses. They are a great way to learn what you can do with Bioconductor!

Another option is to look at the previous courses. For example, under the 2008 courses you’ll find the course R/Bioconductor Curso Intensivo (in Spanish), which I taught back in the day. As much as I would like to promote myself, the best starting point is the most recent BioC20XX course: BioC2014. It has slides showcasing some of the newest packages and tutorials on how to use them.

You can also look at the Bioconductor publications, which include books about Bioconductor as well as research papers describing some of the packages.

Once you find a set of packages that catch your eye, look at their vignettes just as I explained in the "You have a use case" scenario.

Help tips

It’s not a matter of whether you will need help learning how to use Bioconductor; it’s just a matter of when. So don’t feel bad about having to ask for help!

The very first place to look is the Help section at the bottom of bioconductor.org. For example, under the Community section you can find YouTube videos contributed by users, as well as links to other blog posts explaining how to use Bioconductor. Take a peek at the other sections under Help before using the Bioconductor support site: it’s where you can ask very specific questions and interact with the maintainers of the packages you are using.

Finally, if you are interested in new developments, check the latest newsletter, for example the October 2014 one.

Good luck using Bioconductor!

References

Citations made with knitcitations (Boettiger, 2014).

[1] S. Anders, A. Reyes and W. Huber. “Detecting differential usage of exons from RNA-seq data”. In: Genome Research 22 (2012), p. 4025. DOI: 10.1101/gr.133744.111.

[2] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.2. 2014. URL: https://github.com/cboettig/knitcitations.

[3] R. C. Gentleman, V. J. Carey, D. M. Bates and others. “Bioconductor: Open software development for computational biology and bioinformatics”. In: Genome Biology 5 (2004), p. R80. URL: http://genomebiology.com/2004/5/10/R80.

[4] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2014. URL: http://www.R-project.org/.

Want more?

Check other @jhubiostat student blogs at Bmore Biostats as well as topics on #rstats.

Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

#rstats @Bioconductor/🧠 genomics @LieberInstitute/@lcgunam @jhubiostat @jtleek @andrewejaffe alumni/@LIBDrstats @CDSBMexico co-founder
