Reproducible RNA-seq analysis with recount2

Abstract

In the past decade high-throughput RNA sequencing (RNA-seq) has become the dominant method for studying gene expression. Commonly, researchers generate RNA-seq data to perform a differential expression analysis by either reconstructing transcripts or counting reads that overlap known gene structures. Previously we introduced an intermediate statistical approach called differential expressed region (DER) finder that identifies contiguous regions of the genome showing differential expression signal at single base resolution in an annotation-agnostic manner (Collado-Torres et al, 2017, NAR). These three different approaches all require aligning the RNA-seq data against the reference genome. While over 2,000 projects are publicly available via the Sequence Read Archive (SRA) it is computationally intensive to re-analyze the RNA-seq data in a homogenous way. Using Rail-RNA (Nellore et al, 2016, Bioinformatics) we aligned over 70,000 human RNA-seq samples and created summaries at the gene, exon, exon-exon junction and base pair coverage levels which are available via the recount2 resource at https://jhubiostatistics.shinyapps.io/recount/ and the recount Bioconductor package (Collado-Torres, Nellore et al, 2017, Nature Biotechnology). The data from the recount2 resource can be used for different levels of RNA-seq differential expression analysis while bypassing the costly computational step of alignment. The data from recount2 can be used to predict phenotype information that is otherwise impractical to obtain for all SRA human projects. In recount2 we also provide data for the Genotype-Expression Tissue project (GTEx) and The Cancer Genome Atlas (TCGA) which have richer metadata and can be used for developing new methods and testing them before applying them to the SRA data. We expect that recount2 will be very useful for the genomics community and in particular for the statistical community by fostering the development of new methods, just like ReCount (Frazee et al, 2011, Bioinformatics) played a roll in the development of DESeq2, voom and metagenomeSeq among other methods.

Date
Location
Chicago, IL, US