RNA-seq samples beyond the known transcriptome with derfinder available via recount2

Abstract

Background. Differential expression analysis of RNA sequencing (RNA-seq) data typically relies on reconstructing transcripts or counting reads that overlap known gene structures. We previously introduced an intermediate statistical approach called differentially expressed region (DER) finder, which seeks to identify contiguous regions of the genome showing differential expression signal at single-base resolution without relying on existing annotation or potentially inaccurate transcript assembly. The first step is to align the RNA-seq reads to the genome, which is costly and requires a solid computing infrastructure. Methods. We implemented the DER finder approach in an R software package called derfinder, which provides a computationally efficient bump-hunting approach to identify DERs and permits genome-scale analyses in a large number of samples. Using the Rail-RNA aligner, we aligned over 70,000 human RNA-seq samples and summarized the results as gene, exon, exon-exon junction, and base-pair-level coverage. Results. We used derfinder to identify over 50,000 regions of the genome that are differentially expressed throughout the lifespan of the human brain, with strong signal in non-exonic sections of the genome. We also developed the recount R software package, which provides access to over 70,000 RNA-seq samples grouped in 2,041 projects. Conclusions. The recount resource can be used for different levels of RNA-seq differential expression analysis without requiring the costly computational infrastructure for the alignment step. derfinder can be used with recount data or your own private data to perform annotation-agnostic RNA-seq analyses.

Date
Location
San Diego, CA, US