RNA-seq transcript quantification from reduced-representation data in recount2

Fu J, Kammers K, Nellore A, Collado-Torres L, Leek JT, Taub MA

January 2018

Abstract

More than 70,000 short-read RNA-sequencing samples are publicly available through the recount2 project, a curated database of summary coverage data. However, no current methods can estimate transcript-level abundances using the reduced-representation information stored in this database. Here we present a linear model utilizing coverage of junctions and subdivided exons to generate transcript abundance estimates of comparable accuracy to those obtained from methods requiring read-level data. Our approach flexibly models bias, produces standard errors, and is easy to refresh given updated annotation. We illustrate our method on simulated and real data and release transcript abundance estimates for the samples in recount2.
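The general idea of a linear model over feature coverage can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a 0/1 design matrix recording which transcripts contain each feature (subdivided exon or junction) and fits ordinary least squares, with no bias modeling and no non-negativity constraint. The function names (`fit_abundances`, `invert`, and so on), the toy features, and the coverage values are all hypothetical.

```python
# Toy sketch: estimate transcript abundances from feature-level coverage.
# All names and numbers are illustrative assumptions, not taken from the paper.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_abundances(design, coverage):
    """Ordinary least squares; returns (estimates, standard_errors)."""
    xt = transpose(design)
    xtx_inv = invert(matmul(xt, design))
    y = [[v] for v in coverage]
    beta = [row[0] for row in matmul(xtx_inv, matmul(xt, y))]
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in design]
    rss = sum((o - f) ** 2 for o, f in zip(coverage, fitted))
    dof = len(coverage) - len(beta)
    sigma2 = rss / dof if dof > 0 else 0.0
    se = [(sigma2 * xtx_inv[i][i]) ** 0.5 for i in range(len(beta))]
    return beta, se

# Rows are features, columns are transcripts A and B.
design = [
    [1, 1],  # exon segment shared by A and B
    [1, 0],  # exon segment found only in A
    [0, 1],  # junction found only in B
]
coverage = [31.0, 20.0, 10.0]  # hypothetical observed coverage per feature

estimates, std_errors = fit_abundances(design, coverage)
print(estimates, std_errors)
```

Because the design matrix depends only on the annotation, a change in annotation would mean rebuilding `design` and refitting against the same stored coverage, without returning to read-level data.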

Type

Preprint

Publication

bioRxiv 247346