1 SPEAQeasy introduction
Instructor: Leo
Congrats Nick https://t.co/O3u5XRPXy2 for your @biorxivpreprint first pre-print! 🙌🏽
— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) December 12, 2020
SPEAQeasy is our @nextflowio implementation of the #RNAseq processing pipeline that produces @Bioconductor-friendly #rstats objects that we use at @LieberInstitute
📜 https://t.co/zKuBRtBCmY pic.twitter.com/F83fXI90eP
1.2 SPEAQeasy main links
- Paper: https://doi.org/10.1186/s12859-021-04142-3
- Documentation website: http://research.libd.org/SPEAQeasy/
  - Source code: https://github.com/LieberInstitute/SPEAQeasy
- Example website: http://research.libd.org/SPEAQeasy-example/
  - Source code: https://github.com/LieberInstitute/SPEAQeasy-example
- Differential expression analysis bootcamp: https://lcolladotor.github.io/bioc_team_ds/differential-expression-analysis.html
  - 3 sessions, each 2 hours long
1.3 Pipeline outputs
- Documentation chapter: http://research.libd.org/SPEAQeasy/outputs.html
That’s enough links! Let’s download some data to check it out. We’ll use BiocFileCache to keep the data in a local cache, so that if we run this example again we don’t have to re-download the data from the web.
## Load the container package for this type of data
library("SummarizedExperiment")
## Download and cache the file
library("BiocFileCache")
bfc <- BiocFileCache::BiocFileCache()
cached_rse_gene_example <- BiocFileCache::bfcrpath(
    x = bfc,
    "https://github.com/LieberInstitute/SPEAQeasy-example/raw/master/rse_speaqeasy.RData"
)
## adding rname 'https://github.com/LieberInstitute/SPEAQeasy-example/raw/master/rse_speaqeasy.RData'
## Check the local path on our cache
cached_rse_gene_example
## BFC1
## "/github/home/.cache/R/BiocFileCache/48f7acf1717_rse_speaqeasy.RData"
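To see why the cache saves us from repeated downloads, here is a minimal sketch that uses a throwaway cache in a temporary directory and a small local file standing in for the remote resource. The names `demo_cache`, `demo_file`, `first_path`, and `second_path` are made up for this illustration and are not part of the SPEAQeasy example. The sketch relies on `bfcrpath()` adding a resource the first time it is requested and returning the path to the cached copy afterwards.

```r
library("BiocFileCache")

## Throwaway cache in a temporary directory (assumption: we don't want to
## touch the default user cache for this demonstration)
demo_cache <- BiocFileCache(file.path(tempdir(), "bfc_demo"), ask = FALSE)

## A small local file standing in for the remote RData file
demo_file <- tempfile(fileext = ".txt")
writeLines("placeholder content", demo_file)

## First request: the resource is not in the cache yet, so it gets added
first_path <- bfcrpath(demo_cache, demo_file)

## Second request: the cached copy is found and its path is returned
second_path <- bfcrpath(demo_cache, demo_file)

## Both requests point to the same cached file and only one entry exists
identical(unname(first_path), unname(second_path))
bfccount(demo_cache)
```

With a web resource like the one above, the same idea applies: the first call downloads the file and later calls reuse the local copy.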
## Load the rse_gene object
load(cached_rse_gene_example, verbose = TRUE)
## Loading objects:
## rse_gene
## General overview of the object
rse_gene
## class: RangedSummarizedExperiment
## dim: 60609 40
## metadata(0):
## assays(1): counts
## rownames(60609): ENSG00000223972.5 ENSG00000227232.5 ... ENSG00000210195.2 ENSG00000210196.2
## rowData names(10): Length gencodeID ... NumTx gencodeTx
## colnames(40): R13896_H7JKMBBXX R13903_HCTYLBBXX ... R15120_HFY2MBBXX R15134_HFFGHBBXX
## colData names(67): SAMPLE_ID FQCbasicStats ... AgeDeath BrNum
## We can check how big the object is with lobstr
lobstr::obj_size(rse_gene)
## 35.78 MB
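If you haven't worked with this type of object before, the following sketch shows the main accessors on a tiny toy object, so it runs without downloading anything. Everything here is made up for illustration: the counts, the sample names, and the `batch` column are hypothetical and not SPEAQeasy output. Also note that the toy is a plain `SummarizedExperiment` without genomic ranges, unlike `rse_gene`.

```r
library("SummarizedExperiment")

## Toy count matrix: 3 genes (rows) by 4 samples (columns); values are made up
toy_counts <- matrix(
    c(10L, 0L, 5L, 3L, 8L, 2L, 7L, 1L, 0L, 4L, 6L, 9L),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

## Toy sample metadata; `batch` is a hypothetical column
toy_samples <- S4Vectors::DataFrame(
    batch = c("a", "a", "b", "b"),
    row.names = colnames(toy_counts)
)

toy_se <- SummarizedExperiment(
    assays = list(counts = toy_counts),
    colData = toy_samples
)

## Dimensions: features by samples
dim(toy_se)

## Sample-level information lives in colData(); `$` is a shortcut to its columns
colData(toy_se)
table(toy_se$batch)

## The count matrix is stored as an assay
assay(toy_se, "counts")[1:2, 1:2]
```

The same accessors (`dim()`, `colData()`, `$`, and `assay()`) should work on `rse_gene`, which is a good starting point for the exercises.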
1.4 Exercises
Exercise 1:
Either by exploring the object rse_gene or by checking the SPEAQeasy documentation, what are the possible values for the variable trimmed?
Exercise 2:
Across genes (rse_gene), exons (rse_exon), exon-exon junctions (rse_jx), and transcripts (rse_tx), what part of the output is identical?
If you want to answer this question with data, you could use the 4 RSE objects from the BrainSEQ Phase II study that are available at http://eqtl.brainseq.org/phase2/. They were created with the scripts at https://github.com/LieberInstitute/brainseq_phase2#rse_gene_unfilteredrdata. Note that these are much larger objects since they contain information for 900 samples.
1.5 Solutions
Solution 1: From http://research.libd.org/SPEAQeasy/outputs.html#quality-metrics the answer was:
A boolean value (“TRUE” or “FALSE”), indicating whether the given sample underwent trimming
With code, it’s this:
class(rse_gene$trimmed)
## [1] "logical"
## logical vectors can take 2 values (plus the third `NA` if it's missing)
Solution 2:
From http://research.libd.org/SPEAQeasy/outputs.html#coldata-of-rse-objects the answer is that all objects have identical colData().
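To check this with data, one option is to compare the `colData()` of the objects with `identical()`. The sketch below does this on two toy objects built from the same made-up sample table. The helper `make_toy_se()` and all the values are hypothetical and only there so the code runs without the large BrainSEQ Phase II files.

```r
library("SummarizedExperiment")

## Shared toy sample metadata (made-up values)
toy_samples <- S4Vectors::DataFrame(
    trimmed = c(TRUE, FALSE, TRUE),
    row.names = paste0("sample", 1:3)
)

## Hypothetical helper: build a toy object with `n_features` rows
make_toy_se <- function(n_features, prefix) {
    counts <- matrix(
        seq_len(n_features * nrow(toy_samples)),
        nrow = n_features,
        dimnames = list(
            paste0(prefix, seq_len(n_features)),
            rownames(toy_samples)
        )
    )
    SummarizedExperiment(assays = list(counts = counts), colData = toy_samples)
}

toy_gene <- make_toy_se(2, "gene")
toy_exon <- make_toy_se(5, "exon")

## Different number of features...
nrow(toy_gene)
nrow(toy_exon)

## ...but the same sample-level information
identical(colData(toy_gene), colData(toy_exon))
```

With the real objects loaded, the analogous check would be something like `identical(colData(rse_gene), colData(rse_exon))`, repeated for `rse_jx` and `rse_tx`.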