1 SPEAQeasy introduction

Instructor: Leo

1.1 2022-04-20 overview slides

1.3 Pipeline outputs

That’s enough links! Let’s download some example data and take a look at it. We’ll use BiocFileCache to keep the file in a local cache, so that re-running this example does not re-download the data from the web.

## Load the container package for this type of data
library("SummarizedExperiment")

## Download and cache the file
library("BiocFileCache")
bfc <- BiocFileCache::BiocFileCache()
cached_rse_gene_example <- BiocFileCache::bfcrpath(
    x = bfc,
    "https://github.com/LieberInstitute/SPEAQeasy-example/raw/master/rse_speaqeasy.RData"
)
## adding rname 'https://github.com/LieberInstitute/SPEAQeasy-example/raw/master/rse_speaqeasy.RData'

## Check the local path on our cache
cached_rse_gene_example
##                                                                  BFC1 
## "/github/home/.cache/R/BiocFileCache/48f7acf1717_rse_speaqeasy.RData"

## Load the rse_gene object
load(cached_rse_gene_example, verbose = TRUE)
## Loading objects:
##   rse_gene

## General overview of the object
rse_gene
## class: RangedSummarizedExperiment 
## dim: 60609 40 
## metadata(0):
## assays(1): counts
## rownames(60609): ENSG00000223972.5 ENSG00000227232.5 ... ENSG00000210195.2 ENSG00000210196.2
## rowData names(10): Length gencodeID ... NumTx gencodeTx
## colnames(40): R13896_H7JKMBBXX R13903_HCTYLBBXX ... R15120_HFY2MBBXX R15134_HFFGHBBXX
## colData names(67): SAMPLE_ID FQCbasicStats ... AgeDeath BrNum

## We can check how big the object is with lobstr
lobstr::obj_size(rse_gene)
## 35.78 MB
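
To get a feel for how an object like rse_gene is organized, here is a minimal sketch that builds a tiny RangedSummarizedExperiment by hand and pulls out its pieces with the standard accessors. The object toy_rse and every value in it are made up for illustration; with the real data you would call the same accessors on rse_gene.

```r
## Toy example: all object names and values below are made up
library("SummarizedExperiment")

## Counts matrix: 3 genes (rows) x 2 samples (columns)
toy_counts <- matrix(
    c(10L, 0L, 5L, 7L, 3L, 1L),
    nrow = 3,
    dimnames = list(
        c("gene1", "gene2", "gene3"),
        c("sampleA", "sampleB")
    )
)

## Genomic coordinates for each gene (the row information)
toy_ranges <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1, 101, 201), width = 50),
    strand = "+"
)
names(toy_ranges) <- rownames(toy_counts)

## Sample information (the column information)
toy_coldata <- DataFrame(
    trimmed = c(TRUE, FALSE),
    row.names = colnames(toy_counts)
)

## Put the three pieces together
toy_rse <- SummarizedExperiment(
    assays = list(counts = toy_counts),
    rowRanges = toy_ranges,
    colData = toy_coldata
)

## Accessors that also work on rse_gene
dim(toy_rse) ## number of features and samples
assay(toy_rse, "counts") ## the counts matrix
rowRanges(toy_rse) ## per-gene information
colData(toy_rse) ## per-sample information
toy_rse$trimmed ## shortcut for colData(toy_rse)$trimmed
```

The `$` shortcut on the last line is the same one used on rse_gene in the solutions further down.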

1.4 Exercises

Exercise 1: Either by exploring the object rse_gene or by checking the SPEAQeasy documentation, what are the possible values for the variable trimmed?

Exercise 2: Across genes (rse_gene), exons (rse_exon), exon-exon junctions (rse_jx), and transcripts (rse_tx), what part of the output is identical?

If you want to answer this question with data, you could use the 4 RSE objects from the BrainSEQ Phase II study that are available at http://eqtl.brainseq.org/phase2/. They were created with the scripts at https://github.com/LieberInstitute/brainseq_phase2#rse_gene_unfilteredrdata. Note that these are much larger objects since they contain information for 900 samples.

1.5 Solutions

Solution 1: From http://research.libd.org/SPEAQeasy/outputs.html#quality-metrics the answer is:

A boolean value (“TRUE” or “FALSE”), indicating whether the given sample underwent trimming

With code, it’s this:

class(rse_gene$trimmed)
## [1] "logical"
## Logical vectors can take 2 values, TRUE and FALSE (plus `NA` when a value is missing)
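
Checking the class tells us which values are possible; to see which values actually occur you can tabulate the variable. The sketch below uses a made-up vector, toy_trimmed, as a stand-in for rse_gene$trimmed, so the values shown are not those of the real data.

```r
## Toy stand-in for rse_gene$trimmed (values are made up)
toy_trimmed <- c(TRUE, FALSE, FALSE, NA)

## The class tells us which values are possible
class(toy_trimmed)

## These tell us which values are actually present
unique(toy_trimmed)
table(toy_trimmed, useNA = "ifany")
```

With rse_gene loaded, the equivalent call would be table(rse_gene$trimmed, useNA = "ifany").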

Solution 2: From http://research.libd.org/SPEAQeasy/outputs.html#coldata-of-rse-objects the answer is that all objects have identical colData().
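
One way to check this with data is to compare the colData() of the objects with identical(). The sketch below does so on two made-up objects, toy_gene and toy_exon, that have different features but share the same samples. Assuming the four real objects are loaded as rse_gene, rse_exon, rse_jx and rse_tx, the same comparison could be run on them in pairs.

```r
## Toy example: all object names and values below are made up
library("SummarizedExperiment")

## Sample information shared by both feature types
toy_coldata <- DataFrame(
    trimmed = c(TRUE, FALSE),
    row.names = c("sampleA", "sampleB")
)

## Gene-level object: 2 features x 2 samples
toy_gene <- SummarizedExperiment(
    assays = list(counts = matrix(
        1:4,
        nrow = 2,
        dimnames = list(c("g1", "g2"), rownames(toy_coldata))
    )),
    colData = toy_coldata
)

## Exon-level object: 3 features x the same 2 samples
toy_exon <- SummarizedExperiment(
    assays = list(counts = matrix(
        1:6,
        nrow = 3,
        dimnames = list(c("e1", "e2", "e3"), rownames(toy_coldata))
    )),
    colData = toy_coldata
)

## The feature dimension differs, the sample information does not
dim(toy_gene)
dim(toy_exon)
identical(colData(toy_gene), colData(toy_exon))
```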
