3 recount3 introduction

Instructor: Leo

Don’t let useful data go to waste by Franziska Denk https://doi.org/10.1038/543007a

3.1 recount projects

3.2 Using recount3

Check the original documentation here and here.

Let’s first load recount3, which also loads its required dependencies, including SummarizedExperiment.

## Load recount3 R package
library("recount3")

Next we need to identify a study of interest and choose whether we want the data at the gene, exon, or some other feature level. Once we have identified the study, we can download the files and build a SummarizedExperiment object with recount3::create_rse(), as we’ll show next. create_rse() also has arguments that control which annotation to use; the available annotations depend on the organism.

## Let's list all the available human projects
human_projects <- available_projects()
#> 2024-06-11 20:00:24.568254 caching file sra.recount_project.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/sra.recount_project.MD.gz'
#> 2024-06-11 20:00:26.391999 caching file gtex.recount_project.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/gtex/metadata/gtex.recount_project.MD.gz'
#> 2024-06-11 20:00:27.824044 caching file tcga.recount_project.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/tcga/metadata/tcga.recount_project.MD.gz'

## Find your project of interest. Here we'll use
## SRP009615 as an example
proj_info <- subset(
    human_projects,
    project == "SRP009615" & project_type == "data_sources"
)
## Build a RangedSummarizedExperiment (RSE) object
## with the information at the gene level
rse_gene_SRP009615 <- create_rse(proj_info)
#> 2024-06-11 20:00:30.928351 downloading and reading the metadata.
#> 2024-06-11 20:00:31.47248 caching file sra.sra.SRP009615.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/15/SRP009615/sra.sra.SRP009615.MD.gz'
#> 2024-06-11 20:00:32.777902 caching file sra.recount_project.SRP009615.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/15/SRP009615/sra.recount_project.SRP009615.MD.gz'
#> 2024-06-11 20:00:34.154227 caching file sra.recount_qc.SRP009615.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/15/SRP009615/sra.recount_qc.SRP009615.MD.gz'
#> 2024-06-11 20:00:35.349823 caching file sra.recount_seq_qc.SRP009615.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/15/SRP009615/sra.recount_seq_qc.SRP009615.MD.gz'
#> 2024-06-11 20:00:36.673858 caching file sra.recount_pred.SRP009615.MD.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/metadata/15/SRP009615/sra.recount_pred.SRP009615.MD.gz'
#> 2024-06-11 20:00:37.508853 downloading and reading the feature information.
#> 2024-06-11 20:00:37.969797 caching file human.gene_sums.G026.gtf.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/annotations/gene_sums/human.gene_sums.G026.gtf.gz'
#> 2024-06-11 20:00:39.266505 downloading and reading the counts: 12 samples across 63856 features.
#> 2024-06-11 20:00:39.852384 caching file sra.gene_sums.SRP009615.G026.gz.
#> adding rname 'http://duffel.rail.bio/recount3/human/data_sources/sra/gene_sums/15/SRP009615/sra.gene_sums.SRP009615.G026.gz'
#> 2024-06-11 20:00:41.000474 constructing the RangedSummarizedExperiment (rse) object.
## Explore the resulting object
rse_gene_SRP009615
#> class: RangedSummarizedExperiment 
#> dim: 63856 12 
#> metadata(8): time_created recount3_version ... annotation recount3_url
#> assays(1): raw_counts
#> rownames(63856): ENSG00000278704.1 ENSG00000277400.1 ... ENSG00000182484.15_PAR_Y ENSG00000227159.8_PAR_Y
#> rowData names(10): source type ... havana_gene tag
#> colnames(12): SRR387777 SRR387778 ... SRR389077 SRR389078
#> colData names(175): rail_id external_id ... recount_pred.curated.cell_line BigWigURL

## How large is it?
lobstr::obj_size(rse_gene_SRP009615)
#> 24.81 MB
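
The annotation argument mentioned earlier can be sketched as follows. This is a minimal example, assuming an internet connection and the same study as above; the object name rse_gene_SRP009615_v29 is only illustrative. annotation_options() lists the annotations recount3 offers for an organism, and the annotation and type arguments of create_rse() select the annotation and the feature level.

```r
## Load recount3 (repeated here so this chunk can run on its own)
library("recount3")

## Which annotations can we choose from for human data?
annotation_options("human")

## Locate the same example study used above
human_projects <- available_projects()
proj_info <- subset(
    human_projects,
    project == "SRP009615" & project_type == "data_sources"
)

## Build a gene-level RSE with GENCODE v29 instead of the default
## annotation. Use type = "exon" or type = "jxn" for other feature
## levels (those downloads are larger).
rse_gene_SRP009615_v29 <- create_rse(
    proj_info,
    type = "gene",
    annotation = "gencode_v29"
)
rse_gene_SRP009615_v29
```

The number of samples should match the earlier object, while the number of features will differ because the annotation is different.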

We can also interactively choose our study of interest using the following code or through the recount3 study explorer.

## Explore available human projects interactively
proj_info_interactive <- interactiveDisplayBase::display(human_projects)
## Choose only 1 row in the table, then click on "send".

## Let's double check that you indeed selected only 1 row in the table
stopifnot(nrow(proj_info_interactive) == 1)
## Now we can build the RSE object
rse_gene_interactive <- create_rse(proj_info_interactive)
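
If you prefer a scripted selection over the interactive table, the same "exactly one row" check can be wrapped in a small helper. The sketch below is an assumption-laden toy: toy_projects is a made-up stand-in for the table returned by available_projects() (the project IDs and sample numbers are invented), and select_project() is a hypothetical helper, not a recount3 function.

```r
## Toy stand-in for the table returned by available_projects().
## The project IDs and sample numbers here are invented for illustration.
toy_projects <- data.frame(
    project = c("STUDY_A", "STUDY_A", "STUDY_B"),
    file_source = c("sra", "sra", "gtex"),
    project_type = c("data_sources", "collections", "data_sources"),
    n_samples = c(12, 12, 300)
)

## Hypothetical helper: keep the single "data_sources" row for one
## project and stop with an error if there isn't exactly one match.
select_project <- function(projects, id) {
    hits <- projects[
        projects$project == id & projects$project_type == "data_sources",
        ,
        drop = FALSE
    ]
    stopifnot(nrow(hits) == 1)
    hits
}

proj_info_toy <- select_project(toy_projects, "STUDY_A")
proj_info_toy
```

With the real table, a one-row result like this is what create_rse() expects as its first argument.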

Now that we have the data, we can use recount3::transform_counts() or recount3::compute_read_counts() to convert the raw counts into a format expected by downstream tools. For more details, check the recountWorkflow paper.

## We'll compute read counts, which is what most downstream software
## uses.
## For other types of transformations such as RPKM and TPM, use
## transform_counts().
assay(rse_gene_SRP009615, "counts") <- compute_read_counts(rse_gene_SRP009615)
## Let's make it easier to use the information available for this study
## that was provided by the original authors of the study.
rse_gene_SRP009615 <- expand_sra_attributes(rse_gene_SRP009615)
colData(rse_gene_SRP009615)[
    ,
    grepl("^sra_attribute", colnames(colData(rse_gene_SRP009615)))
]
#> DataFrame with 12 rows and 4 columns
#>           sra_attribute.cells sra_attribute.shRNA_expression sra_attribute.source_name sra_attribute.treatment
#>                   <character>                    <character>               <character>             <character>
#> SRR387777                K562                             no                    SL2933               Puromycin
#> SRR387778                K562             yes, targeting SRF                    SL2934  Puromycin, doxycycline
#> SRR387779                K562                             no                    SL5265               Puromycin
#> SRR387780                K562              yes targeting SRF                    SL3141  Puromycin, doxycycline
#> SRR389079                K562            no shRNA expression                    SL6485               Puromycin
#> ...                       ...                            ...                       ...                     ...
#> SRR389082                K562         expressing shRNA tar..                    SL2592  Puromycin, doxycycline
#> SRR389083                K562            no shRNA expression                    SL4337               Puromycin
#> SRR389084                K562         expressing shRNA tar..                    SL4326  Puromycin, doxycycline
#> SRR389077                K562            no shRNA expression                    SL1584               Puromycin
#> SRR389078                K562         expressing shRNA tar..                    SL1583  Puromycin, doxycycline
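
To build some intuition for what compute_read_counts() did above: the raw_counts assay stores base-pair coverage sums rather than read counts, so each sample's column is scaled by how long its mapped reads are on average. The toy example below sketches that idea in base R. It is not the recount3 implementation, and the matrix and read lengths are made-up numbers.

```r
## Toy base-pair coverage sums: 3 genes (rows) x 2 samples (columns).
## These values are invented for illustration.
raw_counts <- matrix(
    c(
        1000, 0, 260,
        3000, 600, 150
    ),
    nrow = 3,
    dimnames = list(
        c("gene1", "gene2", "gene3"),
        c("sampleA", "sampleB")
    )
)

## Hypothetical average mapped read length for each sample
avg_mapped_length <- c(sampleA = 100, sampleB = 150)

## Divide each column by its sample's average mapped read length,
## then round to get approximate read counts
read_counts <- round(sweep(raw_counts, 2, avg_mapped_length, "/"))
read_counts
```

A gene covered by 1,000 bases from reads that are about 100 bp long corresponds to roughly 10 reads, which is the scale most downstream count-based tools expect.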

We are now ready to use other bulk RNA-seq data analysis software tools.

3.3 Exercise

Exercise 1: Use iSEE to reproduce the following image

3.4 Community

  • Tweets from the community

From a student in the LCG-UNAM 2021 course:

Exploring the possibility of using recount3 data for an analysis (January 2022):

Others discussing meta analyses publicly on Twitter:
