Diving together into the unknown world of spatial transcriptomics

Yesterday was an extremely exciting day for me and my colleagues. We finished a project we had been working on and shared it with the world. Meaning, it’s done and we can relax for a little bit while we wait for feedback from our peers.

But this was not just any project, at least not for me. Why, you ask? In general terms, it involved an analysis that you could not search for on Google and find the answer to. That is, it involved diving into the unknown!

The unknown is scary and as the lyrics say:

I’ve had my adventure, I don’t need something new
I’m afraid of what I’m risking if I follow you
Into the unknown

All of us have been building our careers with other types of data and/or experiments, and taking on a new type of data knowing we had an early access advantage over others was quite the challenge. I don’t know about my co-authors, but maybe some of them shared thoughts like mine, along the lines of: can I do this? Can I make it work? Do my analysis choices make sense? What will experts think of doing once they have access to this data? All while racing against time, even if it was just an illusion in our minds.

But it’s not my first adventure and I’ve picked up skills and confidence along the way. In particular, I’ve written Bioconductor R packages, dealt with pkgdown/travis issues like #1206, made shiny web applications, analyzed large RNA-seq data, written papers using GoogleDocs, gotten better at asking for help, among other skills.

I’ve also gotten more comfortable with the idea that I can’t do it all. Others will shortly develop new methods for this type of data, or proper infrastructure to handle this data, or faster visualizations, and the list goes on. But I’m proud and really happy to say that we built quite the robust prototype. Plus maybe we’ll be involved in shaping this future.

You may have noticed that I mentioned we. That’s because I have been learning over the years how to foster collaborations. This particular project involved working with two other members of my workplace who are awesome and whom I didn’t know that well. It also involved a new collaboration with someone I’ve known for a while now (we initially met through Twitter in 2014) but hadn’t had the chance to work with. Thus we dove into the unknown together 👩‍🚀🧑‍🚀.

I feel like we complemented each other quite well, and I can confidently say that our new adventure has been very stimulating so far, even if it cost me some sleep.

Spatial transcriptomics

So, where does spatial transcriptomics come into play and what does it mean? I work with gene activity data, which we formally refer to as gene expression 🧬. That is, we measure 🔍🧮 the activity levels of genes for a particular biological condition or tissue sample. For several years now (since about 2007-2009) we have been able to measure many genes at once from a tissue sample, a technique called bulk RNA-sequencing and abbreviated as RNA-seq.

That’s great! But biology is complicated and a single tissue sample is composed of multiple cells of various types. For example, in the brain there are cells that send signals around (neurons) and others that give structure to the brain. That is why technologies for measuring gene expression at the single cell level, abbreviated as scRNA-seq, were developed. scRNA-seq has been used widely to study everything from mouse brains to live tissue samples.

In recent years I’ve been working with data from the human brain 🧠. The Lieber Institute for Brain Development has about two thousand brain samples. To preserve them for years to come, the brains are frozen 🥶. Cells are a bit fragile and freezing them breaks them. This fact has made it challenging to study data from frozen human brains. Several of my colleagues work on adapting research protocols to handle frozen human brain tissue. The research field overall has been able to generate single nucleus RNA sequencing (snRNA-seq) data and we are all generating some more.

snRNA-seq and scRNA-seq are great because you can measure which genes are active in each cell or nucleus, classify the cells into groups, and use prior knowledge to label these groups. However, you lose information about what part of the tissue they come from. That’s where spatial transcriptomics comes in: technologies that measure gene expression 🧬 as close as possible to the single cell level while retaining spatial coordinates are being actively developed. Thus, you end up with two main sources of data: the gene expression measurements and also images of the tissue (histology information). My coworkers anticipated what these technologies could be used for and what type of research questions they could help us answer.
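
To make the two data sources a bit more concrete, here is a minimal base R sketch: a genes-by-spots count matrix plus a table with the position of each spot, linked through shared spot names. The gene names are ones mentioned in this post, but the counts, spot names, and coordinates are all invented for illustration and do not come from our data.

```r
## Toy example: every value below is invented for illustration
set.seed(20200228)
genes <- c("SNAP25", "MOBP", "PCP4")
spots <- paste0("spot", 1:6)

## Gene expression: one row per gene, one column per spot
counts <- matrix(
    rpois(length(genes) * length(spots), lambda = 5),
    nrow = length(genes),
    dimnames = list(genes, spots)
)

## Spatial information: where each spot sits on the tissue slide
coords <- data.frame(
    spot = spots,
    row = rep(1:2, each = 3),
    col = rep(1:3, times = 2),
    stringsAsFactors = FALSE
)

## Link both sources through the shared spot names
stopifnot(identical(colnames(counts), coords$spot))
coords$SNAP25 <- counts["SNAP25", coords$spot]
coords
```

With the expression of a gene attached to each spot position, you could color a plot of the tissue by that gene, which is roughly the kind of view a spatial transcriptomics viewer offers.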

Our project’s history

My coworkers got early access to a specific new type of spatial transcriptomics technology called Visium from the 10x Genomics company and started piloting it on human brain tissue. They recruited me to their project in early November 2019 (11th) and I recruited more colleagues in early December (4th). On February 28th 2020 we made public our research advances, code, and the software we built for this project.

Given that others can find us through many potential websites, we decided to unify the documentation as much as possible, even if that meant repeating it. The basic start of our documentation is included further below.

spatialLIBD

Welcome to the spatialLIBD project! It is composed of a web application, an R/Bioconductor package, and a bioRxiv pre-print.

The web application allows you to browse the LIBD human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics data generated with the 10x Genomics Visium platform. Through the R/Bioconductor package you can also download the data as well as visualize your own datasets using this web application. Please check the bioRxiv pre-print for more details about this project.

If you tweet about this website, the data or the R package please use the #spatialLIBD hashtag. You can find previous tweets that way as shown here. Thank you!

Study design

As a quick overview, the data presented here is from a portion of the DLPFC that spans six neuronal layers plus white matter (A) for a total of three subjects with two pairs of spatially adjacent replicates (B). Each dissection of DLPFC was designed to span all six layers plus white matter (C). Using this web application you can explore the expression of known genes such as SNAP25 (D, a neuronal gene), MOBP (E, an oligodendrocyte gene), and known layer markers from mouse studies such as PCP4 (F, a known layer 5 marker gene).
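
As a quick sanity check on the design, the sketch below enumerates the samples in base R. It assumes I am reading the description as two pairs of spatially adjacent replicates per subject, which gives 12 samples; the variable names are made up for this example. The 47,681 spots used for the average are the number of columns of the spot-level object printed later in this post.

```r
## Study design as read from the description above (an assumption):
## 3 subjects x 2 pairs x 2 spatially adjacent replicates per pair
design <- expand.grid(
    replicate = 1:2,
    pair = 1:2,
    subject = 1:3
)

## One row per tissue sample
n_samples <- nrow(design)
n_samples

## Rough average number of spots per sample across the whole dataset
avg_spots <- round(47681 / n_samples)
avg_spots
```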

R/Bioconductor package

The spatialLIBD package contains functions for:

  • Accessing the spatial transcriptomics data from the LIBD Human Pilot project (code on GitHub) generated with the Visium platform from 10x Genomics. The data is retrieved from Bioconductor’s ExperimentHub.
  • Visualizing the spot-level spatial gene expression data and clusters.
  • Inspecting the data interactively either on your computer or through spatial.libd.org/spatialLIBD/.

For more details, please check the documentation website or the Bioconductor package landing page here.

Installation instructions

Get the latest stable R release from CRAN. Then install spatialLIBD from Bioconductor using the following code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("spatialLIBD")

Access the data

Through the spatialLIBD package you can access the processed data in its final R format. However, we also provide a table of links so you can download the raw data we received from 10x Genomics.

Processed data

Using spatialLIBD you can access the Human DLPFC spatial transcriptomics data from the 10x Genomics Visium platform. For example, this is the code you can use to access the spot-level data. For more details, check the help file for fetch_data().

## Load the package
library('spatialLIBD')

## Download the spot-level data
sce <- fetch_data(type = 'sce')
## Loading objects:
##   sce
## This is a SingleCellExperiment object
sce
## class: SingleCellExperiment 
## dim: 33538 47681 
## metadata(1): image
## assays(2): counts logcounts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(9): source type ... gene_search is_top_hvg
## colnames(47681): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
## colData names(73): barcode sample_name ... pseudobulk_UMAP_spatial
##   markers_UMAP_spatial
## reducedDimNames(6): PCA TSNE_perplexity50 ... TSNE_perplexity80
##   UMAP_neighbors15
## spikeNames(0):
## altExpNames(0):
## Note the memory size
pryr::object_size(sce)
## 2.08 GB
## Remake the logo image with histology information
sce_image_clus(
    sce = sce,
    clustervar = 'layer_guess_reordered',
    sampleid = '151673',
    colors = libd_layer_colors,
    ... = ' DLPFC Human Brain Layers\nMade with github.com/LieberInstitute/spatialLIBD'
)
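
Once you have sce, a natural first step is to count spots per sample and per layer. The sketch below uses a small made-up data frame as a stand-in for as.data.frame(colData(sce)), so it runs without the 2 GB download. The column names sample_name and layer_guess_reordered are the ones that appear in this post; the rows, the second sample name, and the layer labels are invented, and I am assuming the real columns behave like ordinary vectors or factors.

```r
## Stand-in for as.data.frame(colData(sce)); all rows are invented
spot_info <- data.frame(
    sample_name = rep(c("151673", "other_sample"), times = c(5, 3)),
    layer_guess_reordered = c(
        "Layer1", "Layer1", "Layer2", "WM", NA,
        "Layer2", "WM", "WM"
    ),
    stringsAsFactors = FALSE
)

## Number of spots per sample
spots_per_sample <- table(spot_info$sample_name)
spots_per_sample

## Spots per sample and layer, keeping spots without a layer label visible
spots_per_layer <- with(
    spot_info,
    table(sample_name, layer_guess_reordered, useNA = "ifany")
)
spots_per_layer
```

Using useNA = "ifany" is a deliberate choice here: spots that were not assigned to a layer show up in their own column instead of silently disappearing from the totals.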

Citation

Below is the citation output from using citation('spatialLIBD') in R. Please run this yourself to check for any updates on how to cite spatialLIBD.

citation('spatialLIBD')
## 
## Collado-Torres L, Maynard KR, Jaffe AE (2020). _LIBD Visium spatial
## transcriptomics human pilot data inspector_. doi:
## 10.18129/B9.bioc.spatialLIBD (URL:
## https://doi.org/10.18129/B9.bioc.spatialLIBD),
## https://github.com/LieberInstitute/spatialLIBD - R package version
## 0.99.9, <URL: http://www.bioconductor.org/packages/spatialLIBD>.
## 
## Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams
## SR, II JLC, Tran MN, Besich Z, Tippani M, Chew J, Yin Y, Kleinman JE,
## Hyde TM, Rao N, Hicks SC, Martinowich K, Jaffe AE (2020).
## "Transcriptome-scale spatial gene expression in the human dorsolateral
## prefrontal cortex." _bioRxiv_. doi: 10.1101/2020.02.28.969931 (URL:
## https://doi.org/10.1101/2020.02.28.969931), <URL:
## https://www.biorxiv.org/content/10.1101/2020.02.28.969931v1>.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.

Please note that spatialLIBD was only made possible thanks to many other R and bioinformatics software authors. We have cited their work either in the pre-print or the vignette of the R package.

Closing remarks

Overall, this project has everything that I like: R code, a Bioconductor package, challenging and interesting biological data, an excellent team of collaborators, open communication, and so on.

Now, these are early days for the 10x Genomics Visium platform and there’s much we and others want to learn. So if you would like to hear anyone on our team talk in more detail about the project, or you simply want to chat with them, here are some opportunities to do so. We’d love to collaborate with you or even hire you. Check Stephanie’s tweet and the LIBD career website for more details or simply get in touch with us.

  • Kristen R Maynard and I will present a The Scientist webinar on March 19th
  • Keri Martinowich will be at CVCSN 2020 March 26-27th
  • I’ll present a seminar at LIIGH-UNAM on March 30th
  • Kristen R Maynard will be at the 2020 Single Cell Symposium on April 20th
  • Likely Andrew E Jaffe and others will be at The Biology of Genomes 2020 May 5-9th
  • Stephanie Hicks will present at eRum 2020 May 27-30
  • Likely some of us will attend BioC2020 July 29-31

Finally, here’s the pre-print twitter thread:

Thank you for getting this far!

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.2 (2019-12-12)
##  os       macOS Catalina 10.15.2      
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2020-02-29                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package                * version  date       lib source                                      
##  AnnotationDbi            1.48.0   2019-10-29 [1] Bioconductor                                
##  AnnotationHub            2.18.0   2019-10-29 [1] Bioconductor                                
##  assertthat               0.2.1    2019-03-21 [1] CRAN (R 3.6.0)                              
##  attempt                  0.3.0    2019-04-08 [1] CRAN (R 3.6.0)                              
##  backports                1.1.5    2019-10-02 [1] CRAN (R 3.6.0)                              
##  beeswarm                 0.2.3    2016-04-25 [1] CRAN (R 3.6.0)                              
##  bibtex                   0.4.2.2  2020-01-02 [1] CRAN (R 3.6.0)                              
##  Biobase                * 2.46.0   2019-10-29 [1] Bioconductor                                
##  BiocFileCache            1.10.2   2019-11-08 [1] Bioconductor                                
##  BiocGenerics           * 0.32.0   2019-10-29 [1] Bioconductor                                
##  BiocManager              1.30.10  2019-11-16 [1] CRAN (R 3.6.1)                              
##  BiocNeighbors            1.4.1    2019-11-01 [1] Bioconductor                                
##  BiocParallel           * 1.20.1   2019-12-21 [1] Bioconductor                                
##  BiocSingular             1.2.2    2020-02-14 [1] Bioconductor                                
##  BiocStyle              * 2.14.4   2020-01-09 [1] Bioconductor                                
##  BiocVersion              3.10.1   2019-06-06 [1] Bioconductor                                
##  bit                      1.1-15.2 2020-02-10 [1] CRAN (R 3.6.0)                              
##  bit64                    0.9-7    2017-05-08 [1] CRAN (R 3.6.0)                              
##  bitops                   1.0-6    2013-08-17 [1] CRAN (R 3.6.0)                              
##  blob                     1.2.1    2020-01-20 [1] CRAN (R 3.6.0)                              
##  blogdown                 0.17     2019-11-13 [1] CRAN (R 3.6.1)                              
##  bookdown                 0.17     2020-01-11 [1] CRAN (R 3.6.0)                              
##  cli                      2.0.1    2020-01-08 [1] CRAN (R 3.6.0)                              
##  codetools                0.2-16   2018-12-24 [1] CRAN (R 3.6.2)                              
##  colorout               * 1.2-1    2019-05-07 [1] Github (jalvesaq/colorout@7ea9440)          
##  colorspace               1.4-1    2019-03-18 [1] CRAN (R 3.6.0)                              
##  cowplot                  1.0.0    2019-07-11 [1] CRAN (R 3.6.0)                              
##  crayon                   1.3.4    2017-09-16 [1] CRAN (R 3.6.0)                              
##  curl                     4.3      2019-12-02 [1] CRAN (R 3.6.0)                              
##  data.table               1.12.8   2019-12-09 [1] CRAN (R 3.6.1)                              
##  DBI                      1.1.0    2019-12-15 [1] CRAN (R 3.6.0)                              
##  dbplyr                   1.4.2    2019-06-17 [1] CRAN (R 3.6.0)                              
##  DelayedArray           * 0.12.2   2020-01-06 [1] Bioconductor                                
##  DelayedMatrixStats       1.8.0    2019-10-29 [1] Bioconductor                                
##  desc                     1.2.0    2018-05-01 [1] CRAN (R 3.6.0)                              
##  digest                   0.6.25   2020-02-23 [1] CRAN (R 3.6.0)                              
##  dotCall64                1.0-0    2018-07-30 [1] CRAN (R 3.6.0)                              
##  dplyr                    0.8.4    2020-01-31 [1] CRAN (R 3.6.0)                              
##  DT                       0.12     2020-02-05 [1] CRAN (R 3.6.0)                              
##  evaluate                 0.14     2019-05-28 [1] CRAN (R 3.6.0)                              
##  ExperimentHub            1.12.0   2019-10-29 [1] Bioconductor                                
##  fansi                    0.4.1    2020-01-08 [1] CRAN (R 3.6.0)                              
##  farver                   2.0.3    2020-01-16 [1] CRAN (R 3.6.0)                              
##  fastmap                  1.0.1    2019-10-08 [1] CRAN (R 3.6.0)                              
##  fields                   10.3     2020-02-04 [1] CRAN (R 3.6.0)                              
##  fs                       1.3.1    2019-05-06 [1] CRAN (R 3.6.0)                              
##  GenomeInfoDb           * 1.22.0   2019-10-29 [1] Bioconductor                                
##  GenomeInfoDbData         1.2.2    2019-10-31 [1] Bioconductor                                
##  GenomicRanges          * 1.38.0   2019-10-29 [1] Bioconductor                                
##  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 3.6.0)                              
##  ggplot2                  3.2.1    2019-08-10 [1] CRAN (R 3.6.0)                              
##  glue                     1.3.1    2019-03-12 [1] CRAN (R 3.6.0)                              
##  golem                    0.1      2019-08-05 [1] CRAN (R 3.6.0)                              
##  gridExtra                2.3      2017-09-09 [1] CRAN (R 3.6.0)                              
##  gtable                   0.3.0    2019-03-25 [1] CRAN (R 3.6.0)                              
##  htmltools                0.4.0    2019-10-04 [1] CRAN (R 3.6.0)                              
##  htmlwidgets              1.5.1    2019-10-08 [1] CRAN (R 3.6.0)                              
##  httpuv                   1.5.2    2019-09-11 [1] CRAN (R 3.6.0)                              
##  httr                     1.4.1    2019-08-05 [1] CRAN (R 3.6.0)                              
##  interactiveDisplayBase   1.24.0   2019-10-29 [1] Bioconductor                                
##  IRanges                * 2.20.2   2020-01-13 [1] Bioconductor                                
##  irlba                    2.3.3    2019-02-05 [1] CRAN (R 3.6.0)                              
##  jsonlite                 1.6.1    2020-02-02 [1] CRAN (R 3.6.0)                              
##  knitcitations          * 1.0.10   2019-09-15 [1] CRAN (R 3.6.0)                              
##  knitr                    1.27     2020-01-16 [1] CRAN (R 3.6.0)                              
##  labeling                 0.3      2014-08-23 [1] CRAN (R 3.6.0)                              
##  later                    1.0.0    2019-10-04 [1] CRAN (R 3.6.0)                              
##  lattice                  0.20-38  2018-11-04 [1] CRAN (R 3.6.2)                              
##  lazyeval                 0.2.2    2019-03-15 [1] CRAN (R 3.6.0)                              
##  lifecycle                0.1.0    2019-08-01 [1] CRAN (R 3.6.0)                              
##  lubridate                1.7.4    2018-04-11 [1] CRAN (R 3.6.0)                              
##  magrittr                 1.5      2014-11-22 [1] CRAN (R 3.6.0)                              
##  maps                     3.3.0    2018-04-03 [1] CRAN (R 3.6.0)                              
##  Matrix                   1.2-18   2019-11-27 [1] CRAN (R 3.6.2)                              
##  matrixStats            * 0.55.0   2019-09-07 [1] CRAN (R 3.6.0)                              
##  memoise                  1.1.0    2017-04-21 [1] CRAN (R 3.6.0)                              
##  mime                     0.9      2020-02-04 [1] CRAN (R 3.6.0)                              
##  munsell                  0.5.0    2018-06-12 [1] CRAN (R 3.6.0)                              
##  pillar                   1.4.3    2019-12-20 [1] CRAN (R 3.6.0)                              
##  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 3.6.1)                              
##  pkgload                  1.0.2    2018-10-29 [1] CRAN (R 3.6.0)                              
##  plotly                   4.9.2    2020-02-12 [1] CRAN (R 3.6.0)                              
##  plyr                     1.8.5    2019-12-10 [1] CRAN (R 3.6.0)                              
##  png                      0.1-7    2013-12-03 [1] CRAN (R 3.6.0)                              
##  Polychrome               1.2.4    2020-02-03 [1] CRAN (R 3.6.0)                              
##  promises                 1.1.0    2019-10-04 [1] CRAN (R 3.6.0)                              
##  pryr                     0.1.4    2018-02-18 [1] CRAN (R 3.6.0)                              
##  purrr                    0.3.3    2019-10-18 [1] CRAN (R 3.6.0)                              
##  R6                       2.4.1    2019-11-12 [1] CRAN (R 3.6.1)                              
##  rappdirs                 0.3.1    2016-03-28 [1] CRAN (R 3.6.0)                              
##  RColorBrewer             1.1-2    2014-12-07 [1] CRAN (R 3.6.0)                              
##  Rcpp                     1.0.3    2019-11-08 [1] CRAN (R 3.6.0)                              
##  RCurl                    1.98-1.1 2020-01-19 [1] CRAN (R 3.6.0)                              
##  RefManageR               1.2.12   2019-04-03 [1] CRAN (R 3.6.0)                              
##  rlang                    0.4.4    2020-01-28 [1] CRAN (R 3.6.0)                              
##  rmarkdown                2.1      2020-01-20 [1] CRAN (R 3.6.0)                              
##  roxygen2                 7.0.2    2019-12-02 [1] CRAN (R 3.6.0)                              
##  rprojroot                1.3-2    2018-01-03 [1] CRAN (R 3.6.0)                              
##  RSQLite                  2.2.0    2020-01-07 [1] CRAN (R 3.6.0)                              
##  rstudioapi               0.11     2020-02-07 [1] CRAN (R 3.6.0)                              
##  rsvd                     1.0.3    2020-02-17 [1] CRAN (R 3.6.0)                              
##  S4Vectors              * 0.24.3   2020-01-18 [1] Bioconductor                                
##  scales                   1.1.0    2019-11-18 [1] CRAN (R 3.6.1)                              
##  scater                   1.14.6   2019-12-16 [1] Bioconductor                                
##  scatterplot3d            0.3-41   2018-03-14 [1] CRAN (R 3.6.0)                              
##  sessioninfo            * 1.1.1    2018-11-05 [1] CRAN (R 3.6.0)                              
##  shiny                    1.4.0    2019-10-10 [1] CRAN (R 3.6.0)                              
##  shinyWidgets             0.5.0    2019-11-18 [1] CRAN (R 3.6.0)                              
##  SingleCellExperiment   * 1.8.0    2019-10-29 [1] Bioconductor                                
##  spam                     2.5-1    2019-12-12 [1] CRAN (R 3.6.0)                              
##  spatialLIBD            * 0.99.9   2020-02-29 [1] Github (LieberInstitute/spatialLIBD@572e2a0)
##  stringi                  1.4.6    2020-02-17 [1] CRAN (R 3.6.0)                              
##  stringr                  1.4.0    2019-02-10 [1] CRAN (R 3.6.0)                              
##  SummarizedExperiment   * 1.16.1   2019-12-19 [1] Bioconductor                                
##  testthat                 2.3.1    2019-12-01 [1] CRAN (R 3.6.0)                              
##  tibble                   2.1.3    2019-06-06 [1] CRAN (R 3.6.0)                              
##  tidyr                    1.0.2    2020-01-24 [1] CRAN (R 3.6.2)                              
##  tidyselect               1.0.0    2020-01-27 [1] CRAN (R 3.6.0)                              
##  usethis                  1.5.1    2019-07-04 [1] CRAN (R 3.6.0)                              
##  vctrs                    0.2.3    2020-02-20 [1] CRAN (R 3.6.0)                              
##  vipor                    0.4.5    2017-03-22 [1] CRAN (R 3.6.0)                              
##  viridis                  0.5.1    2018-03-29 [1] CRAN (R 3.6.0)                              
##  viridisLite              0.3.0    2018-02-01 [1] CRAN (R 3.6.0)                              
##  withr                    2.1.2    2018-03-15 [1] CRAN (R 3.6.0)                              
##  xfun                     0.12     2020-01-13 [1] CRAN (R 3.6.0)                              
##  xml2                     1.2.2    2019-08-09 [1] CRAN (R 3.6.0)                              
##  xtable                   1.8-4    2019-04-21 [1] CRAN (R 3.6.0)                              
##  XVector                  0.26.0   2019-10-29 [1] Bioconductor                                
##  yaml                     2.2.1    2020-02-01 [1] CRAN (R 3.6.0)                              
##  yesno                    0.1.0    2018-04-14 [1] CRAN (R 3.6.0)                              
##  zlibbioc                 1.32.0   2019-10-29 [1] Bioconductor                                
## 
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library