
Normalization

Analyzing scRNA-seq data with Bioconductor for LCG-EJ-UNAM March 2020

Leonardo Collado-Torres

2020-03-23

1 / 12

Download the materials for this course with usethis::use_course('lcolladotor/osca_LIIGH_UNAM_2020') or view online at lcolladotor.github.io/osca_LIIGH_UNAM_2020.

2 / 12

Slides by Peter Hickey

View them here

3 / 12

Code and output

library('scRNAseq')
sce.zeisel <- ZeiselBrainData(ensembl = TRUE)
## snapshotDate(): 2019-10-22
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
## loading from cache
## snapshotDate(): 2019-10-29
## loading from cache
## Warning: Unable to map 1565 of 20006 requested IDs.
# Quality control
library('scater')
is.mito <- which(rowData(sce.zeisel)$featureType == "mito")
stats <- perCellQCMetrics(sce.zeisel, subsets = list(Mt = is.mito))
qc <- quickPerCellQC(
  stats,
  percent_subsets = c("altexps_ERCC_percent", "subsets_Mt_percent")
)
sce.zeisel <- sce.zeisel[, !qc$discard]
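quickPerCellQC() flags cells that are outliers on several metrics at once. The base-R code below is a rough sketch of that logic. It assumes the function's defaults: 3 median absolute deviations (MADs), a lower-tail test on the log scale for library size, and an upper-tail test for percentages. The helper is_outlier_sketch() is hypothetical and stands in for scater::isOutlier(). The metrics are simulated and stand in for the columns of stats. For brevity, the sketch skips the number of detected features and the ERCC percentage.

```r
# Hypothetical helper approximating scater::isOutlier(): flag values more
# than `nmads` MADs away from the median, optionally on the log scale.
is_outlier_sketch <- function(x, nmads = 3, type = c("both", "lower", "higher"),
                              log = FALSE) {
  type <- match.arg(type)
  if (log) x <- log(x)
  med <- median(x)
  spread <- mad(x, center = med)  # mad() already applies the 1.4826 constant
  lower <- if (type %in% c("both", "lower")) med - nmads * spread else -Inf
  upper <- if (type %in% c("both", "higher")) med + nmads * spread else Inf
  x < lower | x > upper
}

# Simulated per-cell metrics (toy data, not the Zeisel dataset)
set.seed(42)
n_cells <- 200
lib_size <- round(rlnorm(n_cells, meanlog = 9, sdlog = 0.3))
mito_pct <- rnorm(n_cells, mean = 5, sd = 1)
lib_size[1:3] <- c(100, 200, 300)  # deliberately tiny libraries
mito_pct[4:5] <- c(40, 60)         # deliberately mitochondria-heavy cells

# A cell is discarded if it fails any of the filters
discard <- is_outlier_sketch(lib_size, type = "lower", log = TRUE) |
  is_outlier_sketch(mito_pct, type = "higher")
which(discard)
```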
4 / 12
# Library size factors
lib.sf.zeisel <- librarySizeFactors(sce.zeisel)
# Examine distribution of size factors
summary(lib.sf.zeisel)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1754 0.5682 0.8669 1.0000 1.2758 4.0651
hist(log10(lib.sf.zeisel), xlab = "Log10[Size factor]", col = "grey80")

ls.zeisel <- colSums(counts(sce.zeisel))
plot(
  ls.zeisel,
  lib.sf.zeisel,
  log = "xy",
  xlab = "Library size",
  ylab = "Size factor"
)
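Size factors only matter once they are applied. The following base-R sketch uses a small simulated matrix; counts_toy and the other names are hypothetical. It does three things:

- computes library size factors as library sizes scaled to unit mean;
- checks that these factors are proportional to library size;
- divides each cell's counts by its size factor and takes log2(x + 1).

This is assumed to mirror what scater::logNormCounts() does with its default settings. It is not a copy of that function's code.

```r
# Simulated counts: 100 genes x 6 cells (toy data)
set.seed(1)
counts_toy <- matrix(rpois(600, lambda = 5), nrow = 100)
counts_toy[, 1] <- counts_toy[, 1] * 3  # give cell 1 a larger library

# Library size factors: library sizes scaled to have a mean of 1
lib_size <- colSums(counts_toy)
sf <- lib_size / mean(lib_size)

# The ratio sf / lib_size is the same constant (1 / mean library size) for every cell
ratio <- sf / lib_size

# Scale each column by its size factor, then log-transform with a pseudo-count of 1
log_norm <- log2(sweep(counts_toy, 2, sf, "/") + 1)

# After scaling, every cell has the same total: the mean library size
colSums(2^log_norm - 1)
```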

5 / 12

Exercise

  • Are ls.zeisel and lib.sf.zeisel identical?

  • Are they proportional?

  • Compute lib.sf.zeisel manually

6 / 12

Solution

  • Check the Details at ?scater::librarySizeFactors

  • Compute the size factors manually

## First compute the sums
zeisel_sums <- colSums(counts(sce.zeisel))
identical(zeisel_sums, ls.zeisel)
## [1] TRUE
## Next, scale them to have unit mean
zeisel_size_factors <- zeisel_sums/mean(zeisel_sums)
identical(zeisel_size_factors, lib.sf.zeisel)
## [1] TRUE
7 / 12

# Normalization by deconvolution
library('scran')
# Pre-clustering
set.seed(100)
clust.zeisel <- quickCluster(sce.zeisel)
# Compute deconvolution size factors
deconv.sf.zeisel <-
  calculateSumFactors(sce.zeisel, clusters = clust.zeisel, min.mean = 0.1)
# Examine distribution of size factors
summary(deconv.sf.zeisel)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1282 0.4859 0.8248 1.0000 1.3194 4.6521
hist(log10(deconv.sf.zeisel), xlab = "Log10[Size factor]",
  col = "grey80")

plot(
  ls.zeisel,
  deconv.sf.zeisel,
  log = "xy",
  xlab = "Library size",
  ylab = "Size factor"
)
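calculateSumFactors() pools cells so that the summed counts are large enough for a robust median-based ratio. It then deconvolves the pool factors back into per-cell factors. The sketch below illustrates that idea in base R under simplifying assumptions:

- deconv_sketch() and the simulated counts are hypothetical.
- The small pool sizes 5:8 are picked so that this toy linear system has full rank.
- Unlike the real call above, there is no pre-clustering and no min.mean filtering.

```r
# Hypothetical, heavily simplified sketch of pooling + deconvolution.
# Not scran's implementation: no pre-clustering, no min.mean gene filter,
# and no extra weighted per-cell equations.
deconv_sketch <- function(counts, pool_sizes = 5:8) {
  n <- ncol(counts)
  ref <- rowMeans(counts)  # average "reference" cell
  keep <- ref > 0          # genes usable for ratios
  # Order cells by library size and place them on a ring (odd ranks going up,
  # even ranks coming back down) so each window pools cells of similar size
  o <- order(colSums(counts))
  ring <- c(o[c(TRUE, FALSE)], rev(o[c(FALSE, TRUE)]))
  A <- matrix(0, nrow = n * length(pool_sizes), ncol = n)
  b <- numeric(nrow(A))
  i <- 0
  for (k in pool_sizes) {
    for (start in seq_len(n)) {
      i <- i + 1
      pool <- ring[((start - 1 + 0:(k - 1)) %% n) + 1]  # sliding window on the ring
      pooled <- rowSums(counts[, pool, drop = FALSE])
      # Pool-level factor: median ratio of the pooled profile to the reference
      b[i] <- median(pooled[keep] / ref[keep])
      # Each pool factor is modelled as the sum of its cells' factors
      A[i, pool] <- 1
    }
  }
  theta <- qr.solve(A, b)  # least-squares solution of A %*% theta = b
  theta / mean(theta)      # rescale to unit mean
}

# Simulated counts with known size factors (toy data, not the Zeisel dataset)
set.seed(100)
n_genes <- 500
n_cells <- 40
true_sf <- runif(n_cells, 0.5, 2)
gene_means <- rgamma(n_genes, shape = 2, rate = 0.2)
counts_sim <- matrix(rpois(n_genes * n_cells, gene_means %o% true_sf),
                     nrow = n_genes)

sf_hat <- deconv_sketch(counts_sim)
cor(sf_hat, true_sf)  # should be high in this idealized simulation
```

These simulated counts are not sparse and contain no differential expression, so library size factors would do about as well here. Pooling matters for real single-cell data, where most counts are zero and per-cell median ratios break down.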

8 / 12

Exercises

  • How many quick clusters did we get?

  • How many cells per quick cluster did we get?

  • How many quick clusters will we get if we set the minimum size to 200? Use 100 as the seed.

  • How many lines do you see when plotting the deconvolution size factors against the library size factors?

9 / 12
Solutions

  • 12 quick clusters.
  • Between 113 and 325 cells per quick cluster: sort(table(clust.zeisel))
  • 10 quick clusters: set.seed(100); sort(table(quickCluster(sce.zeisel, min.size = 200)))
  • Several lines near the diagonal, potentially 7, one per cell type: table(factor(sce.zeisel$level1class))

# Library size factors vs. convolution size factors
# Colouring points using the supplied cell-types
plot(
  lib.sf.zeisel,
  deconv.sf.zeisel,
  xlab = "Library size factor",
  ylab = "Deconvolution size factor",
  log = 'xy',
  pch = 16,
  col = as.integer(factor(sce.zeisel$level1class))
)
abline(a = 0, b = 1, col = "red")
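One plausible reading of the separate lines is composition bias. If a cell type strongly upregulates a handful of genes, its library sizes grow even though most genes are unchanged, while a median-based factor is barely affected. The toy simulation below uses two simulated cell types, and all names are hypothetical. It uses per-cell median ratios against the average cell as a stand-in for deconvolution factors. That shortcut only works here because the simulated counts are not sparse.

```r
# Toy simulation (hypothetical names; not the Zeisel dataset): two cell types
# with the same size-factor distribution, but type "B" strongly upregulates
# a small set of genes.
set.seed(7)
n_genes <- 1000
n_per_type <- 50
cell_type <- rep(c("A", "B"), each = n_per_type)
true_sf <- runif(2 * n_per_type, 0.5, 2)
gene_means <- rgamma(n_genes, shape = 2, rate = 0.1)
mu <- gene_means %o% true_sf           # expected counts, genes x cells
is_b <- cell_type == "B"
mu[1:50, is_b] <- mu[1:50, is_b] * 20  # 50 genes up 20-fold in type B
counts_sim <- matrix(rpois(length(mu), mu), nrow = n_genes)

# Library size factors (unit mean): inflated for type B by the upregulated genes
lib_sf <- colSums(counts_sim) / mean(colSums(counts_sim))

# Median-ratio factors against the average cell: a robust stand-in for the
# deconvolution factors that mostly ignores the few upregulated genes
ref <- rowMeans(counts_sim)
keep <- ref > 0
med_sf <- apply(counts_sim, 2, function(x) median(x[keep] / ref[keep]))
med_sf <- med_sf / mean(med_sf)

# Each cell type follows its own line parallel to the diagonal (on log scale),
# offset by a type-specific ratio
ratio_by_type <- tapply(med_sf / lib_sf, cell_type, median)
ratio_by_type

plot(lib_sf, med_sf, log = "xy", pch = 16, col = as.integer(factor(cell_type)),
     xlab = "Library size factor", ylab = "Median-ratio size factor")
abline(a = 0, b = 1, col = "red")
```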

10 / 12

Thanks!

Slides created via the R package xaringan and themed with xaringanthemer.

This course is based on the book Orchestrating Single Cell Analysis with Bioconductor by Aaron Lun, Robert Amezquita, Stephanie Hicks and Raphael Gottardo, plus WEHI's scRNA-seq course by Peter Hickey.

You can find the files for this course at lcolladotor/osca_LIIGH_UNAM_2020.

Instructor: Leonardo Collado-Torres.

Download the materials for this course with usethis::use_course('lcolladotor/osca_LIIGH_UNAM_2020') or view online at lcolladotor.github.io/osca_LIIGH_UNAM_2020.

11 / 12

R session information

options(width = 120)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.3 (2020-02-29)
## os macOS Catalina 10.15.3
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/New_York
## date 2020-03-23
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date lib source
## AnnotationDbi * 1.48.0 2019-10-29 [1] Bioconductor
## AnnotationFilter * 1.10.0 2019-10-29 [1] Bioconductor
## AnnotationHub 2.18.0 2019-10-29 [1] Bioconductor
## askpass 1.1 2019-01-13 [1] CRAN (R 3.6.0)
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## beeswarm 0.2.3 2016-04-25 [1] CRAN (R 3.6.0)
## Biobase * 2.46.0 2019-10-29 [1] Bioconductor
## BiocFileCache 1.10.2 2019-11-08 [1] Bioconductor
## BiocGenerics * 0.32.0 2019-10-29 [1] Bioconductor
## BiocManager 1.30.10 2019-11-16 [1] CRAN (R 3.6.1)
## BiocNeighbors 1.4.2 2020-02-29 [1] Bioconductor
## BiocParallel * 1.20.1 2019-12-21 [1] Bioconductor
## BiocSingular 1.2.2 2020-02-14 [1] Bioconductor
## BiocVersion 3.10.1 2019-06-06 [1] Bioconductor
## biomaRt 2.42.0 2019-10-29 [1] Bioconductor
## Biostrings 2.54.0 2019-10-29 [1] Bioconductor
## bit 1.1-15.2 2020-02-10 [1] CRAN (R 3.6.0)
## bit64 0.9-7 2017-05-08 [1] CRAN (R 3.6.0)
## bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
## blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.0)
## cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
## codetools 0.2-16 2018-12-24 [1] CRAN (R 3.6.3)
## colorout * 1.2-1 2019-05-07 [1] Github (jalvesaq/colorout@7ea9440)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## curl 4.3 2019-12-02 [1] CRAN (R 3.6.0)
## DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.0)
## dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.0)
## DelayedArray * 0.12.2 2020-01-06 [1] Bioconductor
## DelayedMatrixStats 1.8.0 2019-10-29 [1] Bioconductor
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
## dplyr 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
## dqrng 0.2.1 2019-05-17 [1] CRAN (R 3.6.0)
## edgeR 3.28.1 2020-02-26 [1] Bioconductor
## ensembldb * 2.10.2 2019-11-20 [1] Bioconductor
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## ExperimentHub 1.12.0 2019-10-29 [1] Bioconductor
## fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
## fastmap 1.0.1 2019-10-08 [1] CRAN (R 3.6.0)
## GenomeInfoDb * 1.22.0 2019-10-29 [1] Bioconductor
## GenomeInfoDbData 1.2.2 2019-10-31 [1] Bioconductor
## GenomicAlignments 1.22.1 2019-11-12 [1] Bioconductor
## GenomicFeatures * 1.38.2 2020-02-15 [1] Bioconductor
## GenomicRanges * 1.38.0 2019-10-29 [1] Bioconductor
## ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 3.6.0)
## ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.0)
## glue 1.3.2 2020-03-12 [1] CRAN (R 3.6.0)
## gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.0)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.0)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
## httpuv 1.5.2 2019-09-11 [1] CRAN (R 3.6.0)
## httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
## igraph 1.2.5 2020-03-19 [1] CRAN (R 3.6.0)
## interactiveDisplayBase 1.24.0 2019-10-29 [1] Bioconductor
## IRanges * 2.20.2 2020-01-13 [1] Bioconductor
## irlba 2.3.3 2019-02-05 [1] CRAN (R 3.6.0)
## knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
## later 1.0.0 2019-10-04 [1] CRAN (R 3.6.0)
## lattice 0.20-40 2020-02-19 [1] CRAN (R 3.6.0)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
## limma 3.42.2 2020-02-03 [1] Bioconductor
## locfit 1.5-9.1 2013-04-20 [1] CRAN (R 3.6.0)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## Matrix 1.2-18 2019-11-27 [1] CRAN (R 3.6.3)
## matrixStats * 0.56.0 2020-03-13 [1] CRAN (R 3.6.0)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
## mime 0.9 2020-02-04 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## openssl 1.4.1 2019-07-18 [1] CRAN (R 3.6.0)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
## progress 1.2.2 2019-05-16 [1] CRAN (R 3.6.0)
## promises 1.1.0 2019-10-04 [1] CRAN (R 3.6.0)
## ProtGenerics 1.18.0 2019-10-29 [1] Bioconductor
## purrr 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
## rappdirs 0.3.1 2016-03-28 [1] CRAN (R 3.6.0)
## Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
## RCurl 1.98-1.1 2020-01-19 [1] CRAN (R 3.6.0)
## rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
## rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
## Rsamtools 2.2.3 2020-02-23 [1] Bioconductor
## RSQLite 2.2.0 2020-01-07 [1] CRAN (R 3.6.0)
## rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
## rsvd 1.0.3 2020-02-17 [1] CRAN (R 3.6.0)
## rtracklayer 1.46.0 2019-10-29 [1] Bioconductor
## S4Vectors * 0.24.3 2020-01-18 [1] Bioconductor
## scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.1)
## scater * 1.14.6 2019-12-16 [1] Bioconductor
## scran * 1.14.6 2020-02-03 [1] Bioconductor
## scRNAseq * 2.0.2 2019-11-12 [1] Bioconductor
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## shiny 1.4.0.2 2020-03-13 [1] CRAN (R 3.6.0)
## SingleCellExperiment * 1.8.0 2019-10-29 [1] Bioconductor
## statmod 1.4.34 2020-02-17 [1] CRAN (R 3.6.0)
## stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## SummarizedExperiment * 1.16.1 2019-12-19 [1] Bioconductor
## tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
## tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
## vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
## vipor 0.4.5 2017-03-22 [1] CRAN (R 3.6.0)
## viridis 0.5.1 2018-03-29 [1] CRAN (R 3.6.0)
## viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.0)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
## xaringan 0.15 2020-03-04 [1] CRAN (R 3.6.3)
## xaringanthemer * 0.2.0 2020-03-22 [1] Github (gadenbuie/xaringanthemer@460f441)
## xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0)
## XML 3.99-0.3 2020-01-20 [1] CRAN (R 3.6.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0)
## XVector 0.26.0 2019-10-29 [1] Bioconductor
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
## zlibbioc 1.32.0 2019-10-29 [1] Bioconductor
##
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
12 / 12
