class: center, middle, inverse, title-slide

# Normalization
## Analyzing scRNA-seq data with Bioconductor for LCG-EJ-UNAM March 2020
### Leonardo Collado-Torres
### 2020-03-23

---

class: inverse

.center[

<a href="https://bioconductor.org/"><img src="https://osca.bioconductor.org/cover.png" style="width: 30%"/></a>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

<a href='https://clustrmaps.com/site/1b5pl' title='Visit tracker'><img src='//clustrmaps.com/map_v2.png?cl=ffffff&w=150&t=n&d=tq5q8216epOrQBSllNIKhXOHUHi-i38brzUURkQEiXw'/></a>

]

.footnote[ Download the materials for this course with `usethis::use_course('lcolladotor/osca_LIIGH_UNAM_2020')` or view online at [**lcolladotor.github.io/osca_LIIGH_UNAM_2020**](http://lcolladotor.github.io/osca_LIIGH_UNAM_2020).]

<style type="text/css">
/* From https://github.com/yihui/xaringan/issues/147 */
.scroll-output {
  height: 80%;
  overflow-y: scroll;
}

/* https://stackoverflow.com/questions/50919104/horizontally-scrollable-output-on-xaringan-slides */
pre {
  max-width: 100%;
  overflow-x: scroll;
}

/* From https://github.com/yihui/xaringan/wiki/Font-Size */
.tiny{
  font-size: 40%
}

/* From https://github.com/yihui/xaringan/wiki/Title-slide */
.title-slide {
  background-image: url(https://raw.githubusercontent.com/Bioconductor/OrchestratingSingleCellAnalysis/master/images/Workflow.png);
  background-size: 33%;
  background-position: 0% 100%
}
</style>

---

# Slides by Peter Hickey

View them [here](https://docs.google.com/presentation/d/1_tCNLiEsQ_TgsqHHf9_1lzXSaM_LunEHxBq3k130dQI/edit#slide=id.g7cc450648d_0_118)

---

# Code and output

.scroll-output[

```r
library('scRNAseq')
sce.zeisel <- ZeiselBrainData(ensembl = TRUE)
```

```
## snapshotDate(): 2019-10-22
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## see ?scRNAseq and browseVignettes('scRNAseq') for documentation
```

```
## loading from cache
```

```
## snapshotDate(): 2019-10-29
```

```
## loading from cache
```

```
## Warning: Unable to map 1565 of 20006 requested IDs.
```

```r
# Quality control
library('scater')
is.mito <- which(rowData(sce.zeisel)$featureType == "mito")
stats <- perCellQCMetrics(sce.zeisel, subsets = list(Mt = is.mito))
qc <- quickPerCellQC(stats,
    percent_subsets = c("altexps_ERCC_percent", "subsets_Mt_percent"))
sce.zeisel <- sce.zeisel[, !qc$discard]
```

]

---

.scroll-output[

```r
# Library size factors
lib.sf.zeisel <- librarySizeFactors(sce.zeisel)

# Examine distribution of size factors
summary(lib.sf.zeisel)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.1754  0.5682  0.8669  1.0000  1.2758  4.0651
```

```r
hist(log10(lib.sf.zeisel), xlab = "Log10[Size factor]", col = "grey80")
```

![](04-normalization_files/figure-html/all_code2-1.png)<!-- -->

```r
ls.zeisel <- colSums(counts(sce.zeisel))
plot(
    ls.zeisel,
    lib.sf.zeisel,
    log = "xy",
    xlab = "Library size",
    ylab = "Size factor"
)
```

![](04-normalization_files/figure-html/all_code2-2.png)<!-- -->

]

---

# Exercise

--

* Are `ls.zeisel` and `lib.sf.zeisel` identical?

--

* Are they proportional?

--

* Compute `lib.sf.zeisel` manually

---

# Solution

--

* Check the **Details** at `?scater::librarySizeFactors`

--

* Compute the size factors manually

```r
## First compute the sums
zeisel_sums <- colSums(counts(sce.zeisel))
identical(zeisel_sums, ls.zeisel)
```

```
## [1] TRUE
```

```r
## Next, make them have unity mean
zeisel_size_factors <- zeisel_sums / mean(zeisel_sums)
identical(zeisel_size_factors, lib.sf.zeisel)
```

```
## [1] TRUE
```

--

* Check the [source code](https://github.com/davismcc/scater/blob/master/R/librarySizeFactors.R)

---

.scroll-output[

```r
# Normalization by deconvolution
library('scran')

# Pre-clustering
set.seed(100)
clust.zeisel <- quickCluster(sce.zeisel)

# Compute deconvolution size factors
deconv.sf.zeisel <-
    calculateSumFactors(sce.zeisel, clusters = clust.zeisel, min.mean = 0.1)

# Examine distribution of size factors
summary(deconv.sf.zeisel)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.1282  0.4859  0.8248  1.0000  1.3194  4.6521
```

```r
hist(log10(deconv.sf.zeisel), xlab = "Log10[Size factor]", col = "grey80")
```

![](04-normalization_files/figure-html/all_code3-1.png)<!-- -->

```r
plot(
    ls.zeisel,
    deconv.sf.zeisel,
    log = "xy",
    xlab = "Library size",
    ylab = "Size factor"
)
```

![](04-normalization_files/figure-html/all_code3-2.png)<!-- -->

]

---

# Exercises

--

* How many quick clusters did we get?

--

* How many cells per quick cluster did we get?

--

* How many quick clusters will we get if we set the minimum size to 200? Use 100 as the seed.

--

* How many lines do you see?

???

* 12
* From 113 to 325: `sort(table(clust.zeisel))`
* 10: `set.seed(100); sort(table(quickCluster(sce.zeisel, min.size = 200)))`
* Several near the diagonal, potentially 7 (see `table(factor(sce.zeisel$level1class))`)

---

.scroll-output[

```r
# Library size factors vs. deconvolution size factors

# Colouring points using the supplied cell-types
plot(
    lib.sf.zeisel,
    deconv.sf.zeisel,
    xlab = "Library size factor",
    ylab = "Deconvolution size factor",
    log = 'xy',
    pch = 16,
    col = as.integer(factor(sce.zeisel$level1class))
)
abline(a = 0, b = 1, col = "red")
```

![](04-normalization_files/figure-html/all_code4-1.png)<!-- -->

]

---

class: middle

.center[

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan) and themed with [**xaringanthemer**](https://github.com/gadenbuie/xaringanthemer).

This course is based on the book [**Orchestrating Single Cell Analysis with Bioconductor**](https://osca.bioconductor.org/) by [Aaron Lun](https://www.linkedin.com/in/aaron-lun-869b5894/), [Robert Amezquita](https://robertamezquita.github.io/), [Stephanie Hicks](https://www.stephaniehicks.com/) and [Raphael Gottardo](http://rglab.org), plus [**WEHI's scRNA-seq course**](https://drive.google.com/drive/folders/1cn5d-Ey7-kkMiex8-74qxvxtCQT6o72h) by [Peter Hickey](https://www.peterhickey.org/).

You can find the files for this course at [lcolladotor/osca_LIIGH_UNAM_2020](https://github.com/lcolladotor/osca_LIIGH_UNAM_2020).

Instructor: [**Leonardo Collado-Torres**](http://lcolladotor.github.io/).

<a href="https://www.libd.org"><img src="img/LIBD_logo.jpg" style="width: 20%" /></a>

]
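
---

# Appendix: library size factors on a toy matrix

The manual computation from the Solution slide can be tried without downloading any data. The sketch below uses a small made-up count matrix; `toy_counts`, `toy_lib_sizes` and `toy_size_factors` are hypothetical names that are not part of the course code. It only relies on base R and assumes, as in the Solution slide, that library size factors are the per-cell sums scaled to have a mean of 1.

```r
# Toy count matrix: 3 genes (rows) by 4 cells (columns); values are made up
toy_counts <- matrix(
    c(5, 0, 3,
      10, 2, 4,
      2, 1, 1,
      20, 4, 12),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library size: total counts per cell
toy_lib_sizes <- colSums(toy_counts)

# Library size factors: library sizes scaled to have a mean of 1
toy_size_factors <- toy_lib_sizes / mean(toy_lib_sizes)

# The two vectors are not identical ...
identical(toy_lib_sizes, toy_size_factors)

# ... but they are proportional: the ratio is the same for every cell
toy_lib_sizes / toy_size_factors

# The size factors average to 1 by construction
mean(toy_size_factors)
```

This mirrors the answers to the first exercise: the size factors differ from the library sizes only by a constant scaling factor, the mean library size.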
---

# R session information

.scroll-output[
.tiny[

```r
options(width = 120)
sessioninfo::session_info()
```

```
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 3.6.3 (2020-02-29)
##  os       macOS Catalina 10.15.3
##  system   x86_64, darwin15.6.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2020-03-23
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package                * version  date       lib source
##  AnnotationDbi          * 1.48.0   2019-10-29 [1] Bioconductor
##  AnnotationFilter       * 1.10.0   2019-10-29 [1] Bioconductor
##  AnnotationHub            2.18.0   2019-10-29 [1] Bioconductor
##  askpass                  1.1      2019-01-13 [1] CRAN (R 3.6.0)
##  assertthat               0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
##  beeswarm                 0.2.3    2016-04-25 [1] CRAN (R 3.6.0)
##  Biobase                * 2.46.0   2019-10-29 [1] Bioconductor
##  BiocFileCache            1.10.2   2019-11-08 [1] Bioconductor
##  BiocGenerics           * 0.32.0   2019-10-29 [1] Bioconductor
##  BiocManager              1.30.10  2019-11-16 [1] CRAN (R 3.6.1)
##  BiocNeighbors            1.4.2    2020-02-29 [1] Bioconductor
##  BiocParallel           * 1.20.1   2019-12-21 [1] Bioconductor
##  BiocSingular             1.2.2    2020-02-14 [1] Bioconductor
##  BiocVersion              3.10.1   2019-06-06 [1] Bioconductor
##  biomaRt                  2.42.0   2019-10-29 [1] Bioconductor
##  Biostrings               2.54.0   2019-10-29 [1] Bioconductor
##  bit                      1.1-15.2 2020-02-10 [1] CRAN (R 3.6.0)
##  bit64                    0.9-7    2017-05-08 [1] CRAN (R 3.6.0)
##  bitops                   1.0-6    2013-08-17 [1] CRAN (R 3.6.0)
##  blob                     1.2.1    2020-01-20 [1] CRAN (R 3.6.0)
##  cli                      2.0.2    2020-02-28 [1] CRAN (R 3.6.0)
##  codetools                0.2-16   2018-12-24 [1] CRAN (R 3.6.3)
##  colorout               * 1.2-1    2019-05-07 [1] Github (jalvesaq/colorout@7ea9440)
##  colorspace               1.4-1    2019-03-18 [1] CRAN (R 3.6.0)
##  crayon                   1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
##  curl                     4.3      2019-12-02 [1] CRAN (R 3.6.0)
##  DBI                      1.1.0    2019-12-15 [1] CRAN (R 3.6.0)
##  dbplyr                   1.4.2    2019-06-17 [1] CRAN (R 3.6.0)
##  DelayedArray           * 0.12.2   2020-01-06 [1] Bioconductor
##  DelayedMatrixStats       1.8.0    2019-10-29 [1] Bioconductor
##  digest                   0.6.25   2020-02-23 [1] CRAN (R 3.6.0)
##  dplyr                    0.8.5    2020-03-07 [1] CRAN (R 3.6.0)
##  dqrng                    0.2.1    2019-05-17 [1] CRAN (R 3.6.0)
##  edgeR                    3.28.1   2020-02-26 [1] Bioconductor
##  ensembldb              * 2.10.2   2019-11-20 [1] Bioconductor
##  evaluate                 0.14     2019-05-28 [1] CRAN (R 3.6.0)
##  ExperimentHub            1.12.0   2019-10-29 [1] Bioconductor
##  fansi                    0.4.1    2020-01-08 [1] CRAN (R 3.6.0)
##  fastmap                  1.0.1    2019-10-08 [1] CRAN (R 3.6.0)
##  GenomeInfoDb           * 1.22.0   2019-10-29 [1] Bioconductor
##  GenomeInfoDbData         1.2.2    2019-10-31 [1] Bioconductor
##  GenomicAlignments        1.22.1   2019-11-12 [1] Bioconductor
##  GenomicFeatures        * 1.38.2   2020-02-15 [1] Bioconductor
##  GenomicRanges          * 1.38.0   2019-10-29 [1] Bioconductor
##  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 3.6.0)
##  ggplot2                * 3.3.0    2020-03-05 [1] CRAN (R 3.6.0)
##  glue                     1.3.2    2020-03-12 [1] CRAN (R 3.6.0)
##  gridExtra                2.3      2017-09-09 [1] CRAN (R 3.6.0)
##  gtable                   0.3.0    2019-03-25 [1] CRAN (R 3.6.0)
##  hms                      0.5.3    2020-01-08 [1] CRAN (R 3.6.0)
##  htmltools                0.4.0    2019-10-04 [1] CRAN (R 3.6.0)
##  httpuv                   1.5.2    2019-09-11 [1] CRAN (R 3.6.0)
##  httr                     1.4.1    2019-08-05 [1] CRAN (R 3.6.0)
##  igraph                   1.2.5    2020-03-19 [1] CRAN (R 3.6.0)
##  interactiveDisplayBase   1.24.0   2019-10-29 [1] Bioconductor
##  IRanges                * 2.20.2   2020-01-13 [1] Bioconductor
##  irlba                    2.3.3    2019-02-05 [1] CRAN (R 3.6.0)
##  knitr                    1.28     2020-02-06 [1] CRAN (R 3.6.0)
##  later                    1.0.0    2019-10-04 [1] CRAN (R 3.6.0)
##  lattice                  0.20-40  2020-02-19 [1] CRAN (R 3.6.0)
##  lazyeval                 0.2.2    2019-03-15 [1] CRAN (R 3.6.0)
##  lifecycle                0.2.0    2020-03-06 [1] CRAN (R 3.6.0)
##  limma                    3.42.2   2020-02-03 [1] Bioconductor
##  locfit                   1.5-9.1  2013-04-20 [1] CRAN (R 3.6.0)
##  magrittr                 1.5      2014-11-22 [1] CRAN (R 3.6.0)
##  Matrix                   1.2-18   2019-11-27 [1] CRAN (R 3.6.3)
##  matrixStats            * 0.56.0   2020-03-13 [1] CRAN (R 3.6.0)
##  memoise                  1.1.0    2017-04-21 [1] CRAN (R 3.6.0)
##  mime                     0.9      2020-02-04 [1] CRAN (R 3.6.0)
##  munsell                  0.5.0    2018-06-12 [1] CRAN (R 3.6.0)
##  openssl                  1.4.1    2019-07-18 [1] CRAN (R 3.6.0)
##  pillar                   1.4.3    2019-12-20 [1] CRAN (R 3.6.0)
##  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 3.6.1)
##  prettyunits              1.1.1    2020-01-24 [1] CRAN (R 3.6.2)
##  progress                 1.2.2    2019-05-16 [1] CRAN (R 3.6.0)
##  promises                 1.1.0    2019-10-04 [1] CRAN (R 3.6.0)
##  ProtGenerics             1.18.0   2019-10-29 [1] Bioconductor
##  purrr                    0.3.3    2019-10-18 [1] CRAN (R 3.6.0)
##  R6                       2.4.1    2019-11-12 [1] CRAN (R 3.6.1)
##  rappdirs                 0.3.1    2016-03-28 [1] CRAN (R 3.6.0)
##  Rcpp                     1.0.4    2020-03-17 [1] CRAN (R 3.6.0)
##  RCurl                    1.98-1.1 2020-01-19 [1] CRAN (R 3.6.0)
##  rlang                    0.4.5    2020-03-01 [1] CRAN (R 3.6.0)
##  rmarkdown                2.1      2020-01-20 [1] CRAN (R 3.6.0)
##  Rsamtools                2.2.3    2020-02-23 [1] Bioconductor
##  RSQLite                  2.2.0    2020-01-07 [1] CRAN (R 3.6.0)
##  rstudioapi               0.11     2020-02-07 [1] CRAN (R 3.6.0)
##  rsvd                     1.0.3    2020-02-17 [1] CRAN (R 3.6.0)
##  rtracklayer              1.46.0   2019-10-29 [1] Bioconductor
##  S4Vectors              * 0.24.3   2020-01-18 [1] Bioconductor
##  scales                   1.1.0    2019-11-18 [1] CRAN (R 3.6.1)
##  scater                 * 1.14.6   2019-12-16 [1] Bioconductor
##  scran                  * 1.14.6   2020-02-03 [1] Bioconductor
##  scRNAseq               * 2.0.2    2019-11-12 [1] Bioconductor
##  sessioninfo              1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
##  shiny                    1.4.0.2  2020-03-13 [1] CRAN (R 3.6.0)
##  SingleCellExperiment   * 1.8.0    2019-10-29 [1] Bioconductor
##  statmod                  1.4.34   2020-02-17 [1] CRAN (R 3.6.0)
##  stringi                  1.4.6    2020-02-17 [1] CRAN (R 3.6.0)
##  stringr                  1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
##  SummarizedExperiment   * 1.16.1   2019-12-19 [1] Bioconductor
##  tibble                   2.1.3    2019-06-06 [1] CRAN (R 3.6.0)
##  tidyselect               1.0.0    2020-01-27 [1] CRAN (R 3.6.0)
##  vctrs                    0.2.4    2020-03-10 [1] CRAN (R 3.6.0)
##  vipor                    0.4.5    2017-03-22 [1] CRAN (R 3.6.0)
##  viridis                  0.5.1    2018-03-29 [1] CRAN (R 3.6.0)
##  viridisLite              0.3.0    2018-02-01 [1] CRAN (R 3.6.0)
##  withr                    2.1.2    2018-03-15 [1] CRAN (R 3.6.0)
##  xaringan                 0.15     2020-03-04 [1] CRAN (R 3.6.3)
##  xaringanthemer         * 0.2.0    2020-03-22 [1] Github (gadenbuie/xaringanthemer@460f441)
##  xfun                     0.12     2020-01-13 [1] CRAN (R 3.6.0)
##  XML                      3.99-0.3 2020-01-20 [1] CRAN (R 3.6.0)
##  xtable                   1.8-4    2019-04-21 [1] CRAN (R 3.6.0)
##  XVector                  0.26.0   2019-10-29 [1] Bioconductor
##  yaml                     2.2.1    2020-02-01 [1] CRAN (R 3.6.0)
##  zlibbioc                 1.32.0   2019-10-29 [1] Bioconductor
##
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
```

]]