This code is part of a presentation for the Human Cell Atlas - Latin America workshop on scRNA-seq data analysis. The code is based on Peter Hickey’s WEHI scRNA-seq workshop which itself is based on the Orchestrating Single Cell Analysis (OSCA) with Bioconductor book and paper.

You can find the Slides that accompany this code on Google Slides.

1 Install R

If you don’t have the latest R release (currently 4.0), you will need to install it following these instructions http://bioconductor.org/install/#install-R. You might also be interested in this video where I show how to install multiple R versions on macOS and winOS.

Yesterday at @LIBDrstats I explained how to install multiple R & @Bioconductor versions on macOS and winOS using

* CRAN
* RSwitch https://t.co/uJ023n6ZKq by @hrbrmstr
* @rstudio

Useful now that R 4.1 is in “alpha” & BioC 3.13 is almost ready#rstats

📹 https://t.co/FzfXpp6nrU
— 🇲🇽 Leonardo Collado-Torres (@lcolladotor) May 1, 2021

2 Install Bioconductor

Once you have R installed, you can then install Bioconductor.

## Install BiocManager which is the utility for installing 
## Bioconductor packages

## For installing Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

Once you have BiocManager installed, you should verify that you are indeed using the latest Bioconductor release, which right now is 3.12.

BiocManager::version()

## [1] '3.12'

Next we can install the R/Bioconductor packages we’ll use later in the session.

## Install required packages
BiocManager::install(
    c(
        'SingleCellExperiment',
        # 'usethis',
        # 'here',
        'scran',
        'scater',
        'scRNAseq',
        # 'org.Mm.eg.db',
        'AnnotationHub',
        'ExperimentHub',
        # 'BiocFileCache',
        # 'DropletUtils',
        # 'EnsDb.Hsapiens.v86',
        # 'TENxPBMCData',
        # 'BiocSingular',
        # 'batchelor',
        'uwot',
        'Rtsne',
        # 'pheatmap',
        # 'fossil',
        # 'ggplot2',
        # 'cowplot',
        # 'RColorBrewer',
        # 'plotly',
        # 'iSEE',
        'pryr',
        'sessioninfo'
    )
)

3 Data Infraestructure code

Now we can use the code from https://github.com/lcolladotor/osca_LIIGH_UNAM_2020/blob/master/02-data-infrastructure-and-import.R#L57-L208, which I’ve copy-pasted below. The equivalent version with comments in Spanish is available from https://github.com/ComunidadBioInfo/cdsb2020/blob/master/scRNAseq/02-data-infrastructure-and-import.R#L57-L215.

## ----all_code, cache=TRUE--------------------------------------------------------------------------------------------
library('scRNAseq')

## Loading required package: SingleCellExperiment

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: GenomicRanges

## Loading required package: stats4

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following object is masked from 'package:base':
## 
##     expand.grid

## Loading required package: IRanges

## Loading required package: GenomeInfoDb

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

sce.416b <- LunSpikeInData(which = "416b")

## using temporary cache /var/folders/cx/n9s558kx6fb7jf5z_pgszgb80000gn/T//RtmpD4KAVX/BiocFileCache

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

## snapshotDate(): 2020-10-27

## loading from cache

## require("ensembldb")

# Load the SingleCellExperiment package
library('SingleCellExperiment')
# Extract the count matrix from the 416b dataset
counts.416b <- counts(sce.416b)
# Construct a new SCE from the counts matrix
sce <- SingleCellExperiment(assays = list(counts = counts.416b))

# Inspect the object we just created
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(0):
## assays(1): counts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(0):
## reducedDimNames(0):
## altExpNames(0):

## How big is it?
pryr::object_size(sce)

## Registered S3 method overwritten by 'pryr':
##   method      from
##   print.bytes Rcpp

## 40.1 MB

# Access the counts matrix from the assays slot
# WARNING: This will flood RStudio with output!

# 1. The general method
assay(sce, "counts")[1:6, 1:3]

##                    SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0

# 2. The special method for the assay named "counts"
counts(sce)[1:6, 1:3]

##                    SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0

sce <- scater::logNormCounts(sce)
# Inspect the object we just updated
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(0):
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(1): sizeFactor
## reducedDimNames(0):
## altExpNames(0):

## How big is it?
pryr::object_size(sce)

## 112 MB

# 1. The general method
assay(sce, "logcounts")[1:6, 1:3]

##                    SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0

# 2. The special method for the assay named "logcounts"
logcounts(sce)[1:6, 1:3]

##                    SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S503.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0
##                    SLX-9555.N701_S504.C89V9ANXX.s_1.r_1
## ENSMUSG00000102693                                    0
## ENSMUSG00000064842                                    0
## ENSMUSG00000051951                                    0
## ENSMUSG00000102851                                    0
## ENSMUSG00000103377                                    0
## ENSMUSG00000104017                                    0

# assign a new entry to assays slot
assay(sce, "counts_100") <- assay(sce, "counts") + 100
# List the assays in the object
assays(sce)

## List of length 3
## names(3): counts logcounts counts_100

assayNames(sce)

## [1] "counts"     "logcounts"  "counts_100"

## How big is it?
pryr::object_size(sce)

## 183 MB

# Extract the sample metadata from the 416b dataset
colData.416b <- colData(sce.416b)
# Add some of the sample metadata to our SCE
colData(sce) <- colData.416b[, c("phenotype", "block")]
# Inspect the object we just updated
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(0):
## assays(3): counts logcounts counts_100
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(0):
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(2): phenotype block
## reducedDimNames(0):
## altExpNames(0):

# Access the sample metadata from our SCE
colData(sce)

## DataFrame with 192 rows and 2 columns
##                                                    phenotype     block
##                                                  <character> <integer>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1     wild type phenotype  20160113
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1     wild type phenotype  20160113
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1     wild type phenotype  20160113
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  induced CBFB-MYH11 o..  20160113
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1  induced CBFB-MYH11 o..  20160113
## ...                                                      ...       ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..  20160325
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..  20160325
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..  20160325
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..  20160325
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1    wild type phenotype  20160325

# Access a specific column of sample metadata from our SCE
table(sce$block)

## 
## 20160113 20160325 
##       96       96

# Example of function that adds extra fields to colData
sce <- scater::addPerCellQC(sce.416b)
# Access the sample metadata from our updated SCE
colData(sce)

## DataFrame with 192 rows and 18 columns
##                                                  Source Name   cell line
##                                                  <character> <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1  SLX-9555.N701_S502.C..        416B
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1  SLX-9555.N701_S503.C..        416B
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1  SLX-9555.N701_S504.C..        416B
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  SLX-9555.N701_S505.C..        416B
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1  SLX-9555.N701_S506.C..        416B
## ...                                                      ...         ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S505...        416B
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S506...        416B
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S507...        416B
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S508...        416B
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 SLX-11312.N712_S517...        416B
##                                                 cell type
##                                               <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1  embryonic stem cell
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1  embryonic stem cell
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1  embryonic stem cell
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  embryonic stem cell
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1  embryonic stem cell
## ...                                                   ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 embryonic stem cell
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 embryonic stem cell
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 embryonic stem cell
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 embryonic stem cell
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 embryonic stem cell
##                                       single cell well quality
##                                                    <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1                        OK
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1                        OK
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1                        OK
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1                        OK
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1                        OK
## ...                                                        ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1                       OK
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1                       OK
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1                       OK
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1                       OK
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1                       OK
##                                                     genotype
##                                                  <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1  Doxycycline-inducibl..
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1  Doxycycline-inducibl..
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1  Doxycycline-inducibl..
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  Doxycycline-inducibl..
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1  Doxycycline-inducibl..
## ...                                                      ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 Doxycycline-inducibl..
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 Doxycycline-inducibl..
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 Doxycycline-inducibl..
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 Doxycycline-inducibl..
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1 Doxycycline-inducibl..
##                                                    phenotype      strain
##                                                  <character> <character>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1     wild type phenotype    B6D2F1-J
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1     wild type phenotype    B6D2F1-J
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1     wild type phenotype    B6D2F1-J
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  induced CBFB-MYH11 o..    B6D2F1-J
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1  induced CBFB-MYH11 o..    B6D2F1-J
## ...                                                      ...         ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..    B6D2F1-J
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..    B6D2F1-J
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..    B6D2F1-J
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1 induced CBFB-MYH11 o..    B6D2F1-J
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1    wild type phenotype    B6D2F1-J
##                                       spike-in addition     block       sum
##                                             <character> <integer> <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1          ERCC+SIRV  20160113    865936
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1          ERCC+SIRV  20160113   1076277
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1          ERCC+SIRV  20160113   1180138
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1          ERCC+SIRV  20160113   1342593
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1          ERCC+SIRV  20160113   1668311
## ...                                                 ...       ...       ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1          Premixed  20160325    776622
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1          Premixed  20160325   1299950
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1          Premixed  20160325   1800696
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1          Premixed  20160325     46731
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1          Premixed  20160325   1866692
##                                        detected altexps_ERCC_sum
##                                       <numeric>        <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1       7618            65278
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1       7521            74748
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1       8306            60878
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1       8143            60073
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1       7154           136810
## ...                                         ...              ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1      8174            61575
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1      8956            94982
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1      9530           113707
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1      6649             7580
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1     10964            48664
##                                       altexps_ERCC_detected
##                                                   <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1                     39
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1                     40
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1                     42
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1                     42
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1                     44
## ...                                                     ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1                    39
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1                    41
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1                    40
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1                    44
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1                    39
##                                       altexps_ERCC_percent altexps_SIRV_sum
##                                                  <numeric>        <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1               6.80658            27828
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1               6.28030            39173
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1               4.78949            30058
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1               4.18567            32542
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1               7.28887            71850
## ...                                                    ...              ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1              7.17620            19848
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1              6.65764            31729
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1              5.81467            41116
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1             13.48898             1883
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1              2.51930            16289
##                                       altexps_SIRV_detected
##                                                   <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1                      7
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1                      7
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1                      7
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1                      7
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1                      7
## ...                                                     ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1                     7
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1                     7
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1                     7
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1                     7
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1                     7
##                                       altexps_SIRV_percent     total
##                                                  <numeric> <numeric>
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1               2.90165    959042
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1               3.29130   1190198
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1               2.36477   1271074
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1               2.26741   1435208
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1               3.82798   1876971
## ...                                                    ...       ...
## SLX-11312.N712_S505.H5H5YBBXX.s_8.r_1             2.313165    858045
## SLX-11312.N712_S506.H5H5YBBXX.s_8.r_1             2.224004   1426661
## SLX-11312.N712_S507.H5H5YBBXX.s_8.r_1             2.102562   1955519
## SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1             3.350892     56194
## SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1             0.843271   1931645

# Inspect the object we just updated
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(0):
## assays(1): counts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(1): Length
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(18): Source Name cell line ... altexps_SIRV_percent total
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV

## How big is it?
pryr::object_size(sce)

## 41.4 MB

## Add the lognorm counts again
sce <- scater::logNormCounts(sce)

## How big is it?
pryr::object_size(sce)

## 113 MB

# E.g., subset data to just wild type cells
# Remember, cells are columns of the SCE
sce[, sce$phenotype == "wild type phenotype"]

## class: SingleCellExperiment 
## dim: 46604 96 
## metadata(0):
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(1): Length
## colnames(96): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S504.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(19): Source Name cell line ... total sizeFactor
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV

# Access the feature metadata from our SCE
# It's currently empty!
rowData(sce)

## DataFrame with 46604 rows and 1 column
##                       Length
##                    <integer>
## ENSMUSG00000102693      1070
## ENSMUSG00000064842       110
## ENSMUSG00000051951      6094
## ENSMUSG00000102851       480
## ENSMUSG00000103377      2819
## ...                      ...
## ENSMUSG00000094621       121
## ENSMUSG00000098647        99
## ENSMUSG00000096730      3077
## ENSMUSG00000095742       243
## CBFB-MYH11-mcherry      2998

# Example of function that adds extra fields to rowData
sce <- scater::addPerFeatureQC(sce)
# Access the feature metadata from our updated SCE
rowData(sce)

## DataFrame with 46604 rows and 3 columns
##                       Length      mean  detected
##                    <integer> <numeric> <numeric>
## ENSMUSG00000102693      1070 0.0000000  0.000000
## ENSMUSG00000064842       110 0.0000000  0.000000
## ENSMUSG00000051951      6094 0.0000000  0.000000
## ENSMUSG00000102851       480 0.0000000  0.000000
## ENSMUSG00000103377      2819 0.0104167  0.520833
## ...                      ...       ...       ...
## ENSMUSG00000094621       121       0.0         0
## ENSMUSG00000098647        99       0.0         0
## ENSMUSG00000096730      3077       0.0         0
## ENSMUSG00000095742       243       0.0         0
## CBFB-MYH11-mcherry      2998   50375.7       100

## How big is it?
pryr::object_size(sce)

## 114 MB

# Download the relevant Ensembl annotation database
# using AnnotationHub resources
library('AnnotationHub')

## Loading required package: BiocFileCache

## Loading required package: dbplyr

## 
## Attaching package: 'AnnotationHub'

## The following object is masked from 'package:Biobase':
## 
##     cache

ah <- AnnotationHub()

## snapshotDate(): 2020-10-27

query(ah, c("Mus musculus", "Ensembl", "v97"))

## AnnotationHub with 1 record
## # snapshotDate(): 2020-10-27
## # names(): AH73905
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-05-02
## # $title: Ensembl 97 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("97", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## #   "Protein", "Transcript") 
## # retrieve record with 'object[["AH73905"]]'

# Annotate each gene with its chromosome location
ensdb <- ah[["AH73905"]]

## loading from cache

chromosome <- mapIds(ensdb,
    keys = rownames(sce),
    keytype = "GENEID",
    column = "SEQNAME")

## Warning: Unable to map 563 of 46604 requested IDs.

rowData(sce)$chromosome <- chromosome

# Access the feature metadata from our updated SCE
rowData(sce)

## DataFrame with 46604 rows and 4 columns
##                       Length      mean  detected  chromosome
##                    <integer> <numeric> <numeric> <character>
## ENSMUSG00000102693      1070 0.0000000  0.000000           1
## ENSMUSG00000064842       110 0.0000000  0.000000           1
## ENSMUSG00000051951      6094 0.0000000  0.000000           1
## ENSMUSG00000102851       480 0.0000000  0.000000           1
## ENSMUSG00000103377      2819 0.0104167  0.520833           1
## ...                      ...       ...       ...         ...
## ENSMUSG00000094621       121       0.0         0  GL456372.1
## ENSMUSG00000098647        99       0.0         0  GL456381.1
## ENSMUSG00000096730      3077       0.0         0  JH584292.1
## ENSMUSG00000095742       243       0.0         0  JH584295.1
## CBFB-MYH11-mcherry      2998   50375.7       100          NA

## How big is it?
pryr::object_size(sce)

## 114 MB

# E.g., subset data to just genes on chromosome 3
# NOTE: which() needed to cope with NA chromosome names
sce[which(rowData(sce)$chromosome == "3"), ]

## class: SingleCellExperiment 
## dim: 2876 192 
## metadata(0):
## assays(2): counts logcounts
## rownames(2876): ENSMUSG00000098982 ENSMUSG00000098307 ...
##   ENSMUSG00000105990 ENSMUSG00000075903
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(19): Source Name cell line ... total sizeFactor
## reducedDimNames(0):
## altExpNames(2): ERCC SIRV

# Access the metadata from our SCE
# It's currently empty!
metadata(sce)

## list()

# The metadata slot is Vegas - anything goes
metadata(sce) <- list(favourite_genes = c("Shh", "Nck1", "Diablo"),
    analyst = c("Pete"))

# Access the metadata from our updated SCE
metadata(sce)

## $favourite_genes
## [1] "Shh"    "Nck1"   "Diablo"
## 
## $analyst
## [1] "Pete"

# E.g., add the PCA of logcounts
# NOTE: We'll learn more about PCA later
sce <- scater::runPCA(sce)
# Inspect the object we just updated
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(2): favourite_genes analyst
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(19): Source Name cell line ... total sizeFactor
## reducedDimNames(1): PCA
## altExpNames(2): ERCC SIRV

# Access the PCA matrix from the reducedDims slot
reducedDim(sce, "PCA")[1:6, 1:3]

##                                             PC1        PC2        PC3
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1  18.717668  27.598132  -5.939654
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1   2.480705  27.564583  -4.916567
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1  42.034018   7.552435 -12.126964
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1  -8.494303 -31.833727 -15.760853
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 -49.737390  -4.226795  -6.123169
## SLX-9555.N701_S507.C89V9ANXX.s_1.r_1 -44.528081   3.215503 -10.384939

# E.g., add a t-SNE representation of logcounts
# NOTE: We'll learn more about t-SNE later
sce <- scater::runTSNE(sce)
# Inspect the object we just updated
sce

## class: SingleCellExperiment 
## dim: 46604 192 
## metadata(2): favourite_genes analyst
## assays(2): counts logcounts
## rownames(46604): ENSMUSG00000102693 ENSMUSG00000064842 ...
##   ENSMUSG00000095742 CBFB-MYH11-mcherry
## rowData names(4): Length mean detected chromosome
## colnames(192): SLX-9555.N701_S502.C89V9ANXX.s_1.r_1
##   SLX-9555.N701_S503.C89V9ANXX.s_1.r_1 ...
##   SLX-11312.N712_S508.H5H5YBBXX.s_8.r_1
##   SLX-11312.N712_S517.H5H5YBBXX.s_8.r_1
## colData names(19): Source Name cell line ... total sizeFactor
## reducedDimNames(2): PCA TSNE
## altExpNames(2): ERCC SIRV

# Access the t-SNE matrix from the reducedDims slot
head(reducedDim(sce, "TSNE"))

##                                           [,1]       [,2]
## SLX-9555.N701_S502.C89V9ANXX.s_1.r_1  6.117248  2.3252320
## SLX-9555.N701_S503.C89V9ANXX.s_1.r_1  3.223016 -0.7115595
## SLX-9555.N701_S504.C89V9ANXX.s_1.r_1  6.331325  4.9120344
## SLX-9555.N701_S505.C89V9ANXX.s_1.r_1 -1.653247  4.0296996
## SLX-9555.N701_S506.C89V9ANXX.s_1.r_1 -4.892326 -7.0317003
## SLX-9555.N701_S507.C89V9ANXX.s_1.r_1 -3.711729 -7.6197516

# E.g., add a 'manual' UMAP representation of logcounts
# NOTE: We'll learn more about UMAP later and a
#         simpler way to compute it.
u <- uwot::umap(t(logcounts(sce)), n_components = 2)
# Add the UMAP matrix to the reducedDims slot
# Access the UMAP matrix from the reducedDims slot
reducedDim(sce, "UMAP") <- u

# List the dimensionality reduction results stored in # the object
reducedDims(sce)

## List of length 3
## names(3): PCA TSNE UMAP

4 Quality Control

Below I copy-pasted the code from https://github.com/lcolladotor/osca_LIIGH_UNAM_2020/blob/master/03-quality-control.R#L57-L226. The equivalent version with comments in Spanish is available from https://github.com/ComunidadBioInfo/cdsb2020/blob/master/scRNAseq/03-quality-control.R#L57-L233.

## ----all_code, cache=TRUE--------------------------------------------------------------------------------------------
## Data
library('scRNAseq')
sce.416b <- LunSpikeInData(which = "416b")

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## snapshotDate(): 2020-10-27

## loading from cache

sce.416b$block <- factor(sce.416b$block)

# Download the relevant Ensembl annotation database
# using AnnotationHub resources
library('AnnotationHub')
ah <- AnnotationHub()

## snapshotDate(): 2020-10-27

query(ah, c("Mus musculus", "Ensembl", "v97"))

## AnnotationHub with 1 record
## # snapshotDate(): 2020-10-27
## # names(): AH73905
## # $dataprovider: Ensembl
## # $species: Mus musculus
## # $rdataclass: EnsDb
## # $rdatadateadded: 2019-05-02
## # $title: Ensembl 97 EnsDb for Mus musculus
## # $description: Gene and protein annotations for Mus musculus based on Ensem...
## # $taxonomyid: 10090
## # $genome: GRCm38
## # $sourcetype: ensembl
## # $sourceurl: http://www.ensembl.org
## # $sourcesize: NA
## # $tags: c("97", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
## #   "Protein", "Transcript") 
## # retrieve record with 'object[["AH73905"]]'

# Annotate each gene with its chromosome location
ens.mm.v97 <- ah[["AH73905"]]

## loading from cache

location <- mapIds(
    ens.mm.v97,
    keys = rownames(sce.416b),
    keytype = "GENEID",
    column = "SEQNAME"
)

## Warning: Unable to map 563 of 46604 requested IDs.

# Identify the mitochondrial genes
is.mito <- which(location == "MT")

library('scater')

## Loading required package: ggplot2

sce.416b <- addPerCellQC(sce.416b,
    subsets = list(Mito = is.mito))


## ----qc_metrics, cache=TRUE, dependson='all_code'--------------------------------------------------------------------
plotColData(sce.416b, x = "block", y = "detected")

plotColData(sce.416b, x = "block", y = "detected") +
    scale_y_log10()

plotColData(sce.416b,
    x = "block",
    y = "detected",
    other_fields = "phenotype") +
    scale_y_log10() +
    facet_wrap( ~ phenotype)

## ----all_code_part2, cache = TRUE, dependson='all_code'--------------------------------------------------------------
# Example thresholds
qc.lib <- sce.416b$sum < 100000
qc.nexprs <- sce.416b$detected < 5000
qc.spike <- sce.416b$altexps_ERCC_percent > 10
qc.mito <- sce.416b$subsets_Mito_percent > 10
discard <- qc.lib | qc.nexprs | qc.spike | qc.mito

# Summarize the number of cells removed for each reason
DataFrame(
    LibSize = sum(qc.lib),
    NExprs = sum(qc.nexprs),
    SpikeProp = sum(qc.spike),
    MitoProp = sum(qc.mito),
    Total = sum(discard)
)

## DataFrame with 1 row and 5 columns
##     LibSize    NExprs SpikeProp  MitoProp     Total
##   <integer> <integer> <integer> <integer> <integer>
## 1         3         0        19        14        33

qc.lib2 <- isOutlier(sce.416b$sum, log = TRUE, type = "lower")
qc.nexprs2 <- isOutlier(sce.416b$detected, log = TRUE,
    type = "lower")
qc.spike2 <- isOutlier(sce.416b$altexps_ERCC_percent,
    type = "higher")
qc.mito2 <- isOutlier(sce.416b$subsets_Mito_percent,
    type = "higher")
discard2 <- qc.lib2 | qc.nexprs2 | qc.spike2 | qc.mito2

# Extract the thresholds
attr(qc.lib2, "thresholds")

##    lower   higher 
## 434082.9      Inf

attr(qc.nexprs2, "thresholds")

##    lower   higher 
## 5231.468      Inf

# Summarize the number of cells removed for each reason.
DataFrame(
    LibSize = sum(qc.lib2),
    NExprs = sum(qc.nexprs2),
    SpikeProp = sum(qc.spike2),
    MitoProp = sum(qc.mito2),
    Total = sum(discard2)
)

## DataFrame with 1 row and 5 columns
##     LibSize    NExprs SpikeProp  MitoProp     Total
##   <integer> <integer> <integer> <integer> <integer>
## 1         4         0         1         2         6

## More checks
plotColData(sce.416b,
    x = "block",
    y = "detected",
    other_fields = "phenotype") +
    scale_y_log10() +
    facet_wrap( ~ phenotype)

batch <- paste0(sce.416b$phenotype, "-", sce.416b$block)
qc.lib3 <- isOutlier(sce.416b$sum,
    log = TRUE,
    type = "lower",
    batch = batch)
qc.nexprs3 <- isOutlier(sce.416b$detected,
    log = TRUE,
    type = "lower",
    batch = batch)
qc.spike3 <- isOutlier(sce.416b$altexps_ERCC_percent,
    type = "higher",
    batch = batch)
qc.mito3 <- isOutlier(sce.416b$subsets_Mito_percent,
    type = "higher",
    batch = batch)
discard3 <- qc.lib3 | qc.nexprs3 | qc.spike3 | qc.mito3

# Extract the thresholds
attr(qc.lib3, "thresholds")

##        induced CBFB-MYH11 oncogene expression-20160113
## lower                                         461073.1
## higher                                             Inf
##        induced CBFB-MYH11 oncogene expression-20160325
## lower                                         399133.7
## higher                                             Inf
##        wild type phenotype-20160113 wild type phenotype-20160325
## lower                      599794.9                     370316.5
## higher                          Inf                          Inf

attr(qc.nexprs3, "thresholds")

##        induced CBFB-MYH11 oncogene expression-20160113
## lower                                          5399.24
## higher                                             Inf
##        induced CBFB-MYH11 oncogene expression-20160325
## lower                                          6519.74
## higher                                             Inf
##        wild type phenotype-20160113 wild type phenotype-20160325
## lower                      7215.887                     7586.402
## higher                          Inf                          Inf

# Summarize the number of cells removed for each reason
DataFrame(
    LibSize = sum(qc.lib3),
    NExprs = sum(qc.nexprs3),
    SpikeProp = sum(qc.spike3),
    MitoProp = sum(qc.mito3),
    Total = sum(discard3)
)

## DataFrame with 1 row and 5 columns
##     LibSize    NExprs SpikeProp  MitoProp     Total
##   <integer> <integer> <integer> <integer> <integer>
## 1         5         4         6         2         9

## ----use_case, cache=TRUE, dependson= c('all_code', 'all_code_part2')------------------------------------------------
sce.grun <- GrunPancreasData()

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## downloading 1 resources

## retrieving 1 resource

## loading from cache

## snapshotDate(): 2020-10-27

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

sce.grun <- addPerCellQC(sce.grun)

plotColData(sce.grun, x = "donor", y = "altexps_ERCC_percent")

## Warning: Removed 10 rows containing non-finite values (stat_ydensity).

## Warning: Removed 10 rows containing missing values (position_quasirandom).

discard.ercc <- isOutlier(sce.grun$altexps_ERCC_percent,
    type = "higher",
    batch = sce.grun$donor)

## Warning in .get_med_and_mad(metric, batch = batch, subset = subset,
## share.medians = share.medians, : missing values ignored during outlier detection

discard.ercc2 <- isOutlier(
    sce.grun$altexps_ERCC_percent,
    type = "higher",
    batch = sce.grun$donor,
    subset = sce.grun$donor %in% c("D17", "D2", "D7")
)

## Warning in .get_med_and_mad(metric, batch = batch, subset = subset,
## share.medians = share.medians, : missing values ignored during outlier detection

plotColData(
    sce.grun,
    x = "donor",
    y = "altexps_ERCC_percent",
    colour_by = data.frame(discard = discard.ercc)
)

## Warning: Removed 10 rows containing non-finite values (stat_ydensity).

## Warning: Removed 10 rows containing missing values (position_quasirandom).

plotColData(
    sce.grun,
    x = "donor",
    y = "altexps_ERCC_percent",
    colour_by = data.frame(discard = discard.ercc2)
)

## Warning: Removed 10 rows containing non-finite values (stat_ydensity).

## Warning: Removed 10 rows containing missing values (position_quasirandom).

# Add info about which cells are outliers
sce.416b$discard <- discard2

# Look at this plot for each QC metric
plotColData(
    sce.416b,
    x = "block",
    y = "sum",
    colour_by = "discard",
    other_fields = "phenotype"
) +
    facet_wrap( ~ phenotype) +
    scale_y_log10()

# Another useful diagnostic plot
plotColData(
    sce.416b,
    x = "sum",
    y = "subsets_Mito_percent",
    colour_by = "discard",
    other_fields = c("block", "phenotype")
) +
    facet_grid(block ~ phenotype)

5 Reproducibility information

Below is the R session information used for making this document.

## Reproducibility information
print('Reproducibility information:')

## [1] "Reproducibility information:"

Sys.time()

## [1] "2021-05-02 15:02:20 EDT"

proc.time()

##    user  system elapsed 
## 112.925   9.019 176.469

options(width = 120)
sessioninfo::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 4.0.5 Patched (2021-03-31 r80247)
##  os       macOS Big Sur 10.16                        
##  system   x86_64, darwin17.0                         
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2021-05-02                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package                * version  date       lib source                            
##  AnnotationDbi          * 1.52.0   2020-10-27 [1] Bioconductor                      
##  AnnotationFilter       * 1.14.0   2020-10-27 [1] Bioconductor                      
##  AnnotationHub          * 2.22.1   2021-04-16 [1] Bioconductor                      
##  askpass                  1.1      2019-01-13 [1] CRAN (R 4.0.2)                    
##  assertthat               0.2.1    2019-03-21 [1] CRAN (R 4.0.2)                    
##  beachmat                 2.6.4    2020-12-20 [1] Bioconductor                      
##  beeswarm                 0.3.1    2021-03-07 [1] CRAN (R 4.0.2)                    
##  Biobase                * 2.50.0   2020-10-27 [1] Bioconductor                      
##  BiocFileCache          * 1.14.0   2020-10-27 [1] Bioconductor                      
##  BiocGenerics           * 0.36.1   2021-04-16 [1] Bioconductor                      
##  BiocManager              1.30.12  2021-03-28 [1] CRAN (R 4.0.2)                    
##  BiocNeighbors            1.8.2    2020-12-07 [1] Bioconductor                      
##  BiocParallel             1.24.1   2020-11-06 [1] Bioconductor                      
##  BiocSingular             1.6.0    2020-10-27 [1] Bioconductor                      
##  BiocStyle              * 2.18.1   2020-11-24 [1] Bioconductor                      
##  BiocVersion              3.12.0   2020-05-14 [1] Bioconductor                      
##  biomaRt                  2.46.3   2021-02-09 [1] Bioconductor                      
##  Biostrings               2.58.0   2020-10-27 [1] Bioconductor                      
##  bit                      4.0.4    2020-08-04 [1] CRAN (R 4.0.2)                    
##  bit64                    4.0.5    2020-08-30 [1] CRAN (R 4.0.2)                    
##  bitops                   1.0-7    2021-04-24 [1] CRAN (R 4.0.2)                    
##  blob                     1.2.1    2020-01-20 [1] CRAN (R 4.0.2)                    
##  bookdown                 0.22     2021-04-22 [1] CRAN (R 4.0.2)                    
##  bslib                    0.2.4    2021-01-25 [1] CRAN (R 4.0.3)                    
##  cachem                   1.0.4    2021-02-13 [1] CRAN (R 4.0.2)                    
##  cli                      2.5.0    2021-04-26 [1] CRAN (R 4.0.5)                    
##  codetools                0.2-18   2020-11-04 [1] CRAN (R 4.0.5)                    
##  colorout                 1.2-2    2020-11-03 [1] Github (jalvesaq/colorout@726d681)
##  colorspace               2.0-0    2020-11-11 [1] CRAN (R 4.0.2)                    
##  cowplot                  1.1.1    2020-12-30 [1] CRAN (R 4.0.2)                    
##  crayon                   1.4.1    2021-02-08 [1] CRAN (R 4.0.2)                    
##  curl                     4.3.1    2021-04-30 [1] CRAN (R 4.0.2)                    
##  DBI                      1.1.1    2021-01-15 [1] CRAN (R 4.0.2)                    
##  dbplyr                 * 2.1.1    2021-04-06 [1] CRAN (R 4.0.2)                    
##  DelayedArray             0.16.3   2021-03-24 [1] Bioconductor                      
##  DelayedMatrixStats       1.12.3   2021-02-03 [1] Bioconductor                      
##  digest                   0.6.27   2020-10-24 [1] CRAN (R 4.0.2)                    
##  dplyr                    1.0.5    2021-03-05 [1] CRAN (R 4.0.2)                    
##  ellipsis                 0.3.2    2021-04-29 [1] CRAN (R 4.0.2)                    
##  ensembldb              * 2.14.1   2021-04-19 [1] Bioconductor                      
##  evaluate                 0.14     2019-05-28 [1] CRAN (R 4.0.1)                    
##  ExperimentHub            1.16.1   2021-04-16 [1] Bioconductor                      
##  fansi                    0.4.2    2021-01-15 [1] CRAN (R 4.0.2)                    
##  farver                   2.1.0    2021-02-28 [1] CRAN (R 4.0.2)                    
##  fastmap                  1.1.0    2021-01-25 [1] CRAN (R 4.0.2)                    
##  FNN                      1.1.3    2019-02-15 [1] CRAN (R 4.0.2)                    
##  generics                 0.1.0    2020-10-31 [1] CRAN (R 4.0.2)                    
##  GenomeInfoDb           * 1.26.7   2021-04-08 [1] Bioconductor                      
##  GenomeInfoDbData         1.2.4    2020-11-03 [1] Bioconductor                      
##  GenomicAlignments        1.26.0   2020-10-27 [1] Bioconductor                      
##  GenomicFeatures        * 1.42.3   2021-04-04 [1] Bioconductor                      
##  GenomicRanges          * 1.42.0   2020-10-27 [1] Bioconductor                      
##  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 4.0.2)                    
##  ggplot2                * 3.3.3    2020-12-30 [1] CRAN (R 4.0.2)                    
##  glue                     1.4.2    2020-08-27 [1] CRAN (R 4.0.2)                    
##  gridExtra                2.3      2017-09-09 [1] CRAN (R 4.0.2)                    
##  gtable                   0.3.0    2019-03-25 [1] CRAN (R 4.0.2)                    
##  highr                    0.9      2021-04-16 [1] CRAN (R 4.0.2)                    
##  hms                      1.0.0    2021-01-13 [1] CRAN (R 4.0.2)                    
##  htmltools                0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)                    
##  httpuv                   1.6.0    2021-04-23 [1] CRAN (R 4.0.2)                    
##  httr                     1.4.2    2020-07-20 [1] CRAN (R 4.0.2)                    
##  interactiveDisplayBase   1.28.0   2020-10-27 [1] Bioconductor                      
##  IRanges                * 2.24.1   2020-12-12 [1] Bioconductor                      
##  irlba                    2.3.3    2019-02-05 [1] CRAN (R 4.0.2)                    
##  jquerylib                0.1.4    2021-04-26 [1] CRAN (R 4.0.5)                    
##  jsonlite                 1.7.2    2020-12-09 [1] CRAN (R 4.0.2)                    
##  knitr                    1.33     2021-04-24 [1] CRAN (R 4.0.2)                    
##  labeling                 0.4.2    2020-10-20 [1] CRAN (R 4.0.2)                    
##  later                    1.2.0    2021-04-23 [1] CRAN (R 4.0.2)                    
##  lattice                  0.20-44  2021-05-02 [1] CRAN (R 4.0.5)                    
##  lazyeval                 0.2.2    2019-03-15 [1] CRAN (R 4.0.2)                    
##  lifecycle                1.0.0    2021-02-15 [1] CRAN (R 4.0.2)                    
##  magrittr                 2.0.1    2020-11-17 [1] CRAN (R 4.0.2)                    
##  Matrix                   1.3-2    2021-01-06 [1] CRAN (R 4.0.5)                    
##  MatrixGenerics         * 1.2.1    2021-01-30 [1] Bioconductor                      
##  matrixStats            * 0.58.0   2021-01-29 [1] CRAN (R 4.0.2)                    
##  memoise                  2.0.0    2021-01-26 [1] CRAN (R 4.0.3)                    
##  mime                     0.10     2021-02-13 [1] CRAN (R 4.0.2)                    
##  munsell                  0.5.0    2018-06-12 [1] CRAN (R 4.0.2)                    
##  openssl                  1.4.4    2021-04-30 [1] CRAN (R 4.0.2)                    
##  pillar                   1.6.0    2021-04-13 [1] CRAN (R 4.0.2)                    
##  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 4.0.2)                    
##  prettyunits              1.1.1    2020-01-24 [1] CRAN (R 4.0.2)                    
##  progress                 1.2.2    2019-05-16 [1] CRAN (R 4.0.2)                    
##  promises                 1.2.0.1  2021-02-11 [1] CRAN (R 4.0.2)                    
##  ProtGenerics             1.22.0   2020-10-27 [1] Bioconductor                      
##  pryr                     0.1.4    2018-02-18 [1] CRAN (R 4.0.2)                    
##  purrr                    0.3.4    2020-04-17 [1] CRAN (R 4.0.2)                    
##  R6                       2.5.0    2020-10-28 [1] CRAN (R 4.0.2)                    
##  rappdirs                 0.3.3    2021-01-31 [1] CRAN (R 4.0.2)                    
##  Rcpp                     1.0.6    2021-01-15 [1] CRAN (R 4.0.2)                    
##  RCurl                    1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)                    
##  rlang                    0.4.11   2021-04-30 [1] CRAN (R 4.0.2)                    
##  rmarkdown                2.7      2021-02-19 [1] CRAN (R 4.0.4)                    
##  Rsamtools                2.6.0    2020-10-27 [1] Bioconductor                      
##  RSpectra                 0.16-0   2019-12-01 [1] CRAN (R 4.0.2)                    
##  RSQLite                  2.2.7    2021-04-22 [1] CRAN (R 4.0.2)                    
##  rsvd                     1.0.5    2021-04-16 [1] CRAN (R 4.0.2)                    
##  rtracklayer              1.50.0   2020-10-27 [1] Bioconductor                      
##  Rtsne                    0.15     2018-11-10 [1] CRAN (R 4.0.2)                    
##  S4Vectors              * 0.28.1   2020-12-09 [1] Bioconductor                      
##  sass                     0.3.1    2021-01-24 [1] CRAN (R 4.0.2)                    
##  scales                   1.1.1    2020-05-11 [1] CRAN (R 4.0.2)                    
##  scater                 * 1.18.6   2021-02-26 [1] Bioconductor                      
##  scRNAseq               * 2.4.0    2020-11-09 [1] Bioconductor                      
##  scuttle                  1.0.4    2020-12-17 [1] Bioconductor                      
##  sessioninfo              1.1.1    2018-11-05 [1] CRAN (R 4.0.2)                    
##  shiny                    1.6.0    2021-01-25 [1] CRAN (R 4.0.3)                    
##  SingleCellExperiment   * 1.12.0   2020-10-27 [1] Bioconductor                      
##  sparseMatrixStats        1.2.1    2021-02-02 [1] Bioconductor                      
##  stringi                  1.5.3    2020-09-09 [1] CRAN (R 4.0.2)                    
##  stringr                  1.4.0    2019-02-10 [1] CRAN (R 4.0.2)                    
##  SummarizedExperiment   * 1.20.0   2020-10-27 [1] Bioconductor                      
##  tibble                   3.1.1    2021-04-18 [1] CRAN (R 4.0.2)                    
##  tidyselect               1.1.1    2021-04-30 [1] CRAN (R 4.0.2)                    
##  utf8                     1.2.1    2021-03-12 [1] CRAN (R 4.0.2)                    
##  uwot                     0.1.10   2020-12-15 [1] CRAN (R 4.0.2)                    
##  vctrs                    0.3.8    2021-04-29 [1] CRAN (R 4.0.2)                    
##  vipor                    0.4.5    2017-03-22 [1] CRAN (R 4.0.2)                    
##  viridis                  0.6.0    2021-04-15 [1] CRAN (R 4.0.2)                    
##  viridisLite              0.4.0    2021-04-13 [1] CRAN (R 4.0.2)                    
##  withr                    2.4.2    2021-04-18 [1] CRAN (R 4.0.2)                    
##  xfun                     0.22     2021-03-11 [1] CRAN (R 4.0.2)                    
##  XML                      3.99-0.6 2021-03-16 [1] CRAN (R 4.0.2)                    
##  xml2                     1.3.2    2020-04-23 [1] CRAN (R 4.0.2)                    
##  xtable                   1.8-4    2019-04-21 [1] CRAN (R 4.0.2)                    
##  XVector                  0.30.0   2020-10-28 [1] Bioconductor                      
##  yaml                     2.2.1    2020-02-01 [1] CRAN (R 4.0.2)                    
##  zlibbioc                 1.36.0   2020-10-28 [1] Bioconductor                      
## 
## [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Code for making the R file.

knitr::knit("2021-05-03_code_HCA_LA.Rmd", tangle = TRUE)

Getting started with scRNA-seq analyses with Bioconductor

5/2/2021

Contents

1 Install R

2 Install Bioconductor

3 Data Infraestructure code

4 Quality Control

5 Reproducibility information