Benchmark of cellular deconvolution methods using a multi-assay reference dataset from postmortem human prefrontal cortex

Louise A. Huuki-Myers, Kelsey D. Montgomery*, Sang Ho Kwon, Sophia Cinquemani, Nicholas J. Eagles, Daianna Gonzalez-Padilla, Sean K. Maden, Joel E. Kleinman, Thomas M. Hyde, Stephanie C. Hicks, Kristen R. Maynard†, Leonardo Collado-Torres

February, 2024

Image credit: bioRxiv

Abstract

Background: Cellular deconvolution of bulk RNA-sequencing (RNA-seq) data using single cell or nuclei RNA-seq (sc/snRNA-seq) reference data is an important strategy for estimating cell type composition in heterogeneous tissues, such as human brain. Computational methods for deconvolution have been developed and benchmarked against simulated data, pseudobulked sc/snRNA-seq data, or immunohistochemistry reference data. A major limitation in developing improved deconvolution algorithms has been the lack of integrated datasets with orthogonal measurements of gene expression and estimates of cell type proportions on the same tissue sample. Deconvolution algorithm performance has not yet been evaluated across different RNA extraction methods (cytosolic, nuclear, or whole cell RNA), different library preparation types (mRNA enrichment vs. ribosomal RNA depletion), or with matched single cell reference datasets.

Results: A rich multi-assay dataset was generated in postmortem human dorsolateral prefrontal cortex (DLPFC) from 22 tissue blocks. Assays included spatially-resolved transcriptomics, snRNA-seq, bulk RNA-seq (across six library/extraction RNA-seq combinations), and RNAScope/Immunofluorescence (RNAScope/IF) for six broad cell types. The Mean Ratio method, implemented in the DeconvoBuddies R package, was developed for selecting cell type marker genes. Six computational deconvolution algorithms were evaluated in DLPFC, and predicted cell type proportions were compared to orthogonal RNAScope/IF measurements.

Conclusions: Bisque and hspe were the most accurate methods and were robust to differences in RNA library types and extractions. This multi-assay dataset showed that cell size differences, marker genes differentially quantified across RNA libraries, and cell composition variability in reference snRNA-seq impact the accuracy of current deconvolution methods.
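To make the comparison between predicted and orthogonally measured cell type proportions concrete, here is a minimal Python sketch. It assumes that accuracy is summarized with a Pearson correlation and a root mean squared error (RMSE); the abstract does not state which metrics were used, so both are assumptions. The function names, the cell type labels, and the proportion values are hypothetical placeholders, not data or code from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def evaluate(predicted, measured):
    """Compare proportions for the cell types present in both dicts.

    Returns a (pearson_r, rmse) tuple. Cell types missing from either
    dict are ignored rather than treated as zero.
    """
    shared = sorted(set(predicted) & set(measured))
    pred = [predicted[k] for k in shared]
    meas = [measured[k] for k in shared]
    return pearson(pred, meas), rmse(pred, meas)


# Hypothetical, illustrative values only (not results from the study).
predicted = {"cell_type_A": 0.5, "cell_type_B": 0.3, "cell_type_C": 0.2}
measured = {"cell_type_A": 0.4, "cell_type_B": 0.4, "cell_type_C": 0.2}

r, err = evaluate(predicted, measured)
print(f"Pearson r = {r:.3f}, RMSE = {err:.3f}")
```

In this framing, a higher correlation and a lower RMSE would indicate closer agreement between a deconvolution method's estimates and the orthogonal measurements. In practice the comparison would be repeated per method, per sample, and per library type.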

Type

Preprint

Publication

bioRxiv

Hey #deconvolution fans! 👀 We’ve got an exciting new pre-print for you: a benchmark of 6 popular deconvolution methods on a multimodal human brain #DLPFC dataset 🧠🧬 @LieberInstitute @jhubiostat #scitwitter
📎 https://t.co/4KXibeerZi pic.twitter.com/0O8E5emIeP
— Louise Huuki-Myers (@lahuuki) April 15, 2024