This repository contains the files for creating our video for the Bioinformatics Peer Prize III hosted by Thinkable. The video was made using ari. The slides themselves were created with PowerPoint and then exported as PNG images. If you are interested in the slides, you can find most of them in this recount workshop presentation for the BioC2017 meeting.
Check our full entry here.
You can view the video we uploaded at lcolladotor.github.io/biopeerprize2018/output.mp4 or simply view it here.
If you prefer to see the video with another voice, check the one made with voice = "Joanna". Most of the people we asked preferred this voice, which is the one we used in the final entry.
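Comparing voices, as described above, means rendering the same slides and text once per voice. The sketch below uses a hypothetical helper, render_voices(), which is not part of ari. It loops over voices and calls ari::ari_spin() with the same output and voice arguments used in the code further down.

It assumes that Amazon Polly credentials are already available to R. For example, the aws.signature package reads the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION environment variables. The output_<voice>.mp4 naming scheme is also an assumption for illustration.

```r
# Hypothetical helper (not part of ari): render the same slides once per voice.
# Assumes AWS credentials for Amazon Polly are already configured.
render_voices <- function(images, paragraphs, voices, run = TRUE) {
  stopifnot(is.character(voices), length(voices) > 0)
  # One output file per voice, e.g. "output_joey.mp4", "output_joanna.mp4"
  outputs <- setNames(paste0("output_", tolower(voices), ".mp4"), voices)
  if (run) {
    for (v in voices) {
      # Same arguments as the ari_spin() calls in this README
      ari::ari_spin(images, paragraphs, output = outputs[[v]], voice = v)
    }
  }
  outputs
}

# Dry run: only compute the output file names, without calling Polly
render_voices(character(0), character(0), c("Joey", "Joanna"), run = FALSE)
```

With run = TRUE and the real slides and text, each voice produces its own MP4 file. You can then watch them side by side before choosing one.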
The following code sets up the text for each of the slides and then creates the video using ari::ari_spin(). The code is fairly simple: you only need to specify a character vector with one element per slide you are going to show. Further details on how to set up ari are available in that package's README file. Note that you need to enable Amazon Polly for your AWS account.
library('ari')
ari_text <- c(
"Welcome to this short presentation about the recount2 project! This is our entry for the 2018 Bioinformatics Peer Prize hosted by Thinkable. Our entry paper is called Reproducible RNA-seq analysis using recount2 and was published in April 2017 with the DOI 10.1038/nbt.3838.",
"In RNA sequencing projects we start with millions of short reads illustrated as pink boxes here. A priori we do not know which part of the reference genome they come from, so for years researchers have been aligning these short reads to the genome. Once we know what parts of the genome are active, that is, the parts of the genome that are expressed, then we can compare across conditions that help us further our understanding of molecular biology.",
"RNA sequencing aligners can find the positions of the genome where these reads come from. In some cases, a read is split across two locations, shown by a long black line. Other times, only a portion of the read is used, shown as purple boxes. This is just to show that this is a complex process that involves a substantial amount of computing power.",
"Some recent large RNA sequencing projects include the GTEx and the TCGA consortiums where about ten thousand samples have been sequenced in each project. GTEx focuses on studying expression across tissues for mostly healthy individuals and TCGA studies RNA expression in different cancers. Both are very interesting projects yet are time consuming to run through RNA sequencing aligners.",
"When most sequencing projects are completed they are shared via the Sequence Read Archive. From this website other researchers can access the data and use it to further their studies.",
"This graph shows that the size of the SRA has been increasing very rapidly over the years. That is a lot of public data!",
"Members from our team developed an RNA sequencing aligner that can process many samples at a time and produce reduced representations of the data. This program is called Rail-RNA.",
"Rail-RNA is designed to run on the computing cloud such as on Amazon Web Services computers. This allowed us to align all the public human RNA sequencing data we could find including the Sequence Read Archive, the GTEx and the TCGA projects. We uniformly processed over seventy thousand samples from over two thousand studies that contained over 4.4 trillion RNA sequencing reads.",
"Once we aligned this massive data, we produced a resource that we call recount2. The data is formatted in a way that it can be used directly by a bioinformatician, especially when combining the data with tools from the Bioconductor project.",
"From recount2, anyone can download summarized gene expression data at different feature types including genes, exons, exon-exon junctions and base pair coverage data. The resource also allows others to build new methods around it, such as for quantifying expressed transcripts or defining expressed regions.",
"The data is shared in a compact way that combines the information of the expression feature (blue box), the information about the samples (green box), and the measured expression levels (pink box). The sample information can be enriched from other sources including predictions.",
"One of the main features of the recount2 project is that it only takes a couple lines of code to download and start using the expression counts from any given project. This greatly improves the reproducibility of RNA sequencing projects and makes it easy for researchers to reuse public data.",
"We hope that the recount2 project will be as useful as the original ReCount project for promoting the development of new biostatistical methods.",
"The fact that the data was processed uniformly for the recount2 project makes it easy to combine data from multiple studies and perform meta analyses.",
"This plot shows an example of the concordance when comparing two tissues using two studies (orange line) versus two incorrect scenarios.",
"The recount2 project was made possible thanks to a diversified team and multiple funding sources. Thank you for learning about our project and we hope that you will find it useful for your RNA sequencing projects!",
"If you are interested in further improving this project, please contact any of the primary investigators of the project."
)
## Find the slides in the correct order: Slide0* files first, then Slide1* files
ari_files <- c(
dir('BioinfoPrize2018', recursive = TRUE, full.names = TRUE,
pattern = 'Slide0'),
dir('BioinfoPrize2018', recursive = TRUE, full.names = TRUE,
pattern = 'Slide1')
)
## Check that they match in length
stopifnot(identical(length(ari_files), length(ari_text)))
## Create the video
ari_spin(
ari_files,
ari_text,
voice = "Joey"
)
## Create the video with another voice
ari_spin(
ari_files,
ari_text,
voice = "Joanna",
output = "output_joanna.mp4"
)
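The two dir() calls above depend on the slide file names sorting correctly within each Slide0 and Slide1 group. The sketch below is an alternative that sorts slide images by the number in their file name, so that Slide2.png comes before Slide10.png even without zero-padding.

It assumes that the slides are exported as Slide<number>.png, using case-insensitive matching. The helper name order_slides() is hypothetical.

```r
# Hypothetical helper: order slide images by the number in their file name,
# so "Slide2.png" sorts before "Slide10.png" even without zero-padding.
order_slides <- function(path, pattern = "^Slide[0-9]+\\.png$") {
  files <- list.files(path, pattern = pattern, recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  # Extract the slide number from each base name and sort numerically
  slide_num <- as.integer(gsub("[^0-9]", "", basename(files)))
  files[order(slide_num)]
}

# Usage example with a temporary directory of empty placeholder files
tmp <- file.path(tempdir(), "slides_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("Slide10.png", "Slide2.png", "Slide1.png"))))
basename(order_slides(tmp))
```

The same stopifnot() length check used above would still apply to the result before passing it to ari_spin().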
The website, video and project were made possible thanks to:
[1] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 1.8. 2017. URL: https://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[3] S. Kross. ari: Automated R Instructor. R package version 0.1.0. 2017. URL: https://CRAN.R-project.org/package=ari.
[4] A. Oleś, M. Morgan and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.6.1. 2017. URL: https://github.com/Bioconductor/BiocStyle.
[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2017. URL: https://www.R-project.org/.
[6] H. Wickham and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 1.13.4. 2017. URL: https://CRAN.R-project.org/package=devtools.
## Reproducibility info
proc.time()
## user system elapsed
## 21.890 2.009 30.416
message(Sys.time())
## 2018-01-31 10:49:35
library('devtools')
options(width = 120)
session_info()
## Session info ----------------------------------------------------------------------------------------------------------
## setting value
## version R version 3.4.3 (2017-11-30)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/New_York
## date 2018-01-31
## Packages --------------------------------------------------------------------------------------------------------------
## package * version date source
## ari * 0.1.0 2017-08-31 CRAN (R 3.4.1)
## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
## aws.polly 0.1.2 2016-12-08 CRAN (R 3.4.0)
## aws.signature 0.3.5 2017-07-01 cran (@0.3.5)
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.3 2017-12-07 local
## base64enc 0.1-3 2015-07-28 CRAN (R 3.4.0)
## bibtex 0.4.2 2017-06-30 CRAN (R 3.4.1)
## BiocStyle * 2.6.1 2017-11-30 Bioconductor
## colorout * 1.1-3 2018-01-09 Github (jalvesaq/colorout@31d7db0)
## compiler 3.4.3 2017-12-07 local
## curl 3.1 2017-12-12 CRAN (R 3.4.3)
## datasets * 3.4.3 2017-12-07 local
## devtools * 1.13.4 2017-11-09 CRAN (R 3.4.2)
## digest 0.6.14 2018-01-14 CRAN (R 3.4.3)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## graphics * 3.4.3 2017-12-07 local
## grDevices * 3.4.3 2017-12-07 local
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
## jsonlite 1.5 2017-06-01 CRAN (R 3.4.0)
## knitcitations * 1.0.8 2017-07-04 CRAN (R 3.4.1)
## knitr 1.18 2017-12-27 CRAN (R 3.4.3)
## lubridate 1.7.1 2017-11-03 CRAN (R 3.4.2)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## MASS 7.3-48 2017-12-25 CRAN (R 3.4.3)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.0)
## methods 3.4.3 2017-12-07 local
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
## prettyunits 1.0.2 2015-07-13 CRAN (R 3.4.0)
## progress 1.1.2 2016-12-14 CRAN (R 3.4.0)
## purrr 0.2.4 2017-10-18 CRAN (R 3.4.2)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
## Rcpp 0.12.15 2018-01-20 CRAN (R 3.4.3)
## RefManageR 0.14.20 2017-08-17 CRAN (R 3.4.1)
## rlang 0.1.6 2017-12-21 CRAN (R 3.4.3)
## rmarkdown 1.8 2017-11-17 CRAN (R 3.4.2)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.0)
## signal 0.7-6 2015-07-30 cran (@0.7-6)
## stats * 3.4.3 2017-12-07 local
## stringi 1.1.6 2017-11-17 CRAN (R 3.4.2)
## stringr 1.2.0 2017-02-18 CRAN (R 3.4.0)
## tools 3.4.3 2017-12-07 local
## tuneR 1.3.2 2017-04-10 cran (@1.3.2)
## utils * 3.4.3 2017-12-07 local
## webshot 0.5.0 2017-11-29 cran (@0.5.0)
## withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
## xml2 1.2.0 2018-01-24 CRAN (R 3.4.3)
## yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)
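The reproducibility details above are printed to the console. To keep them next to the rendered video, you could also write them to a file. The sketch below is one way to do that. It uses base R's sessionInfo() instead of devtools::session_info(), so its output format differs from the listing above. The helper name save_repro_info() and the file name are hypothetical.

```r
# Hypothetical helper: write the date, R version and session info to a text file
save_repro_info <- function(file = "reproducibility.txt") {
  info <- c(
    paste("Date:", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
    paste("R:", R.version.string),
    "",
    capture.output(sessionInfo())  # base R alternative to devtools::session_info()
  )
  writeLines(info, file)
  invisible(file)
}

# Usage example: write to a temporary file and show the first two lines
out_file <- save_repro_info(file.path(tempdir(), "reproducibility.txt"))
head(readLines(out_file), 2)
```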