April 2nd, 2015
This is a short introduction on how to use BiocParallel
(Morgan, Lang, and Thompson, 2015) and knitrBootstrap
(Hester, 2014).
You will need R 3.1.2 or newer (available from CRAN) and BiocParallel:
## Install BiocParallel
source('http://bioconductor.org/biocLite.R')
biocLite('BiocParallel')
You will also need knitrBootstrap:
## If needed:
# install.packages('devtools')
devtools::install_github('jimhester/knitrBootstrap')
You can find the latest documentation at
help(package = 'BiocParallel')
help(package = 'knitrBootstrap')
plot(y = 10 / (1:10), 1:10, xlab = 'Number of cores', ylab = 'Time',
    main = 'Ideal scenario', type = 'o', col = 'blue', cex = 2,
    cex.axis = 2, cex.lab = 1.5, cex.main = 2, pch = 16)
plot(y = 10 / (1:10), 1:10, xlab = 'Number of cores', ylab = 'Time',
    main = 'Reality', type = 'o', col = 'blue', cex = 2,
    cex.axis = 2, cex.lab = 1.5, cex.main = 2, pch = 16)
lines(y = 10 / (1:10) * c(1, 1.05^(2:10) ), 1:10, col = 'red',
    type = 'o', cex = 2)
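The two plots above encode a simple model: the ideal time is 10 / cores, and the red "Reality" curve multiplies that time by 1.05^cores from two cores onward. As a minimal sketch, the speedup implied by each curve can be tabulated from those same numbers. The variable names here are my own and the overhead factor is just the one used in the plot code, not a measured value.

```r
## Number of cores shown on the plots
cores <- 1:10

## Ideal running time: the work divides evenly across cores
ideal_time <- 10 / cores

## Overhead factor from the 'Reality' plot: none for 1 core, 1.05^k for k cores
overhead <- c(1, 1.05^(2:10))
real_time <- ideal_time * overhead

## Speedup relative to a single core under each model
speedup <- data.frame(
    cores = cores,
    ideal = ideal_time[1] / ideal_time,
    real = real_time[1] / real_time
)
print(round(speedup, 2))
```

Under this toy model the speedup at 10 cores is 10 / 1.05^10, roughly 6.1 instead of 10.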
CRAN Task View: High-Performance and Parallel Computing with R
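## Estimate by simulation the probability that at least two of
## n people share a birthday
birthday <- function(n) {
    m <- 10000          # number of simulated groups
    x <- numeric(m)
    for(i in 1:m) {
        ## Draw n birthdays; record 1 if any two coincide, 0 otherwise
        b <- sample(1:365, n, replace = TRUE)
        x[i] <- ifelse(length(unique(b)) == n, 0, 1)
    }
    mean(x)
}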
system.time( lapply(1:100, birthday) )
##    user  system elapsed
##  24.119   0.251  24.388
Source slide 24
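As a sanity check on the simulation, the same probability has a closed form: the chance that n birthdays are all distinct is the product of (365 - k) / 365 for k = 0, ..., n - 1. The sketch below assumes, like birthday(), 365 equally likely days. The name birthday_exact is hypothetical and not part of the original slides.

```r
## Exact probability that at least two of n people share a birthday,
## assuming 365 equally likely days
birthday_exact <- function(n) {
    ## With more people than days a shared birthday is certain
    if (n > 365) return(1)
    ## 1 minus the probability that all n birthdays are distinct
    1 - prod((365 - seq_len(n) + 1) / 365)
}

## Exact values for a few group sizes
sapply(c(10, 23, 50), birthday_exact)
```

For n = 23 this gives roughly 0.507, which is the value the simulated birthday(23) should fluctuate around.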
library('doMC')
## Loading required package: foreach
## foreach: simple, scalable parallel programming from Revolution Analytics
## Use Revolution R for scalability, fault tolerance and more.
## http://www.revolutionanalytics.com
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(2)
system.time( x <- foreach(j = 1:100) %dopar% birthday(j) )
##    user  system elapsed
##  19.263   0.315  22.145
library('BiocParallel')
system.time( y <- bplapply(1:100, birthday) )
##    user  system elapsed
##   0.238   0.075  12.789
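The elapsed times reported above can be turned into speedups by dividing the serial elapsed time by each parallel elapsed time. This is only arithmetic on the three timings printed in this tutorial; the numbers come from a single run on one machine, so treat the ratios as illustrative.

```r
## Elapsed times (seconds) copied from the system.time() output above
elapsed <- c(lapply = 24.388, doMC = 22.145, BiocParallel = 12.789)

## Speedup relative to the serial lapply() run
speedup <- elapsed[['lapply']] / elapsed
round(speedup, 2)
```

Note that for the parallel runs the "user" column undercounts the work done in child processes, so "elapsed" is the column to compare.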
BiocParallel functions use the bp prefix, for example bplapply() in place of lapply().
registered()
## $MulticoreParam
## class: MulticoreParam
## bpworkers: 4
## bpisup: FALSE
## bplog: FALSE
## bpthreshold:
## bplogdir:
## bpresultdir:
## bpstopOnError: FALSE
## cluster type: FORK
##
## $SnowParam
## class: SnowParam
## bpworkers: 4
## bpisup: FALSE
## bplog: FALSE
## bpthreshold: INFO
## bplogdir:
## bpresultdir:
## bpstopOnError: FALSE
## cluster type: SOCK
##
## $SerialParam
## class: SerialParam
## bpworkers: 1
## Test in serial mode
system.time( y.serial <- bplapply(1:10, birthday, BPPARAM = SerialParam()) )
##    user  system elapsed
##   2.103   0.017   2.124
## Try Snow
system.time( y.snow <- bplapply(1:10, birthday, BPPARAM = SnowParam(workers = 2)) )
##    user  system elapsed
##   0.026   0.019   1.738
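Because the back-end is just the BPPARAM argument, the serial and Snow runs above can be timed in one loop over a list of params. The following is a rough sketch, assuming BiocParallel is installed. birthday_small is a hypothetical, lighter variant of birthday() (fewer replicates) so that the example finishes quickly, and the worker count of 2 is arbitrary.

```r
library('BiocParallel')

## Hypothetical lighter version of birthday(): m simulated groups of n people
birthday_small <- function(n, m = 1000) {
    mean(replicate(m, anyDuplicated(sample(365, n, replace = TRUE)) > 0))
}

## Back-ends to compare; only the BPPARAM object changes between runs
backends <- list(serial = SerialParam(), snow = SnowParam(workers = 2))

## Elapsed seconds for the same task on each back-end
elapsed <- sapply(backends, function(param) {
    system.time(bplapply(1:10, birthday_small, BPPARAM = param))[['elapsed']]
})
elapsed
```

For a task this small the Snow run may well be slower than the serial one, since starting the worker processes has a fixed cost.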
$ R
> library('BiocParallel')
> registered()
$MulticoreParam
class: MulticoreParam; bpisup: TRUE; bpworkers: 8; catch.errors: TRUE
setSeed: TRUE; recursive: TRUE; cleanup: TRUE; cleanupSignal: 15; verbose: FALSE

$SnowParam
class: SnowParam; bpisup: FALSE; bpworkers: 8; catch.errors: TRUE
cluster spec: 8; type: PSOCK

$BatchJobsParam
class: BatchJobsParam; bpisup: TRUE; bpworkers: NA; catch.errors: TRUE
cleanup: TRUE; stop.on.error: FALSE; progressbar: TRUE

$SerialParam
class: SerialParam; bpisup: TRUE; bpworkers: 1; catch.errors: TRUE
cluster.functions = makeClusterFunctionsSGE("~/simple.tmpl")
mail.start = "none"
mail.done = "none"
mail.error = "none"
staged.queries = TRUE
fs.timeout = 10
Via Prasad Patil
#!/bin/bash

# Job name
#$ -N <%= job.name %>
# Use current directory
#$ -cwd
# Get emails
#$ -m e

R CMD BATCH --no-save --no-restore "<%= rscript %>" /dev/stdout
exit 0
Modified from Prasad's version.
I like getting the emails so that I can then explore the job statistics using Alyssa's efficiency SGE analytics: code.
#!/bin/bash

# The name of the job, can be anything, simply used when displaying the list of running jobs
#$ -N <%= job.name %>
# Combining output/error messages into one file
#$ -j y
# Giving the name of the output log file
#$ -o <%= log.file %>
# One needs to tell the queue system to use the current directory as the working directory
# Or else the script may fail as it will execute in your top level home directory /home/username
#$ -cwd
# use environment variables
#$ -V
# use correct queue
#$ -q <%= resources$queue %>
# use job arrays
#$ -t 1-<%= arrayjobs %>

# we merge R output with stdout from SGE, which gets then logged via -o option
R CMD BATCH --no-save --no-restore "<%= rscript %>" /dev/stdout
exit 0
library('BiocParallel')
library('BatchJobs')

# define birthday() function

## Register cluster
funs <- makeClusterFunctionsSGE("~/simple.tmpl")
param <- BatchJobsParam(workers = 10, resources = list(ncpus = 1),
    cluster.functions = funs)
register(param)

## Run
system.time( xx <- bplapply(1:100, birthday) )
## Jobs spend a little bit of time in the queue
#    user  system elapsed
#   0.597   0.350  31.644
For developers:
Developers wishing to invoke back-ends other than MulticoreParam need to take special care to ensure that required packages, data, and functions are available and loaded on the remote nodes.
Source: BiocParallel vignette
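One way to follow the advice for developers above is to write worker functions that are self-contained: they receive their data as arguments and load their own packages. The sketch below assumes BiocParallel is installed. scaled_median, values, and scale are hypothetical names for illustration, and 'stats' merely stands in for whatever package the real task depends on.

```r
library('BiocParallel')

## Hypothetical worker: everything it needs is passed in or loaded inside
scaled_median <- function(i, values, scale) {
    ## Load packages inside the function so that each remote R process
    ## has them available; 'stats' is only a stand-in here
    library('stats')
    median(values * scale) + i
}

## Extra arguments after FUN are forwarded to every call of the worker,
## so the data travels with the task instead of living in the master session
res <- bplapply(1:4, scaled_median, values = c(1, 2, 3), scale = 10,
    BPPARAM = SnowParam(workers = 2))
unlist(res)
```

Relying on objects that exist only in the master session's workspace is the usual way such code breaks when it moves from MulticoreParam to SnowParam or a cluster back-end.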
BiocParallel?
People like them because
rmarkdown (Allaire, McPherson, Xie, Wickham, et al., 2014) has simplified the process
Creating awesome reports for multiple audiences using knitrBootstrap
---
output:
  knitrBootstrap::bootstrap_document:
    theme.chooser: TRUE
    highlight.chooser: TRUE
---

Title
====

Etc
Then render:
rmarkdown::render('myFile.Rmd')
bootstrap.show.code = FALSE
bootstrap.show.warning = FALSE
bootstrap.show.message = FALSE
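The three options above are chunk options, so they can be set once for the whole document with knitr's opts_chunk$set() instead of being repeated in every chunk header. A minimal sketch, assuming knitr is installed and that this code sits in a setup chunk near the top of the .Rmd file:

```r
library('knitr')

## Apply the knitrBootstrap display options to every subsequent chunk:
## code, warnings, and messages start out hidden in the rendered report
opts_chunk$set(
    bootstrap.show.code = FALSE,
    bootstrap.show.warning = FALSE,
    bootstrap.show.message = FALSE
)

## Confirm what is now set
opts_chunk$get('bootstrap.show.code')
```

An individual chunk can still override the default by setting the same option in its own header.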
BiocParallel tag!
knitrBootstrap, open a new issue in the GitHub repo
## Citation info
citation('BiocParallel')
## Warning in citation("BiocParallel"): no date field in DESCRIPTION file of
## package 'BiocParallel'
##
## To cite package 'BiocParallel' in publications use:
##
##   Martin Morgan, Michel Lang and Ryan Thompson (). BiocParallel:
##   Bioconductor facilities for parallel evaluation. R package
##   version 1.1.21.
##
## A BibTeX entry for LaTeX users is
##
##   @Manual{,
##     title = {BiocParallel: Bioconductor facilities for parallel evaluation},
##     author = {Martin Morgan and Michel Lang and Ryan Thompson},
##     note = {R package version 1.1.21},
##   }
citation('knitrBootstrap')
##
## To cite package 'knitrBootstrap' in publications use:
##
##   Jim Hester (2014). knitrBootstrap: Knitr Bootstrap framework.. R
##   package version 1.0.0. https://github.com/jimhester/
##
## A BibTeX entry for LaTeX users is
##
##   @Manual{,
##     title = {knitrBootstrap: Knitr Bootstrap framework.},
##     author = {Jim Hester},
##     year = {2014},
##     note = {R package version 1.0.0},
##     url = {https://github.com/jimhester/},
##   }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
Code for creating this page
## Create this page
library('rmarkdown')
render('index.Rmd')

## Clean up
file.remove('BiocParallel-knitrBootstrap.bib')

## Extract the R code
library('knitr')
knit('index.Rmd', tangle = TRUE)
Date this tutorial was generated.
## [1] "2015-04-02 01:31:11 EDT"
Wallclock time spent running this tutorial.
## Time difference of 1.095 mins
R session information.
##  setting  value
##  version  R Under development (unstable) (2014-11-01 r66923)
##  system   x86_64, darwin10.8.0
##  ui       AQUA
##  language (EN)
##  collate  en_US.UTF-8
##  tz       America/New_York
##  package        * version  date       source
##  bibtex           0.4.0    2014-12-31 CRAN (R 3.2.0)
##  BiocParallel   * 1.1.21   2015-03-24 Bioconductor
##  bitops           1.0.6    2013-08-17 CRAN (R 3.2.0)
##  codetools        0.2.11   2015-03-10 CRAN (R 3.2.0)
##  devtools       * 1.6.1    2014-10-07 CRAN (R 3.2.0)
##  digest           0.6.8    2014-12-31 CRAN (R 3.2.0)
##  doMC           * 1.3.3    2014-02-28 CRAN (R 3.2.0)
##  evaluate         0.5.5    2014-04-29 CRAN (R 3.2.0)
##  foreach        * 1.4.2    2014-04-11 CRAN (R 3.2.0)
##  formatR          1.0      2014-08-25 CRAN (R 3.2.0)
##  futile.logger    1.4      2015-03-21 CRAN (R 3.2.0)
##  futile.options   1.0.0    2010-04-06 CRAN (R 3.2.0)
##  htmltools        0.2.6    2014-09-08 CRAN (R 3.2.0)
##  httr             0.5      2014-09-02 CRAN (R 3.2.0)
##  iterators      * 1.0.7    2014-04-11 CRAN (R 3.2.0)
##  knitcitations  * 1.0.4    2014-11-03 Github (cboettig/knitcitations@508de74)
##  knitr            1.7      2014-10-13 CRAN (R 3.2.0)
##  lambda.r         1.1.7    2015-03-20 CRAN (R 3.2.0)
##  lubridate        1.3.3    2013-12-31 CRAN (R 3.2.0)
##  memoise          0.2.1    2014-04-22 CRAN (R 3.2.0)
##  plyr             1.8.1    2014-02-26 CRAN (R 3.2.0)
##  Rcpp             0.11.5   2015-03-06 CRAN (R 3.2.0)
##  RCurl            1.95.4.5 2014-12-28 CRAN (R 3.2.0)
##  RefManageR       0.8.40   2014-10-29 CRAN (R 3.2.0)
##  RJSONIO          1.3.0    2014-07-28 CRAN (R 3.2.0)
##  rmarkdown      * 0.3.3    2014-09-17 CRAN (R 3.2.0)
##  rstudioapi       0.2      2014-12-31 CRAN (R 3.2.0)
##  snow             0.3.13   2013-09-27 CRAN (R 3.2.0)
##  stringr          0.6.2    2012-12-06 CRAN (R 3.2.0)
##  XML              3.98.1.1 2013-06-20 CRAN (R 3.2.0)
##  yaml             2.1.13   2014-06-12 CRAN (R 3.2.0)
This tutorial was generated using rmarkdown
(Allaire, McPherson, Xie, Wickham, et al., 2014) and knitcitations
(Boettiger, 2015).
[1] J. Allaire, J. McPherson, Y. Xie, H. Wickham, et al. rmarkdown: Dynamic Documents for R. R package version 0.3.3. 2014. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.4. 2015. URL: https://github.com/cboettig/knitcitations.
[3] J. Hester. knitrBootstrap: Knitr Bootstrap framework. R package version 1.0.0. 2014. URL: https://github.com/jimhester/.
[4] M. Morgan, M. Lang and R. Thompson. BiocParallel: Bioconductor facilities for parallel evaluation. R package version 1.1.21. 2015.