Setting up your computer for bioinformatics/biostatistics and a compendium of resources

Jumping on the train set by Hilary Parker (“The Setup (Part 1)”) and Alyssa Frazee (“my software/hardware setup”), I’m going to share my setup and hopefully add something new. They both did a great job already, so make sure you read their posts!

I have some experience with all three main OSes: Windows, Linux and Mac. That being said, I only know some of the basic stuff for each and I use Google very frequently to get help. I used to have a dual Windows/Linux (Ubuntu) setup, but now I have a Windows laptop/desktop (it’s a monster :P) at home and I’m happy working with my Mac.

I’m going to start by mentioning the software I use(d) in each OS and then add some other tools that I really like.

Windows

  • Text editor: Notepad++. It outperforms Notepad by light years! A must for me is the “View -> Word wrap” option. I would definitely go to “Settings -> Preferences -> New Document/Default Directory” and change the new document format from Windows (Dos) to Unix. This will save you time later when you want to work on a Unix system like the cluster. If you didn’t do that, you can change a specific document’s EOL (end of line) by using “Edit -> EOL conversion -> UNIX format”. Another feature that I like is “Search -> Replace…”, which allows you to use regular expressions (like in Perl).
  • Statistical software: R of course! It’s best to do a custom installation and choose a directory without spaces in it. That will help later (see the PATH item further below). If you want to be convinced to join the R community read “Data Analysts Captivated by R’s Power” and “R You Ready for R?” or just take a look at what others are using. Remember that the R project is open source, free and easy to contribute to. If you end up choosing Excel as your statistical software, well, there is no hope for you!!
  • R code editor: Notepad++ with NppToR. For a long time I used Emacs modified to work with Windows by Vincent Goulet available here. It works great and saves you quite a bit of setup time. XEmacs is another option that a friend of mine used, but it never convinced me. Anyhow, I ended up changing from Emacs to Notepad++ with NppToR because I could:
  1. Force quit R (and not lose code changes) in case I crashed R by doing something stupid like printing something huge or whatever :P
  2. Access help pages in a separate window. I’m sure you can do it with Emacs too, but I was just too lazy to configure it.
  3. Use shorter shortcuts.
  4. Use the NppToR to PuTTY feature, which I found later on and is very useful.
  5. Create an R syntax dictionary (or something like that) in Notepad++, which scans all your R packages and adds the function names so they are colored when you type them. Notepad++ will also auto-complete some function names and show you the arguments. Great stuff! (I forgot the name, so… google it :P)
  • SSH: PuTTY. As said before, works well with Notepad++ and NppToR.
  • SCP: WinSCP. There are others that work too like Filezilla but well, WinSCP does the job well.
  • PDF viewer: Adobe Acrobat Professional. I’m using the X version now. I like how I can highlight, underline, cross out, free hand, sticky note, combine files into a single pdf, combine pdfs, and change the highlight colors easily. It also has a change tracker (kind of like Word has). I’ve seen others use PDF Annotator, which is available for free for Hopkins students. Anyhow, I simply love Acrobat for reading papers.
  • LaTeX: MiKTeX. For writing TeX files I used either Emacs or Notepad++. There is another program called WinEdt which has drop-down menus and the like. I got used to typing LaTeX from scratch, well, I have a template.Rnw somewhere. Oh yeah, I always use Sweave when writing TeX files (even if I don’t use R).
  • R reports: Sweave by Friedrich Leisch, one of the champions of reproducibility! To learn more about Sweave first read this pdf by Nicola Sartori. This is another Sweave demo by Charles J Geyer. Check out this great Windows Sweave troubleshooting page by John D Cook.
  • Building R packages from source. You will definitely need Rtools installed. I would also install QPDF which can be used by R to compress your pdf files, which is a good thing if you want to have a small-sized tarball. Last but not least, check out Building R packages for Windows by Rob J Hyndman.
  • Learn to modify your PATH! Check 1.3 from the previous link by Rob J Hyndman. If you are going to use Sweave, it’s best to add to your PATH the path for the directory containing your Sweave.sty file so that you won’t need to copy it to every single directory. This is why it pays off to do an R custom installation and put it in C:/R/R-current-version or something like that instead of C:/Program Files/ bla bla with spaces. It used to be more important a few years ago. Also, I created a sw.bat file and put it somewhere where my PATH would find it. That sw.bat file ran Sweave, pdflatex twice, then bibtex and finally opened the pdf file.
  • PDF viewer for LaTeX files. I only learnt about pdf sync and the like a year ago. You should google how to set this up with SumatraPDF (Adobe Acrobat doesn’t work!).
  • Version control: Mercurial. It’s very easy to use and you can get an account at Bitbucket.org with an unlimited number of private repositories if you have an academic email. Even if you are not doing a collaborative project, you will love using a version control system! It will clean up your directories very nicely and will help you become more organized. Learning a few commands is nothing compared to having lots of files with _v1, _v2, etc. at the end. Check out the Mercurial guide to get started. Note that on Windows, instead of customizing your .hgrc file, you will customize a mercurial.ini file.
  • Presentations: both PowerPoint and Beamer (normally with Sweave too). I rarely use Google Docs for this.
  • Office: Either Microsoft Office or OpenOffice (free).
  • Poster creator: PosterGenius (academic discount price). It was very easy to use and I would surely give the free trial version a go. It adds a watermark, but well, you will appreciate the time you save compared to using PowerPoint. I guess that Adobe Photoshop is another option, but I’ve only used it to edit photos here and there, not to make a whole poster. The most I did was create this.
  • To de-compress RAR files: WinRAR.
  • Anti-spyware: Avast (free version). I normally keep it in silent (gaming) mode so it doesn’t show pop ups.
  • CCCP codec pack which includes the Media Player Classic. Great for watching video files and dumping the crappy Windows Media Player.
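
The Windows-to-Unix EOL conversion from the Notepad++ item above can also be scripted, which is handy when you have a whole folder of files to fix before moving them to the cluster. This is only a minimal sketch in Python (not something from my actual setup); the function names and the example file name are made up for illustration.

```python
from pathlib import Path


def to_unix_eol(data: bytes) -> bytes:
    """Convert Windows (CRLF) and lone CR line endings to Unix (LF)."""
    # Replace CRLF first so the lone-CR pass does not double the newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file at `path` in place with Unix line endings.

    Returns True if the file was changed, False if it was already clean.
    """
    target = Path(path)
    original = target.read_bytes()
    converted = to_unix_eol(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False
```

For example, convert_file("analysis.R") (a hypothetical file name) would rewrite that script in place. Working on bytes instead of text avoids having to guess the file encoding.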

Linux (Ubuntu)

Ubuntu provides Linux distributions that are very user friendly and that look much like Mac OS does now. You’ll find it easy to run multi-core programs, which was a pain to do on Windows. Beware that even if you use Ubuntu you will need to learn stuff like how to compile. Also, please check before you install that your computer is supported. For example, some laptops with very new video cards might not work properly. That being said, with Ubuntu you will feel very at ease working in an area like mine (genomics) because a lot of the software runs on Linux (normally on a cluster, but you can test on your laptop).

You will want to check and/or keep for reference LINUX Essentials by Thomas Girke (more from him below), Learn Linux in 10 minutes, The Linux tutorial, and Linux vi editor tutorial.  

  • If you are going to use Linux (Ubuntu), at some point you will want to compile something from source and find out that you are missing a dependency. That’s when I google, then use:
  1. apt-cache search something
  2. sudo apt-get install something
  • Note that you will frequently need the yyyy-dev version of a package (called yyyy-devel on some other distributions), which includes the C headers and other stuff that you need to compile.
  • You will find a lot of things through the package installer (forgot what it’s called). Learn the codename of your Ubuntu version so you can select the appropriate version of the software in case you are downloading it from another place.
  • SSH/SCP: terminal commands :) I just wanted to mention that rsync is a nice command for syncing folders (recursively too) between your computer and, say, the cluster.
  • Version control: Mercurial again. The configuration file is .hgrc, not mercurial.ini.
  • Text editor: Nedit or Emacs. Vi when doing in-terminal modifications.
  • You can get R through aptitude or if you want the very latest (or a devel version) you’ll have to compile it. The first time you will have to install plenty of dependencies, but it’s good practice.
  • Office: go with OpenOffice.
  • LaTeX: install the texlive distribution. I normally get everything so that later when I’m trying to use a TeX package I won’t have to go install it (which is what MiKTeX does for you in Windows). 
  • Video player: VLC.
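
To make the rsync item above a bit more concrete, here is a minimal sketch in Python that only builds the command line for syncing a local folder to a cluster. The folder name and the user@cluster.example.org host are placeholders I made up, and -a, -v, -z and --dry-run are standard rsync options; adapt them to your own setup.

```python
import shlex
from typing import List


def build_rsync_command(local_dir: str, remote: str, dry_run: bool = True) -> List[str]:
    """Build an rsync command that syncs the contents of local_dir to remote."""
    # -a: archive mode (recursive, keeps permissions and timestamps)
    # -v: verbose output, -z: compress data during the transfer
    command = ["rsync", "-avz"]
    if dry_run:
        # Only report what would be transferred, without copying anything.
        command.append("--dry-run")
    # A trailing slash on the source means "the contents of this folder".
    source = local_dir.rstrip("/") + "/"
    command.extend([source, remote])
    return command


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    cmd = build_rsync_command("projects/myproject", "user@cluster.example.org:projects/myproject/")
    print(shlex.join(cmd))
```

The sketch defaults to a dry run on purpose, so you can check what rsync would copy before doing it for real with dry_run=False.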

Mac

  • Terminal: iTerm2. Mac comes with a native terminal, but iTerm2 has other nice functions like tabs and more options to customize it. Check this for free color palettes. I like the Homebrew one from here.
  • LaTeX and R editor: Aquamacs is a version of Emacs that works great. However, as I discussed in the Windows section I’m moving away from Emacs. Well, to be honest, I don’t want to put in the time to learn how to customize Emacs properly and do amazing stuff with it like Kasper does. Recently (since June) I’ve been using TextMate. It has this thing called “Bundles” which provides different hotkeys depending on the file you are editing. That means that for Rnw files you can Sweave them directly there and for R files you can either send the code to R or to the terminal (much like Notepad++). The one thing is that it is not free, BUT there is a 2.0 alpha release available on GitHub that you can compile. This lengthy discussion can be worth reading if you want to know more about the 2.0 version and the future of TextMate. Someone said there that Sublime might be replacing TextMate but I haven’t looked for any R integration in it. Anyhow, I liked how TextMate included an auto-spell checker that recognizes Sweave/LaTeX code out of the box :)
  • Package installer: MacPorts. It’s kind of similar to aptitude from Linux but it’s Mac only. Note that you will definitely need to get Xcode.
  • PDF viewer: also Adobe Acrobat Pro for the reasons mentioned previously.
  • PDF viewer for LaTeX: TeXShop which I think comes with the MacTeX distribution. It has the forward sync that Alyssa mentions in her blog.
  • Text editor: TextWrangler. Has several of the functions I talked about in the Notepad++ section like search and replace with regular expressions. It definitely outperforms the native text editor.
  • SCP: Cyberduck. I haven’t tried others, but it works and I’m happy with it. I also use the terminal to push/retrieve files like I would do in Linux. Same for ssh and Mercurial.
  • Version control: Mercurial. Note that you might have to add a site key (like Bitbucket’s) to your .hgrc file so it doesn’t complain when pushing files.
  • Productivity: My Little Pomodoro, available from the Mac App Store. I love it for following the Pomodoro technique (you can use any other timer that you like), which Hilary introduced me to. It works like a charm when you are under stress and need to be productive. After all, I’m prone to escaping the stress and distracting myself, so this helps me keep my distractions limited. I’ve also found that when I’m stuck on a problem and I take the 5 min break thinking about something else, well, the machine keeps working and when I come back from the break I have a new idea to try out.
  • Video player: VLC (includes codecs).

Other stuff

  • Browser: I used to love Mozilla Firefox (it has a nice sync functionality) but I’ve moved to Google Chrome. It’s kind of a shame that Google started to compete with Mozilla, but oh well, bye bye 2007. I use Chrome because it works a tad bit better with other Google tools, but that’s it. It also syncs your bookmarks. Both work great and Opera is still my favorite backup browser. I guess anything but Internet Explorer and Safari.
  • Learning R. I would definitely check Thomas Girke’s Programming in R page and Frank McCown’s Producing Simple Graphs with R tutorial. I have my own share of R related slides here.
  • Learning Bioconductor. Thomas Girke again wrote a great resource for learning how to use Bioconductor for analyzing high-throughput sequencing data files. Bioconductor hosts packages for other technologies/problems, so I would also look at its own help pages like the Workflows section. I also like Peter’s R programming pages, especially the heatmap section. I have my own share of Bioconductor related slides here.
  • Learning LaTeX. I learnt the hard way I guess… I learnt by comparing Sweave files and their output and seeing what changed if I modified the code. Nowadays, I would very highly recommend that you first check the How to Use LaTeX short series of exercises/files by Andrew Matchett. The Not so short introduction to LaTeX is a great resource. For very specific symbols, check the Comprehensive LaTeX symbol list.
  • Using an SGE (Sun Grid Engine) cluster. For some basic commands look here. For the Hopkins cluster, definitely read this. Finally, for running array jobs check this.
  • LaTeX and math. You should definitely read the wiki books page for this topic. I kept going back to it over my first year at Hopkins when I really needed to learn all this. The theorems page is nice, but not a must. Same for Common TeX/LaTeX errors.
  • Figures in LaTeX. Here is a basic overview but the wiki books page for the topic is a must check.
  • Accents in LaTeX. Check this blog post by “Bugs and Solutions”.
  • Blogging R code: Pretty-R. I haven’t really used it but it surely looks pretty!!
  • Cloud storage: Dropbox works great and tons of iPad apps have an option to back up to it, which is very handy with my note-taking apps. Google Drive and others are also around.
  • Paper (bibliography) organizer: Zotero is amazing! I simply love it :) I pull the bibliography from PubMed or the journal page itself and, to avoid any hassle, I have a “papers” folder in my Dropbox where I only organize the files by last name. Then if I want to find something, I go to Zotero and use its great search function. I rarely use it to annotate webpages and I hear that it can now upload files to the cloud. Anyhow, I first used it in my Windows/Ubuntu dual setup. You can use it as a Zotero Firefox plugin or as Zotero stand-alone with Zotero Connector (for Google Chrome, for example). Finally, Zotero can export your bibliography into a BibTeX file :)
  • Blog: Tumblr. Some like WordPress better, but I like how Tumblr is not only a blogging platform but also a social media tool. I’ve written several posts before on how to customize your blog and on other blog related tools. But a must in my point of view is to get your RSS feed “burnt” with FeedBurner. It has lots of interesting tools and is much better than a plain XML RSS feed.
  • Notes: Notability iPad app. Great stuff and doesn’t blow up on you (aka, doesn’t lose your notes) like NotesPlus did to me. Anyhow, note-taking on my iPad with auto-cloud backups greatly changed my classroom experience. There are other apps like this one out there.
  • I use Voice Recorder HD iPad app for recording lectures. It’s useful when you miss something the professor went over quickly and you are trying to understand it later on.
  • Email: Gmail with keyboard shortcuts enabled. I also use “Canned Responses” from the Google Labs to specify my signature. I have an academic one, one for Mexico, etc.
  • Send emails later: Boomerang for Gmail.
  • Calendar: Google Calendar.
  • Task manager: Google Tasks from within Google Calendar (not from Gmail, which is doable too) with GoTasks on my iPhone.
  • RSS reader: Google Reader. Works great. By the way, Orbvious Interest (below) also works with Google Reader and has a customizable hotkey for it.
  • Mark pages to read later: Orbvious Interest for Google Chrome. Great stuff! It syncs between computers and you can use Pocket on your iPad to view the links.
  • Maps: Google Maps.
  • Video conversation: Skype, Google Hangouts. If I’m going to help someone remotely, then I use TeamViewer which is free for non-profit purposes. With it you can move their mouse, which makes things much easier for support issues!
  • Photos: Picasa. I pay the 5 bucks a year for 20 GB on Picasa Web Albums so all my photos are on the cloud.
  • Dictionary: die.net
  • Network visualizer/analyzer: Cytoscape.
  • Venn diagrams with more than 2 sets.
  • Setting up your website. I pretty much followed Alyssa’s instructions and got my CSS template for my academic page from FCT. Then I used simple html to modify it.
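
For the SGE array jobs mentioned in the cluster item above, the basic idea is that each task of the array gets its own task number through the SGE_TASK_ID environment variable, and your script uses that number to decide which piece of work to do. Here is a minimal sketch in Python, assuming the job was submitted as an array with something like qsub -t 1-3; the function name and the file names are made up for illustration.

```python
import os
from typing import List, Mapping, Optional


def pick_input_for_task(inputs: List[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Return the input file that belongs to the current array task."""
    env = os.environ if env is None else env
    raw = env.get("SGE_TASK_ID")
    # Outside an array job the variable is unset or not a number.
    if raw is None or not raw.isdigit():
        raise RuntimeError("SGE_TASK_ID is not set to a task number")
    task_id = int(raw)
    # Array task numbers start at 1, Python list indexes start at 0.
    if not 1 <= task_id <= len(inputs):
        raise IndexError(f"task {task_id} has no matching input file")
    return inputs[task_id - 1]


if __name__ == "__main__":
    # Hypothetical input files, one per array task.
    files = ["sample1.fastq", "sample2.fastq", "sample3.fastq"]
    # Simulate task number 2 instead of reading the real environment.
    print(pick_input_for_task(files, {"SGE_TASK_ID": "2"}))
```

Passing the environment as an argument makes it easy to try the function on your laptop before sending anything to the cluster.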

I pretty much dumped a ton of my bookmarks in this huuuuge post! Well, I hope that it will be useful to someone. At least now I’m happy to have contributed to Hilary’s computing-resources-post drive.

Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

#rstats @Bioconductor/🧠 genomics @LieberInstitute/@lcgunam @jhubiostat @jtleek @andrewejaffe alumni/@LIBDrstats @CDSBMexico co-founder
