Sharing my work for "Advanced Methods III"

This semester I’m taking the live version of the Data Analysis class by Jeff Leek. His more popular version of the course is available through Coursera.

One of the things that Jeff promotes is reproducibility and sharing code. I share that tendency and thus created a Git repository for my homework and code for the class: lcollado753. I’m hosting it with GitHub to try it out since I started with Mercurial via Bitbucket. 

Part of me would love it if everyone in the class had their own Git repositories. I mean, this class involves lots of practice exercises and there are plenty of R packages and functions that others use that I would like to learn. As I don’t see this happening, I think that it would be great to list the packages/functions you think could be interesting to others at the end of the write-ups. However, this involves sharing the reports and I don’t know if that will happen.

But maybe I didn’t get the instructions Jeff gave correctly the first time. Listening to his week 2 talks from the Coursera course, I get that he wants our reports to be reproducible. The idea is great, but sometimes I get lost in the technicalities of finding the best fit for our situation. That is, something we can all do that is worth the time for small-scale projects that we have a couple of days to complete and will most likely be finishing the day before they are due. For now we might stick to sharing zip files with the report + summarized data set (it has to be small enough to be shareable by email).

I’m pretty happy with hosting my stuff on GitHub. One blunder I made in the first data analysis report is that I completely forgot to say in it that I have the code on GitHub :P Oh well, next time!

I feel that I also have lots to improve regarding how to tell a story in a report. Plus, for this first project I mainly did some exploratory data analysis without much statistical analysis.

Overall, I’m quite excited about this course =) and I think that I’ll learn a ton about methods to analyze data AND how to actually implement them. Plus, I’m currently trying to learn ggplot2, as you can see in that first report. Also, I made it with knitr instead of Sweave =)

Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

#rstats @Bioconductor/🧠 genomics @LieberInstitute/@lcgunam @jhubiostat @jtleek @andrewejaffe alumni/@LIBDrstats @CDSBMexico co-founder
