P-values and Statistics Philosophy

I’m in the process of catching up with all the posts from SimplyStatistics that I didn’t read during the break. While doing so, I found a very interesting post on p-values (more below).

simplystatistics:

> This post written by Jeff Leek and Rafa Irizarry.
>
> The p-value is the most widely-known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he was cited every time a p-value was reported his paper would have, at the very least, 3 million citations* - making it the most highly cited paper of all time.
>
> Read More

It’s like the p-value is a necessary evil. The post is indeed a great read, but what caught my attention was the following statement:

> The advent of new measurement technology has shifted much of science from hypothesis driven to discovery driven making the existing multiple testing machinery useful.

I’m taking a class with Chuck Rohde on probability with a strong philosophical component. That led me to ask him recently how he would try to convince a biologist to believe/trust statistics. To explain myself a bit more, I asked the question because I’ve been in environments where people follow their intuition on what is significant or not and disregard whatever statistics says about the same data. So in the past I’ve been in discussions where I was the one trying to convince others that, according to statistics, something is not significantly different.

I don’t think that I communicated my question correctly, as Rohde’s answer was that whenever he sees a ± reported on count data, we (as statisticians) are not doing something right! He also reflected on the past: at one time people were quite into mathematical biology, a field that has either died out or transformed into new fields (systems biology comes to mind). Will the same happen to genomics? *Sound of a coin flipping in the air* I don’t have a clue!

In a sense, the post at SimplyStatistics raises the same issue I was trying to ask Rohde about:

> Why not explain to our collaborator that the observation they thought was so convincing can easily happen by chance in a setting that is uninteresting. […] In general, we find p-value to be superior to our collaborators intuition of what patterns are statistically interesting and which ones are not.

To end the post, I invite you to read the paper A Brief History of the Hypothesis by David Glass and Ned Hall, from back in 2008. It’s one of my all-time favorites. I’ve read it a few times and, while my memory distorts what they really wrote, in my mind it’s a great summary with a clear message: it’s time to abandon hypothesis-driven science and continue with question-driven science, building models from data rather than from our hypotheses.

Here are a few quotes from their paper:

> We propose that building hypotheses should be abandoned in favor of posing a straightforward question of a system and then receiving an answer, using that answer to model reality, and then testing the reproducibility and predictive power of the model, modifying it as necessary.

They end their paper with:

> Thus, although a hypothesis might have been thought to be necessary in the past, it no longer seems to be so. It is better to see science as a quest for good questions to try to answer, rather than a quest for bold hypotheses to try to refute.

Do you agree?

Leonardo Collado-Torres
Investigator @ LIBD, Assistant Professor, Department of Biostatistics @ JHBSPH

