Sharing while caring

Today, let’s return to the multiple comparisons problem and relate it to something new: Open Science.

In recent years, more and more researchers, legislators and politicians have begun to campaign for open and transparent scientific practice. The fact that the majority of scientific articles are locked behind the paywalls of expensive journals, and thus out of reach for the general taxpayer (even though they funded the research), is upsetting. Moreover, the scientific community is struggling with reproducibility – the Center for Open Science attempted to reproduce 100 psychology studies and found that only 39% of the effects were rated as having replicated the result of the original study! Sharing raw data and code, and publishing in open access journals, will hopefully help solve these problems.

Gold Diggers in Australia (Edwin Stocqueler, 1855)

Just a month ago, the Amsterdam Call for Open Science was released, urging the EU to fundamentally change its science policy and encourage Open Science. Here in the United States, scientific journals are changing their data availability policies, and new organizations such as the Center for Open Science, the Open Science Project and the Open Science Federation have emerged.

Open Science seems to be happening and growing and that is great!

However, a while ago I attended a talk by William Revelle, a psychology professor who devotes most of his time to developing and encouraging open science methods and data sharing (see for example the Personality project). At the end of his talk, someone asked a question that I had never thought about and that I don’t think the field of Open Science has yet addressed: ‘How will Open Science deal with the problem of multiple comparisons?’

Let’s take a step back and discuss why multiple comparisons are a potential problem for Open Science. As I explained in my previous post, it is almost always possible to get a significant result, as long as you do lots and lots of tests. Even a dead salmon will show brain activity if you look for it long enough. Researchers therefore adjust their criteria for significance depending on how many tests they do – the more tests, the smaller the p-value needs to be. This works great for individual researchers who know exactly which tests they have performed on their data. However, what happens when everyone has access to a data set?
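
To make this concrete, here is a minimal simulation sketch in Python (the numbers of tests and samples are made up for illustration): it runs many t-tests on pure noise and compares how many ‘discoveries’ survive an uncorrected 0.05 threshold versus a Bonferroni-corrected threshold of 0.05 divided by the number of tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 1000     # hypothetical number of comparisons
n_samples = 30     # hypothetical observations per comparison
alpha = 0.05

# Pure noise: none of these tests has a true effect.
data = rng.normal(size=(n_tests, n_samples))

# One-sample t-test ("is the mean different from zero?") for every row.
p_values = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue

print("uncorrected 'discoveries':", np.sum(p_values < alpha))           # ~50 expected
print("Bonferroni 'discoveries':", np.sum(p_values < alpha / n_tests))  # ~0 expected
```

Roughly 5% of the noise-only tests come out ‘significant’ at the uncorrected threshold – exactly the dead-salmon effect – while the corrected threshold makes such flukes very unlikely.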

One example of a popular Open Access data set is the grid cell recordings from the Moser lab. On their website you will find the following message:

‘[…] Our intention is to make all raw data from all published studies available. The data contain a lot more interesting information than what has been published and we encourage users to dig further. Please do not hesitate to contact us for updates and information’

I think that almost all of us who have collected large data sets have melancholically realized that they contain so much more information than we have time to study, and I thus think it is a great idea to publish raw data online and encourage users to dig further. However, if 1000 people start digging, each testing their own slightly different hypothesis, and one person finds gold – a significant result – how should we treat that result? What significance level should we apply?
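
As a back-of-the-envelope illustration (the numbers here are mine, just to show the scale of the issue): even if each digger runs only a single test at the usual 0.05 threshold, and the data contain no real effect at all, the chance that at least one of them strikes ‘gold’ grows very quickly with the number of diggers.

```python
# Familywise false-positive probability when k independent analysts each run
# one test at threshold alpha on data with no true effect (illustrative numbers).
alpha = 0.05
for k in (1, 10, 100, 1000):
    print(k, round(1 - (1 - alpha) ** k, 3))
# 1 -> 0.05, 10 -> 0.401, 100 -> 0.994, 1000 -> 1.0
```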

Unfortunately I forgot William Revelle’s answer to this question, so I will instead give you some of my own thoughts on this problem. One possible solution could be to keep track of how many people download the data and adjust the required p-value accordingly. However, it is unclear how many people actually use and test a data set after they download it. Another idea could be to accompany every open data set with a forum on which scientists can write down what tests they have done and what results they have found. This could work if researchers feel a strong sense of responsibility for the proper use of Open Science data. However, it is unclear how many scientists feel this responsibility strongly enough to spend their time writing up failed attempts.
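
As a toy sketch of that second idea (entirely hypothetical – the class, the labs and the logged numbers are all mine): a shared data set could carry a public registry of analyses, and anyone reporting a result would compare their p-value against a threshold that shrinks as more tests are logged, here a simple Bonferroni-style split of 0.05 over every registered test.

```python
class AnalysisRegistry:
    """Toy public log of the tests run on a shared data set (hypothetical)."""

    def __init__(self, family_alpha=0.05):
        self.family_alpha = family_alpha
        self.entries = []  # (researcher, description, n_tests)

    def log(self, researcher, description, n_tests):
        self.entries.append((researcher, description, n_tests))

    def total_tests(self):
        return sum(n for _, _, n in self.entries)

    def current_threshold(self):
        # Bonferroni-style: split the familywise alpha over every logged test.
        return self.family_alpha / max(self.total_tests(), 1)


registry = AnalysisRegistry()
registry.log("lab A", "grid score vs. running speed", n_tests=12)
registry.log("lab B", "phase precession in novel arenas", n_tests=40)
print(registry.current_threshold())  # 0.05 / 52 ≈ 0.00096
```

Whether a simple Bonferroni split is the right correction here is itself debatable – it is very conservative when the logged tests overlap or are correlated – but even this toy version makes the trade-off visible: every extra analysis of a shared data set makes it harder for everyone to call a finding significant.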

What I think is clear, though, is that this is a subject we should think about. I am personally a big fan of data sharing and transparency in research. And I think that if we want Open Science to work, we should think deeply about the complications it might face. One of the aims of Open Science is to increase reproducibility, and we should make sure that we do not instead decrease it by finding more false positives. How do we dig for gold without digging a bigger hole for ourselves?

Now, the great thing is that ideas do not suffer from the multiple comparisons problem, so let’s think about this together! Tell me how you think we can share our science without sharing the multiple comparisons problem!