data protection

No, those data are not yours

Why do so many scientists protect the data they collect? Mark van de Wiel shares three common excuses and a solution.

Data are the new gold. Many companies make their money by selling data. We may have concerns about that, but at least the business community pays for the data collection itself. In science, data collection is largely paid for with tax euros (think of electronic patient records, for example) or by grant bodies, often charities. In other words, by "us". Fine, you might say, because scientists do useful things with those data. Quite true, but unfortunately many scientists anxiously guard this new gold. In this post, I share my concerns and frustrations about these practices, and suggest a solution.

Frustration

Many researchers are frustrated by a sentence often found in scientific articles: "Data are available upon request". In practice, this means: "We don't want to share the data at all, but the journal demands such a statement. If you really want the data, we won't respond to your emails for two months. After that, we'll have you fill out five forms, each requiring the signature of the chief executive. Finally, if we do share the data with you, we will do so in a format that will take you at least another month to decipher." Well, never mind. Below, I refute three frequently mentioned excuses for not sharing data.

Excuse 1: “But then I will get scooped.”

Scientists are often afraid that others are better at extracting interesting information from their data than they are themselves. Yes, so what? We're not writing for tabloids, are we? Many scientists have forgotten what science is all about: gathering and sharing knowledge. Once you have collected the data and shaped your ideas, you have a huge head start on the rest of the world, and you may, of course, exploit that advantage. If you can't, it's probably better that the rest of the world does something valuable with those data.

Excuse 2: "I do all the work, while others take the credit."

A. You get paid for it. That’s a very valid reason to do the job, but scientists sometimes seem to forget that.
B. If you do it smartly, the data will yield loads of citations, the 'likes' of science.
C. Please realize that a good analysis of the data is often just as much work as generating them.

Excuse 3: Privacy

Privacy is the biggest excuse for not sharing data. All right, it is not quite an excuse, because the European rules on this are pretty strict. But for 95 percent of medical data, the identity of a person cannot be traced if the data are cleverly anonymized. For the other 5 percent, there are tricks to prevent traceability. Still, I do understand that not all data can easily be shared with just anyone. Never mind: I'm fine with filling out one form that lets you verify my legitimacy as a researcher, in which I declare that I will use the data for scientific research only. If you approve it within one week, that's totally okay. Two locks on the door, anonymize and legitimize, should be enough, right?

Scandinavian statisticians

Sharing leads to more knowledge, and more knowledge leads to expertise. It's that simple. Scandinavian statisticians have been world leaders in survival analysis for decades. Survival analysis aims to predict and explain the survival (or death) of people using data from those people, such as age, gender and, in a medical setting, the severity of the disease. Why are Scandinavians so good at survival analysis? Because of their long tradition of recording survival data very accurately. That is interesting for insurance companies, of course. But at least as relevant: sharing these data with statisticians led to an enormous boost in the methodology for analyzing this kind of difficult data. And to this day, scientists from all over the world want to collaborate with those statisticians.

Solution

As usual, the solution is in the hands of those with power: the grant bodies. If, say, the Dutch Cancer Society awards a grant to a cancer researcher, it should be straightforward to attach a condition to it: "Fine, you get the money, but after February 1 of year X the data will also be available to other researchers in the Netherlands. And upon publication, also to the rest of the world, in a FAIR¹ format." For a grant body, this strategy yields much more value for money than supporting a single scientist whose idea may not work. Data are the new gold, but let's all make sure that this gold becomes truly valuable.

¹ FAIR = Findable, Accessible, Interoperable, Reusable

Credits

The Dutch version of this post was first published on the blog of Mark van de Wiel.

Main image: Michal Jarmoluk on Stocksnap

Mark van de Wiel

Statistics: a burden for many researchers, a pleasure for me.
Medical data sets keep getting larger and more complex, but measuring more does not always mean knowing more. In my blogs, I want to highlight the statistical challenges of this kind of data and correct common reasoning errors. I draw on more than 20 years of experience analyzing medical data.


To the VVSOR website