Do we still need new statistical methods?

Statistics is the oldest data science. Which makes one wonder: don’t we have enough methods to crunch our data already?

Statistics has been around for over a century. Early heroes like Ronald Fisher, Florence Nightingale and David Cox laid the foundations of many statistical analyses that are still used today. Being a mature science that fundamentally supports other fields carries a risk: researchers may argue that we should stop developing new methods and devote all our time to applying known ones. Fair enough, but unwise. Below are three reasons why we need to keep teasing our creative minds.

1. Data-driven research requires rigorous statistics
My employer, an academic hospital, has data-driven research as one of its top research priorities for the next five years. And I’m pretty sure this is no different in many other research institutions. The overwhelming amount of data that has become available drives this priority. Letting go of hypotheses comes at a price, however, as the golden rule is: complex question, complex answer. 

That is, the more unstructured the question is, as is often the case for data-driven research, the more complex the statistical analysis needs to be. Multiple testing, overfitting, missing values, causal inference, high-dimensional data, longitudinal data, interpretable machine learning: do you want me to go on? So, about data-driven research as a priority: we are with you. As long as you appreciate that we need to tailor the answers to the questions.

2. Bright minds demand challenging problems
Whenever I ask first-year PhD students to start by reviewing the literature, they almost die of boredom. Sure, they will do it to serve the lazy supervisor and, ultimately, themselves. But only when the process of creating a new method starts does their enthusiasm rise to the max. Most scientists are no different from artists: they want to create, not copy. No, not all are bright and creative enough to actually contribute substantially to the field of statistics. But we will only learn who can by trying, and sometimes failing.

3. The best way to know methods is to try to beat them
Let’s be honest and introspective (ouch): many of the methods that I (co-)developed in the past 25 years will not make it into the Encyclopedia of Statistics. Possibly because hardly anyone read my paper in that stats journal with impact factor 1.5, or because my software package was too user-unfriendly, or simply because someone else had a better idea.

Yet I dare say that the time I spent on developing those methods was not wasted at all. I had to dive deep into the existing methods to try to outperform them. And that certainly pays off in my advisory and collaborative work in the hospital. That is a task the managers likely find more important than my own research, but one cannot flourish without the other; at least I can’t.

Closing the gap
Of note, the reasoning above does not mean that one cannot be a good statistician when one ‘only’ applies methods. In fact, on this matter the statistics community should learn from the machine learners, who seem to have a more balanced appreciation for developing and applying methods.

I regard it as the biggest challenge for the statistics community in the next ten years: closing the gap between state-of-the-art methods and applications. If we can do this, we will maintain our position as an essential part of the scientific community. A position that Fisher, Nightingale, Cox and many others established for us.

Mark van de Wiel

Statistics. For many researchers a burden, for me a pleasure.
Medical data sets: they keep getting larger and more complex, but measuring more does not always mean knowing more. In my blogs I want to highlight the statistical challenges in this kind of data and correct common fallacies. I draw on more than 20 years of experience with the analysis of medical data.
