This article, translated from Dutch by Hermen Visser, was written for the Dutch Journal of Medicine (Nederlands Tijdschrift voor Geneeskunde) and appeared on September 22 online and on September 23 in print.
‘It failed’, answers STAT News journalist Damian Garde in a podcast when asked about the readout of a recent clinical trial on severe depression. A little later he adds how difficult this type of clinical research is. Depression symptoms vary a lot, so finding a group of comparable patients is difficult. Moreover, even within the same patient, symptoms in one period are not comparable to those in another. The study in question tried to control that variation with a strict selection: three different professionals independently decided on inclusion, without knowledge of the patients’ histories. The researchers hoped for a predictable course. And still they saw no effect of the new treatment. The trial described by Garde compares a randomly assigned placebo group with a randomly assigned group of patients who received the new antidepressant PRAX-114. What the researchers did not find was a difference between the groups.
This does not mean that the participating patients continued to suffer badly. In studies such as this one it is often just the opposite: both groups improve, and that explains the absence of a difference. Your group of patients may appear to change while you are watching, even though in reality nothing changes at all. If you focus only on the treated group, you will see an improvement. Only after accounting for the control group do you see that the improvement is not caused by the treatment.
Regression to the mean
Most people suffering from chronic symptoms do not show up at their GP or specialist every day. Only when things go really badly do they raise the alarm. When complaints vary from day to day and one particular day is exceptionally bad, the next day will almost always be better. However, statistically speaking, not all patients recover. If someone has their best day so far but has long suffered from varying complaints, chances are high that tomorrow will be as bad as usual.
We call this phenomenon regression to the mean. Like many terms in statistics, it is not the most revealing choice. There is no outside force that pushes your measurements towards the mean. Later measurements lie closer to the mean because the first measurement was selected for being extraordinary. ‘Winner’s curse’ would be a better term, but it could just as well be ‘loser’s luck’.
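The selection mechanism can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: every patient has a stable usual severity, each day adds random noise, and a patient visits the GP only when the day-1 score reaches a cutoff. The function name, the score scale and all numbers are hypothetical and not taken from any study.

```python
import random
from statistics import mean

def simulate_regression_to_mean(n_patients=10_000, noise_sd=2.0, cutoff=8.0, seed=1):
    """Daily score = stable personal level + day-to-day noise (all values hypothetical)."""
    rng = random.Random(seed)
    day1_selected, day2_selected = [], []
    for _ in range(n_patients):
        level = rng.gauss(5.0, 1.0)               # the patient's usual severity
        day1 = level + rng.gauss(0.0, noise_sd)   # score on the first day
        day2 = level + rng.gauss(0.0, noise_sd)   # next day: nothing about the patient changed
        if day1 >= cutoff:                        # only an exceptionally bad day leads to a visit
            day1_selected.append(day1)
            day2_selected.append(day2)
    return mean(day1_selected), mean(day2_selected)

bad_day, next_day = simulate_regression_to_mean()
print(f"average score on the day of the visit: {bad_day:.2f}")
print(f"average score on the following day:    {next_day:.2f}")
```

Nobody in this simulation is treated and nobody's usual severity changes, yet the selected patients score lower on average the next day, purely because they were selected on an extreme first measurement.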
Like a sports coach
‘A statistician is a sports coach’, I wrote in my previous comment in NtvG and on this VVSOR blog. ‘We do not reject a talent after one disappointing performance, but we also do not overly weigh positive outliers.’ We know that outcomes vary by nature. At the same time we should be aware of the bias we introduce with our own view. An old legend says that the magazine Sports Illustrated brings bad vibes and misfortune to the biggest sports stars by putting them on its cover. A jinx! However, this does not imply a causal relation. If we looked at the performances that were just not cover-worthy, we would be unlikely to see more misfortune afterwards. The outstanding performances on the cover most likely have a component of luck and are therefore not easily replicated.
Or consider your favourite Netflix series and its disappointing second season. How come? Because the first season was brilliant. Do you remember that mediocre series with a second season that was even worse? No? I rest my case. Or think about teen stars who disappoint as adult artists. Maybe that is the curse of peaking too early? Maybe. But their later performances also disappoint because of the excessive focus on their extraordinary performances as youngsters, in a showbiz world full of good and bad luck. It could all be regression to the mean.
The control group
Let’s go back to the study on severe depression. Here you need a group of patients with a depression score that is high enough to participate. Depression scores fluctuate. Therefore the inclusion will miss severely depressed patients who are having a relatively bright period, and it will recruit less depressed patients who are going through an exceptionally dark period. As a result, the included patients will on average get better, especially if you also take the placebo effect into account. You really need a control group to see the whole picture.
It is not enough to show that patients do better after treatment. You want to show that the treated group does better relative to a group without treatment. Without a control group it is simply impossible to tell. It is your own view, your own study design, that deceives you. If you only consider the treated group, the effects on the next random patient will certainly disappoint. Your optimism is based on the selected group. This biased view will be punished as soon as you try to generalize.
Expats who have just arrived in the Netherlands are often indignant when a Dutch GP sends them home and tells them to give it a few days. They are not used to it. They do not sound the alarm for nothing, do they? Still, it is not an outrageous idea to wait a few extra days, especially when the complaints are not life-threatening.
We should all be reluctant to respond immediately when confronted with exceptionally high or exceptionally low scores. Exceptionally high scores are usually followed by lower scores, because exceptional is just that: the exception. It is the other way around with remarkably low scores. ‘Winner’s curse’ and ‘loser’s luck’. Two sides of the same coin. It applies to sports performances, but also to the varying complaints of a patient who arrives at the GP’s consulting hours on an extraordinarily bad or good day.
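To see why the control group matters, here is a minimal simulation sketch of a trial in which the treatment does nothing at all. It is not a model of the PRAX-114 study: the inclusion threshold, the score scale, the function name and all numbers are invented for illustration, and the placebo effect is deliberately left out so that only the selection on a fluctuating score remains.

```python
import random
from statistics import mean

def simulate_null_trial(n_screened=20_000, threshold=8.0, seed=2):
    """Trial with a hypothetical inclusion threshold and a treatment without any effect."""
    rng = random.Random(seed)
    improvement = {"treatment": [], "placebo": []}
    for _ in range(n_screened):
        level = rng.gauss(5.0, 1.0)                # usual severity (assumed scale)
        screening = level + rng.gauss(0.0, 2.0)    # fluctuating score on the screening day
        if screening < threshold:
            continue                               # score too low on that day: not included
        arm = rng.choice(["treatment", "placebo"])  # random assignment
        follow_up = level + rng.gauss(0.0, 2.0)    # assumption: same process in both arms
        improvement[arm].append(screening - follow_up)
    return mean(improvement["treatment"]), mean(improvement["placebo"])

treated, placebo = simulate_null_trial()
print(f"average improvement, treatment arm: {treated:.2f}")
print(f"average improvement, placebo arm:   {placebo:.2f}")
print(f"difference between the arms:        {treated - placebo:.2f}")
```

Looking at the treatment arm alone would suggest a clear improvement. Only the comparison with the placebo arm shows that the difference between the arms is small compared with the improvement both arms share.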
Anne Top shared her knowledge about measuring depression in the GP’s office. The hosts of the ‘Readout loud’ podcast call the improvement in the control group ‘the placebo effect’, although it could also be ascribed to regression to the mean (or a bit of both).