You see the outlier because it is an outlier

This article, translated from Dutch by Hermen Visser, was written for the Dutch Journal of Medicine (Nederlands Tijdschrift voor Geneeskunde) and appeared online on September 22 and in print on September 23.

‘It failed,’ STATnews journalist Damian Garde answers in a podcast when asked about the readout of a recent clinical trial in severe depression [1]. A little later he adds how difficult this type of clinical research is. Depression symptoms vary a lot, so finding a group of comparable patients is hard. Moreover, within the same patient, symptoms in one period are not comparable to those in another. The study tried to control that variation with strict selection: three different professionals independently decided on inclusion, without knowledge of the patients’ histories. They hoped for a predictable course. And still they saw no effect of the new treatment. The trial Garde describes compares a randomly assigned placebo group with a randomly assigned group of patients who received the new antidepressant PRAX-114 [2]. What the researchers did not find was a difference between the groups.

This does not mean that the participating patients continued to suffer badly. In studies such as this one, it is often just the opposite: both groups improve, which explains the absence of a difference. Your group of patients may appear to change while you are watching, even though in reality nothing has changed at all. If you focus only on the treated group, you will see an improvement. Only after accounting for the control group do you see that the improvement is not caused by the treatment.

Regression to the mean

Most people with chronic symptoms do not visit their GP or specialist every day. Only when things go really badly do they raise the alarm. When complaints vary from day to day and one particular day is exceptionally bad, the next day will almost always be better. Statistically speaking, however, a better day does not mean that the patient is recovering. Conversely, if someone who has long suffered from fluctuating complaints has their best day so far, chances are high that tomorrow will be as bad as usual.

We call this phenomenon regression to the mean. Like many terms in statistics, it is not the most revealing choice. There is no mysterious force that pulls your measurements toward the mean. Later measurements lie closer to the mean because the first measurement was selected for being extraordinary. ‘Winner’s curse’ would be a better term, but it could just as well be ‘loser’s luck’.
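
To make the mechanism concrete, here is a minimal simulation sketch in Python. It assumes a deliberately simple, hypothetical model: each patient has a stable personal level of complaints (mean 50, standard deviation 10) plus independent day-to-day noise (standard deviation 10), and a higher score means a worse day. Nothing about the patients changes between day 1 and day 2. The function names, cutoffs and numbers are illustrative choices, not taken from any study.

```python
import random
import statistics


def simulate_days(n_patients=10_000, level_sd=10.0, noise_sd=10.0, seed=1):
    """Hypothetical model: a stable personal complaint level plus independent
    day-to-day noise. Nothing changes between day 1 and day 2."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_patients):
        level = rng.gauss(50.0, level_sd)            # stable personal mean
        day1 = level + rng.gauss(0.0, noise_sd)      # score on the day we look
        day2 = level + rng.gauss(0.0, noise_sd)      # score on the next day
        pairs.append((day1, day2))
    return pairs


def mean_scores(pairs, keep):
    """Mean day-1 and day-2 score among patients selected by `keep` on day 1."""
    selected = [(d1, d2) for d1, d2 in pairs if keep(d1)]
    return (statistics.fmean(d1 for d1, _ in selected),
            statistics.fmean(d2 for _, d2 in selected))


pairs = simulate_days()
everyone = mean_scores(pairs, lambda d1: True)
bad_day = mean_scores(pairs, lambda d1: d1 >= 70)    # seen on an exceptionally bad day
good_day = mean_scores(pairs, lambda d1: d1 <= 30)   # seen on an exceptionally good day

print("everyone: day 1 %.1f -> day 2 %.1f" % everyone)
print("bad day:  day 1 %.1f -> day 2 %.1f" % bad_day)
print("good day: day 1 %.1f -> day 2 %.1f" % good_day)
```

For everyone together, the day-1 and day-2 averages are essentially equal. The patients selected on a bad day score lower the next day and those selected on a good day score higher, yet in this model the bad-day group does not fall all the way back to 50: part of its high day-1 score reflects a genuinely higher personal level. The better next day is regression toward the mean, not recovery.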

Like a sports coach

‘A statistician is a sports coach,’ I wrote in my previous commentary in NTvG and on this VVSOR blog [3]. ‘We do not reject a talent after one disappointing performance, but we also do not give too much weight to positive outliers.’ We know that outcomes vary by nature. At the same time, we should be aware of the bias that our own view introduces. An old legend says that Sports Illustrated magazine brings bad luck to the biggest sports stars by putting them on its cover. A jinx! However, that does not imply a causal relation. If we looked at performances that were just short of cover-worthy, we would be unlikely to find the same misfortune. The outstanding performances on the cover most likely contain a component of luck and are therefore not easily repeated.

Or consider your favourite Netflix series and its disappointing second season. How come? Because the first season was brilliant. Do you remember that mediocre series whose second season was even worse? No? I rest my case. Or think about teen stars who disappoint as adult artists. Is that the curse of peaking too early? Maybe. But their later performances also disappoint because of the excessive focus on their extraordinary performances as youngsters, in a show-business world full of good and bad fortune. It could all be regression to the mean.

The control group

Let’s return to the study on severe depression. Here you need a group of patients whose depression score is high enough to participate. Depression scores fluctuate, so inclusion will miss severely depressed patients who are going through a relatively bright period, and it will recruit less severely depressed patients who are going through an exceptionally dark period. As a result, the included patients will, on average, get better, even more so if you also take the placebo effect into account. You really need a control group to see the whole picture.

It is not enough to show that patients do better after treatment. You want to show that the group receiving treatment does better than a group without treatment. Without a control group, it is simply impossible to tell. It is your own view, your own study design, that deceives you. If you consider only the treated group, the effect on the next random patient will almost certainly disappoint: your optimism is based on a selected group, and this biased view will be punished as soon as you try to generalize.
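
The same logic can be sketched for a trial. The Python simulation below is a hypothetical model, not the design of the PRAX-114 trial: patients are screened once, included only if their screening score reaches a threshold (here 65, an arbitrary choice; a higher score means more severe symptoms), and then randomized 1:1 to treatment or control. By default the treatment has no effect at all, and the model contains no placebo effect. The function and parameter names are illustrative assumptions.

```python
import random
import statistics


def simulate_trial(n_screened=20_000, threshold=65.0, treatment_effect=0.0, seed=2):
    """Hypothetical depression trial with inclusion on a single screening score.
    Returns the mean change (follow-up minus baseline) per arm; negative = better."""
    rng = random.Random(seed)
    change = {"treated": [], "control": []}
    for _ in range(n_screened):
        level = rng.gauss(50.0, 10.0)               # patient's typical severity
        baseline = level + rng.gauss(0.0, 10.0)     # score on the screening day
        if baseline < threshold:
            continue                                # not severe enough: not included
        arm = rng.choice(["treated", "control"])    # 1:1 randomization
        follow_up = level + rng.gauss(0.0, 10.0)    # fresh fluctuation at follow-up
        if arm == "treated":
            follow_up -= treatment_effect           # zero by default: the drug does nothing
        change[arm].append(follow_up - baseline)
    return {arm: statistics.fmean(c) for arm, c in change.items()}


result = simulate_trial()
naive = result["treated"]                           # looks only at the treated group
controlled = result["treated"] - result["control"]  # compares against the control group
print(f"treated group change:         {naive:.1f}")
print(f"control group change:         {result['control']:.1f}")
print(f"treated minus control change: {controlled:.1f}")
```

Both arms improve by roughly the same amount, purely because of how they were selected, so the treated-minus-control difference stays close to zero. Setting `treatment_effect` to a positive value, for example `simulate_trial(treatment_effect=5.0)`, moves that difference to about minus that amount while the regression-to-the-mean improvement remains in both arms. A placebo effect would add further improvement to both arms and, in this comparison, would cancel out in the same way.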

Healthy reluctance

Expats who have just arrived in the Netherlands are often indignant when a Dutch GP sends them home to wait a few days. They are not used to it. They do not sound the alarm for nothing, do they? Still, waiting a few extra days is not an outrageous idea, especially for complaints that are not life-threatening.

We should all be reluctant to respond immediately when confronted with exceptionally high or exceptionally low scores. Exceptionally high scores are usually followed by lower ones, because the exceptional is, by definition, the exception. With remarkably low scores, it is the other way around. ‘Winner’s curse’ and ‘loser’s luck’: two sides of the same coin. This applies to sports performances, but also to the fluctuating complaints of a patient who arrives at the GP’s office hours on an extraordinarily bad or good day.

Postscript

Anne Top shared her knowledge about measuring depression in the GP’s office. The hosts of ‘The Readout Loud’ podcast call the improvement in the control group ‘the placebo effect’, while it could also be attributed to regression to the mean (or a bit of both).

References

[1] Garde D, Tirrell M, Feuerstein A. Listen: Applause-worthy cancer data, the long wait for Novavax, & the next FDA controversy. STATnews, The Readout Loud podcast (10:42). 9 June 2022.

[2] Praxis Precision Medicines. A clinical trial of PRAX-114 in participants with major depressive disorder. ClinicalTrials.gov: NCT04832425, accessed 10 May 2022.

[3] Ter Schure JA. Liever gemiddeld goed dan eenmalig uitzonderlijk. Ned Tijdschr Geneeskd. 2022;166:D6642. English translation on the VVSOR blog: https://blog.vvsor.nl/2022/03/why-statisticians-prefer-scoring-good-on-average-over-exceptional-only-once/.

Judith ter Schure