Untangling the Causal Web: The Mutual Adjustment Fallacy

July 14, 2023

Prof. Daniel Franks

Introduction

Whether you're trying to untangle the effects of multiple causal factors, or simply adjusting for those pesky 'nuisance' variables only to misreport them as causal effects, you're entering treacherous territory. This perilous journey often leads the unwary—be they practitioners or researchers—into the deceptive embrace of the mutual adjustment fallacy: the belief that by controlling for multiple variables, you can isolate the causal effect of each. Sometimes known as the Table 2 Fallacy, this error owes its moniker to the custom of showcasing such ill-fated analyses in research papers' second tables. Both industry and academia are riddled with this mistake.

Fortunately, the structural causal models that can be built with CausaDB enable the isolation and simultaneous examination of multiple variables' causal effects. But first, let's look at the problem itself.


A Simple Example

To illustrate, let's delve into a very basic example from the seminal Table 2 paper. It features a causal diagram focused on estimating the causal effect of HIV on the likelihood of a stroke. This diagram demonstrates the need to adjust for confounders, such as smoking habits (a common cause of both HIV and strokes) and age (a shared cause of smoking, HIV, and strokes).


Imagine that we fit a regression or a traditional ML model to estimate the causal effect of HIV on Stroke. Smoke and Age are confounders, so they are also included in order to isolate that causal effect. This model can estimate the causal effect of HIV, but the coefficients on Smoke and Age are not interpretable in the same way: they are not causal effects. This is because the model is built to isolate the causal pathway from HIV to Stroke only.

Take smoking, for example: the total causal effect of smoking cannot be estimated, because we included HIV in the model. If you look at the diagram, you can see that HIV is a mediator between Smoke and Stroke, carrying causality from Smoke to Stroke along the path Smoke → HIV → Stroke. Yet this indirect causal path has been blocked by including HIV in the model, meaning that HIV can no longer pass on the causal effect of Smoke. From this model, it only makes sense to report the estimated causal effect of HIV. For estimates of control variables to be given a causal interpretation, their effects need to be causally identified in their own right. Note that without a causal approach, we wouldn't be able to reason about any of this.
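To make this concrete, here is a minimal simulation sketch of the diagram. The linear coefficients are made up for illustration (none of these numbers come from the Table 2 paper). A single regression of Stroke on HIV, Smoke, and Age recovers HIV's total effect, but only Smoke's direct effect — the indirect Smoke → HIV → Stroke contribution is blocked:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed linear-Gaussian system matching the diagram's arrows
age = rng.normal(size=n)
smoke = 0.5 * age + rng.normal(size=n)              # Age -> Smoke
hiv = 0.4 * age + 0.6 * smoke + rng.normal(size=n)  # Age, Smoke -> HIV
stroke = 0.3 * age + 0.5 * smoke + 0.8 * hiv + rng.normal(size=n)

# One regression with everything included ("mutual adjustment")
X = np.column_stack([hiv, smoke, age, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, stroke, rcond=None)
b_hiv, b_smoke, b_age, _ = beta

# HIV has no mediator on its path to Stroke here, so its coefficient
# recovers the total effect (approx 0.8).
# Smoke's total effect is direct + indirect via HIV: 0.5 + 0.6*0.8 = 0.98,
# but the coefficient only recovers the direct part (approx 0.5).
print(round(b_hiv, 2), round(b_smoke, 2))
```

The HIV coefficient is causally meaningful; the Smoke coefficient understates smoking's total effect, exactly as the argument above predicts.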


The Kitchen Sink Tactic

In practice, it gets worse. The "kitchen sink" approach is common in machine learning and across many scientific fields: researchers include every available variable in an analysis, without further thought. Including unnecessary variables can block true causal effects and introduce spurious correlations. In the absence of a structural causal model, some of these variables will be 'bad controls': they may introduce collider bias (producing spurious correlations) or block mediators (cutting off causal pathways), among other problems such as bias amplification, overcontrol bias, and case-control bias (see the excellent A Crash Course in Good and Bad Controls).
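Collider bias is the most counterintuitive of these, so a quick hypothetical sketch may help. Here x and y are independent by construction, yet "controlling" for their common effect c manufactures a strong spurious association between them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# x and y are independent causes of the collider c
x = rng.normal(size=n)
y = rng.normal(size=n)
c = x + y + rng.normal(size=n)

# Marginally, x and y are uncorrelated
r_marginal = np.corrcoef(x, y)[0, 1]

# Conditioning on the collider: regress y on x AND c.
# The x coefficient becomes strongly negative (approx -0.5),
# a pure artifact of the bad control.
X = np.column_stack([x, c, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b_x = beta[0]
print(round(r_marginal, 2), round(b_x, 2))
```

Throwing c into the kitchen sink turns a true null effect into an apparently large one — the same mechanism that produces nonsense coefficients in real Table 2s.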

This has also been amusingly referred to in a blog by Richard McElreath as Causal Salad:

"You put everything into a regression equation, toss with some creative story-telling, and hope the reviewers eat it. In general, this is not a valid approach... The Salad isn’t only regression. Really any procedure that hopes to take a list of variables (features) and return causal inference is Causal Salad. No amount of data reliably turns salad into sense."

Importantly, the problem can occur even if only some variables are conditioned on. For instance, when estimating the causal effects of two exposures simultaneously, a variable that must be adjusted for as a confounder of one exposure may be a mediator or collider for the other. Adjusting for it identifies the causal effect of the first exposure, but in doing so distorts the causal path of the second.


A Covid Salad

The healthcare literature is, worryingly, riddled with this avoidable mistake. Let's consider an important example where the issue has caused some controversy. The figure below is from a 2020 paper, "Factors associated with COVID-19-related death using OpenSAFELY". It reports the risk of death from COVID-19 infection for several potential risk factors: the further to the right the point, the higher the risk for that group. The table shown is only a small section of the full list of variables tossed into the analysis. This study suffers from the mutual adjustment fallacy, rendering its results causally uninterpretable. That is a big problem, because surely the purpose of such a study is to provide causal knowledge that allows for informed policy. In fact, the French government enacted policies informed by this study.


At first glance, the results appear reasonable: men are at higher risk than women, older individuals are at greater risk than younger ones—all seems logical. However, it's only when examining smoking that you may notice something amiss. The study claims that being a smoker places a person at a lower risk than never having smoked or being a former smoker. The mutual adjustment fallacy is at play here, and the observed effect of smoking is not causal. It is uninterpretable without a structural causal model.

You might argue that the study is purely descriptive and that among the spurious correlations, some elements of causality might be waiting to be discovered. If this is the case, studies should be transparent about that, and ultimately, we must strive to uncover the causal effects.

How can we tackle these problems? One approach is to have multiple models — one for each variable of interest. While this is a viable option, it is less than ideal. Fortunately, the structural causal models you can build with CausaDB enable the isolation and simultaneous examination of multiple variables' causal effects. Only a carefully constructed structural causal model can help untangle the web of causation.
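As a sketch of the multiple-models approach, here it is applied to the earlier HIV example, again with made-up coefficients (this is plain regression to illustrate the idea of per-exposure adjustment sets, not the CausaDB API). Each exposure gets its own model with only its own confounders adjusted for:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Same assumed linear system as the HIV/Smoke/Age diagram
age = rng.normal(size=n)
smoke = 0.5 * age + rng.normal(size=n)
hiv = 0.4 * age + 0.6 * smoke + rng.normal(size=n)
stroke = 0.3 * age + 0.5 * smoke + 0.8 * hiv + rng.normal(size=n)

def ols_coef(y, *covariates):
    """Coefficient of the first covariate in an OLS fit (with intercept)."""
    X = np.column_stack(list(covariates) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

# Model 1: total effect of HIV -> adjust for its confounders, Smoke and Age
effect_hiv = ols_coef(stroke, hiv, smoke, age)  # approx 0.8

# Model 2: total effect of Smoke -> adjust for Age only; including HIV
# would block the Smoke -> HIV -> Stroke path
effect_smoke = ols_coef(stroke, smoke, age)     # approx 0.5 + 0.6*0.8 = 0.98

print(round(effect_hiv, 2), round(effect_smoke, 2))
```

With the right adjustment set per exposure, both total effects are recovered — which is precisely the bookkeeping that a structural causal model does for you automatically.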