Here is a brisk piece by Tim Harford on Causation and Cigarettes, where he looks at how Big Data help (helps?) us see patterns that were not obvious before or confirm our previously unproved hypotheses:

We cannot rely on correlation alone, then. But insisting on absolute proof of causation is too exacting a standard (arguably, an impossible one). Between those two extremes, where does the right balance lie between trusting correlations and looking for evidence of causation?

Scientists, economists and statisticians have tended to demand causal explanations for the patterns they see. It’s not enough to know that college graduates earn more money — we want to know whether the college education boosted their earnings, or if they were smart people who would have done well anyway. Merely looking for correlations was not the stuff of rigorous science.

But with the advent of “big data” this argument has started to shift. Large data sets can throw up intriguing correlations that may be good enough for some purposes. (Who cares why price cuts are most effective on a Tuesday? If it’s Tuesday, cut the price.) Andy Haldane, chief economist of the Bank of England, recently argued that economists might want to take mere correlations more seriously. He is not the first big-data enthusiast to say so.

I think he takes a mis-step here:

Indisputably, smoking causes lung cancer and various other deadly conditions. But the problematic relationship between correlation and causation in general remains an active area of debate and confusion.

He seems to think that by throwing in an unarguable adverb (Indisputably!) he makes his case.

But what does that sentence mean? In what sense does smoking ‘cause’ lung cancer?

Smoking does not always ‘cause’ lung cancer. In fact the great majority of smokers do not get lung cancer, and many people who do not smoke, alas, get it too:

Among male smokers, the lifetime risk of developing lung cancer is 17.2%; among female smokers, the risk is 11.6%. This risk is significantly lower in nonsmokers: 1.3% in men and 1.4% in women.

These numbers show that there is a higher risk of illness associated with smoking, not that smoking causes the disease.
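
To make the arithmetic behind those percentages explicit, here is a rough sketch that uses only the four figures quoted above. The function name `compare` and the dictionary layout are my own illustration, not anything from the source of the statistics. It works out how much higher the risk is for smokers and how many smokers are never affected. It measures association only and says nothing about mechanism.

```python
# Lifetime lung-cancer risks quoted above, expressed as fractions.
lifetime_risk = {
    "men": {"smoker": 0.172, "nonsmoker": 0.013},
    "women": {"smoker": 0.116, "nonsmoker": 0.014},
}

def compare(smoker, nonsmoker):
    """Return (relative risk, absolute risk difference, share of smokers unaffected)."""
    return smoker / nonsmoker, smoker - nonsmoker, 1 - smoker

for group, risk in lifetime_risk.items():
    rr, diff, unaffected = compare(risk["smoker"], risk["nonsmoker"])
    print(f"{group}: relative risk {rr:.1f}x, "
          f"difference {diff * 100:.1f} points, "
          f"{unaffected * 100:.1f}% of smokers never develop it")
```

On these figures the risk for smokers is roughly 13 times higher among men and 8 times higher among women, yet more than four in five smokers of either sex never develop lung cancer.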

As previously described here, going right back to the start, the very idea of causation is fraught with philosophical difficulties. My first encounter with Jurisprudence in 1977 or thereabouts was this baffling masterpiece. It turns out that the idea of law itself is based on all sorts of implicit or explicit assumptions (for better or worse) about causation and risk.

Take this example. A busy woman leaves her dog in the car at the top of a steep hill as she dashes into a shop. She applies the handbrake. The dog jumps into the front of the car and dislodges the handbrake. The car trundles down the hill and collides with an oil-tanker. The oil-tanker explodes, wrecking a nearby electricity substation. That causes power to be lost in the local hospital at a crucial moment in a sensitive heart operation, and the patient dies as the hospital’s emergency generators do not come on immediately due to poor maintenance resulting from a recent strike by ancillary workers.

The patient’s relatives want to know whom they should sue for damages. The hospital? The ancillary workers? The oil-tanker manufacturers? The woman?

In this case of one damn thing leading to another, it seems fair to say that the accidents would not have happened ‘but for’ the dog dislodging the handbrake. So did the dog’s action therefore cause the hospital patient to die? We don’t like to accept that. The originating episode seems too ‘remote’ from the final outcome. But that is only a value-judgement or even a matter of aesthetics.

In the case of cigarettes the act of smoking seems to pass the remoteness test. The smoker draws all sorts of nasty and arguably addictive chemicals directly into his/her lungs, thereby directly (when in doubt, add an adverb!) increasing the risk that in some mysterious way deep in the body’s cells something someday may happen that creates a problem that the body’s immune system can’t defeat. The addiction point is important. To establish sufficient causation to justify damages claims against cigarette companies by smokers who have suffered serious illnesses, the courts need to establish that the smoker in effect had no choice but to smoke (thereby increasing the risks) because substances within the cigarettes themselves acted to reduce the smoker’s propensity to stop smoking.

See also asbestos. How much asbestos is harmful? Does asbestos ‘contaminate’ the environment? If you can face it, here’s me in 2013 at a conference about asbestos opining on what those questions might actually mean (note where I blithely assert that some asbestos products do ‘cause’ serious illness!)

Conclusion?

We need to be careful in throwing around statements that X causes Y. All we know is that the data show more or less convincing correlations between X happening and Y occurring. There is a 100% chance that someone jumping from the top of the tallest building in the world and plummeting to the ground without a parachute will die IF there is not a huge vat stuffed with cotton wool at the bottom to prevent that messy outcome. There is a notably less than 100% chance that breathing in smoke regularly will lead to your death from a smoke-related (sic) illness.

Do Big Data help us establish ‘causation’ more accurately? No. But new and unexpected patterns might emerge that suggest how combinations of risks interact unexpectedly.

Though even then some patterns are just, well, luck. Their probative value cannot be assumed. Quick, give me another grant! We need more data to help us understand what Big Data are telling us!
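
The luck point can be shown with a toy simulation. This is a minimal sketch under made-up assumptions: 1,000 variables of pure random noise, 20 observations each, and one equally random 'outcome'. The helper names `pearson` and `strongest_chance_correlation` are hypothetical and the data are generated, not real. Nothing here is related to anything else, yet the best-looking correlation in the pile will seem impressive.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(n_variables=1000, n_observations=20, seed=0):
    """Correlate many independent noise variables with one noise outcome
    and return the correlation with the largest absolute value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcome = [rng.gauss(0, 1) for _ in range(n_observations)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        r = pearson(candidate, outcome)
        if abs(r) > abs(best):
            best = r
    return best

print(f"strongest correlation found by chance: {strongest_chance_correlation():+.2f}")
```

The more variables a data set offers, the stronger the best accidental correlation is likely to be, which is why a striking pattern found by trawling needs checking against fresh data before anyone relies on it.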

Anyone claiming that smoking causes cancer (and therefore by strong implication that cigarette manufacturers should be legally liable for deaths apparently ‘caused’ by smoking) needs to think about how precisely that example of causation might be distinguished as a matter of law and morality from other examples of causation. Such as those people who pump out malignant addictive products that poison minds, not lungs:

The moral case here is that they [cigarette manufacturers] have put on the market products which cause weakness of will thereby causing damage, and/or allow the weak-willed, the reckless and the dangerous to damage themselves and others.

Do those who favour such litigation also favour opening the way to the victims of Islamist terrorism suing extremist mosques for putting out vicious ideas and arguments which prey upon the weak-willed and/or reckless and dangerous, and thus provoke some of them to violence?

Some nice tight causation here, m’lud?