Shock-horror statistics

Robert A.J. Matthews

50 Norreys Road, Cumnor, Oxford, OX2 9PT, UK

Summary

The author, who is Science Correspondent of The Sunday Telegraph newspaper, shows how media news reports can serve as a valuable source of real-life applications of probability and statistics.

The media as a source of statistics problems

One of the most dispiriting features of many textbooks and courses in probability and statistics is the sheer dullness and irrelevance of their examples. Quite why any student should want to master a field apparently obsessed with dice, playing cards and deaths through horse-kicks has never been convincingly explained. In what follows, I show how the media's coverage of news events can serve as a rich source of examples of the real-life application of probability and statistics for students at every level.

Basic probability concepts

1. "Doomsday comet surprises astronomers" (The Sunday Telegraph, 30.7.95). This story centred on the discovery, by two amateur astronomers, of a huge comet that had sneaked into the Solar System from deep space. The discovery revived fears that humanity may suffer the same fate as the dinosaurs, wiped out by a comet 65 million years ago.

Calculating just how likely such a comet is to hit the Earth is a simple exercise in using the basic concept of probability as the ratio of target outcomes to all possible outcomes. Deep-space comets can sweep into the Solar System from any direction, and so we have

Prob(Impact with Earth) = 2 x (Cross-sectional area of Earth)/(Area of sphere, radius = Earth-Sun distance)

where the factor 2 accounts for the comet's ability to hit the Earth both on its way into the Solar System and on its way back out. Taking the radius of the Earth as 6378 km, and the Earth-Sun distance as 150 million km, we find that Prob(Impact) is about 10^-9. Assuming that the dinosaurs were killed by a deep-space comet, this figure suggests that as many as 10 Earth-wrecking comets may enter our Solar System each year.
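The arithmetic can be checked with a short Python sketch. The radius, distance and 65-million-year figures are those quoted above; the function and variable names are illustrative only, and the step from impact probability to comets per year assumes, crudely, one impact per 65 million years.

```python
import math

EARTH_RADIUS_KM = 6378.0             # value quoted in the text
EARTH_SUN_DISTANCE_KM = 150e6        # value quoted in the text
YEARS_SINCE_DINOSAUR_IMPACT = 65e6   # from the news story above


def impact_probability(earth_radius_km, orbit_radius_km):
    """Chance that a single deep-space comet hits the Earth, inbound or outbound."""
    target_area = math.pi * earth_radius_km ** 2        # Earth's cross-section
    sphere_area = 4.0 * math.pi * orbit_radius_km ** 2  # all possible directions
    return 2.0 * target_area / sphere_area


p_impact = impact_probability(EARTH_RADIUS_KM, EARTH_SUN_DISTANCE_KM)

# Assumption: one impact in 65 million years, so the implied arrival
# rate is about 1 / (p_impact * 65e6) comets per year.
comets_per_year = 1.0 / (p_impact * YEARS_SINCE_DINOSAUR_IMPACT)

print(f"Prob(Impact) per comet: {p_impact:.1e}")
print(f"Implied comets per year: {comets_per_year:.0f}")
```

On these assumptions the implied rate comes out at the same order of magnitude as the figure of 10 or so a year.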

2. "Mystery of impact point of satellite" (The Sunday Telegraph, 30.12.90). The prospect of the 38-tonne Soviet space vehicle Salyut-7/Cosmos 1686 crashing down to Earth was a disturbing one, and everyone wanted to know the chances of it landing on their country. With the seas covering over 70 per cent of the Earth's surface, the smart money was obviously on a big splash rather than a death-dealing crunch. But a slightly more detailed calculation leads to a rough estimate of the chances of an impact on any given country.

A space-probe on an orbit inclined at angle i to the equator can crash on any part of the globe between the parallels of latitude +/- i. The probability that it will crash on a country of area A km^2 within those parallels is thus roughly

Prob(crash on country) = A/(Area of globe between +/- i)

The denominator provides a rare example of a simple surface integral doing something interesting, and leads to

Prob(crash on country) = A/(4*pi*R^2*sin(i))

where R is the radius of the Earth. Plugging in values of A = 240,000 km^2 for the UK, R = 6378 km, and i = 53 degrees for the Soviet spacecraft, we find Prob(crash on UK) = 1 in 1700. While a lot larger than one might suspect, it is still 500 times lower than the chances of an ocean impact - which is what happened.
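As a check on the figures, here is a minimal Python sketch of the same calculation. The band area follows from integrating 2*pi*R^2*cos(latitude) between -i and +i; the function name and its parameter names are illustrative, not taken from any source.

```python
import math


def crash_probability(country_area_km2, inclination_deg, earth_radius_km=6378.0):
    """Rough chance of debris landing on a country lying between latitudes +/- i."""
    # Area of the globe between the parallels +/- i: 4*pi*R^2*sin(i)
    band_area = (4.0 * math.pi * earth_radius_km ** 2
                 * math.sin(math.radians(inclination_deg)))
    return country_area_km2 / band_area


p_uk = crash_probability(240_000, 53.0)  # values quoted in the text
print(f"Prob(crash on UK) is about 1 in {1 / p_uk:.0f}")
```

Like the formula itself, the sketch treats every point in the band as equally likely, which is only a rough approximation.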

Conditional probabilities

1. "True Confessions?" (The Guardian, 20.9.94). The alarming number of miscarriages of justice following convictions based on confessional evidence has long been a source of media investigations. A simple application of Bayes's Theorem shows precisely what is wrong with confessional evidence - and how to put it right.

Let G be the event of the accused being guilty, and C the event of a confession of guilt. Then for the confession to add weight to the prosecution's case, it must satisfy the inequality

Prob(G | C) > Prob(G)

while the odds form of Bayes's Theorem gives

Odds(G | C) = [Prob(C | G)/Prob(C | ~G)] x Odds(G)

from which it is immediately plain that confessions only add to evidence of guilt - i.e. Odds(G | C) > Odds(G) - iff Prob(C | G) > Prob(C | ~G), that is, if the confession is more likely to have come from the guilty than from the innocent. All of which may seem blindingly obvious - except that in real-life cases, this inequality appears to have been violated. Specifically, it is by no means clear that the hardened, committed terrorist responsible for some outrage is more likely to confess under a given amount of police pressure than an innocent person plucked off the street. This suggests that terrorist cases in which convictions are secured on little more than confessional evidence are especially prone to miscarriages of justice - which is precisely what has been seen in the UK over the last few years.

2. "Doubts as DNA evidence fails to add up" (The Sunday Telegraph, 11.7.93). After a honeymoon period in the late 1980s, in which DNA fingerprinting was hailed as the greatest breakthrough in forensic science this century, doubts about its use began to emerge in the media. Most centred on concern about subtle genetic effects increasing the chances of getting similar DNA profiles from two people in the same ethnic group. However, these effects pale in comparison with the impact of the so-called "Prosecutor's Fallacy" and "Base-rate Effect".

The importance of DNA fingerprinting for forensic science is summed up by Bayes's Theorem in the form

Odds(G | M) = LR x Odds(G)

where LR ("Likelihood ratio") = Prob(M | G) / Prob(M | ~G)

and M is the event of getting the observed degree of match between the DNA profile bands in a scene-of-crime sample and those taken from a suspect. The probability of getting a match from the guilty party is of course unity, and the evidential power of DNA profiling comes from the tiny value of the denominator of the LR, Prob(M | ~G), i.e. the very low probability of getting the observed degree of match from someone innocent of the crime; figures of around 1 in a million are frequently cited in court.

The danger of DNA profiling evidence is the temptation to fall for the so-called Prosecutor's Fallacy: believing that a low probability of getting the observed degree of match from someone innocent of the crime - i.e. Prob(M | ~G) - necessarily implies that the match found points to a high probability of guilt, i.e. of Prob(G | M).

Bayes's Theorem shows that this is a dangerous "transposition of conditioning". It also shows that to convert from the observed DNA match probability - which is what the forensic scientist quotes - to the probability of guilt, which is what the jury is trying to decide, one needs to know the prior odds of guilt. If there is little other evidence of guilt, this prior can be very low - say, one in several million or more - so that not even DNA evidence can raise the posterior odds of guilt above unity, thus leaving a lot of "reasonable doubt".
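A small Python sketch makes the point concrete. The 1-in-a-million match probability is the figure cited above; the prior odds of 1 in 5 million are a purely hypothetical stand-in for "one in several million", and the function names are illustrative.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's Theorem: Odds(G | M) = LR x Odds(G)."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of an event into a probability."""
    return odds / (1.0 + odds)


match_prob_if_guilty = 1.0       # Prob(M | G), taken as unity in the text
match_prob_if_innocent = 1e-6    # Prob(M | ~G), "1 in a million"
likelihood_ratio = match_prob_if_guilty / match_prob_if_innocent

prior_odds_of_guilt = 1.0 / 5_000_000   # HYPOTHETICAL prior, for illustration only

odds_after_match = posterior_odds(prior_odds_of_guilt, likelihood_ratio)
prob_guilt = odds_to_probability(odds_after_match)

print(f"Posterior odds of guilt: {odds_after_match:.2f}")
print(f"Posterior probability of guilt: {prob_guilt:.2f}")
```

With this illustrative prior, the posterior odds stay well below unity despite the million-fold likelihood ratio; a different prior would of course give a different answer.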

3. "Breakthrough in diagnosis of Alzheimer's Disease" (Various newspapers, 12.94)

Alzheimer's Disease is the most common form of senile dementia, but is hard to diagnose unequivocally without a biopsy. In 1994, scientists at the Radcliffe Infirmary, Oxford, announced a brain-scan based method which gives apparently very impressive results in terms of high accuracy and low false positive rates. An application of Bayes's Theorem shows, however, that these headline-catching figures may not be enough to prevent misdiagnosis.

This time, the appropriate form of Bayes's Theorem is

Odds(AD | + ) = LR x Odds(AD) where LR = Prob( + | AD) / Prob(+ | ~AD)

AD is the event of having Alzheimer's disease, and + is a positive diagnosis using the brain-scan technique. According to the researchers, the technique has Prob(+ | AD) = 0.9 and Prob(+ | ~AD) = 0.03, leading to an impressively high likelihood ratio LR of 30. However, as Bayes's Theorem makes clear, the probability that a given patient actually has Alzheimer's disease depends on more than just the LR; it includes Odds(AD), the "base-rate" for Alzheimer's disease within the patient's age-group. The importance of this base-rate becomes clear with a specific example. Post-mortem examinations suggest that around 1 in 50 of those in the 65-70 year age group develop Alzheimer's disease. Setting Prob(AD) = 0.02 leads to posterior odds of the disease, given a positive result, of just 0.6. In other words, even with a positive result from this "90 per cent accurate" brain-scan technique, it is still odds-on that the patient does NOT have Alzheimer's disease. The lesson is a crucial one: even impressively accurate tests can fail badly if they are being used to predict intrinsically unusual phenomena. The impact of the "base rate effect" in such controversial subjects as weather forecasting, cancer screening programmes, drug testing of athletes and self-diagnostic kits for diseases like AIDS - all of which can be analysed by the same general approach as that given here - should make for some interesting discussions with students.
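For classroom use, the base-rate effect can be explored with a few lines of Python. The 0.9, 0.03 and 0.02 figures are those quoted above; the other base rates in the loop are hypothetical values added purely for comparison, and the function name is illustrative.

```python
def probability_given_positive(base_rate, true_positive_rate, false_positive_rate):
    """Prob(condition | positive test), from Bayes's Theorem."""
    true_positives = true_positive_rate * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


TRUE_POSITIVE_RATE = 0.9     # Prob(+ | AD), as quoted
FALSE_POSITIVE_RATE = 0.03   # Prob(+ | ~AD), as quoted

# 0.02 is the base rate used in the text; 0.10 and 0.50 are
# HYPOTHETICAL base rates included only to show the trend.
for base_rate in (0.02, 0.10, 0.50):
    p = probability_given_positive(base_rate, TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE)
    print(f"base rate {base_rate:.2f}: Prob(AD | +) = {p:.2f}, odds = {p / (1 - p):.2f}")

p_ad_given_positive = probability_given_positive(0.02, TRUE_POSITIVE_RATE,
                                                 FALSE_POSITIVE_RATE)
```

The same function applies unchanged to screening or drug-testing examples once the corresponding rates are substituted.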

Significance testing

"More than happenstance: CJD in farmers" (British Medical Journal, November 1995). Media coverage of this paper, by Sheila Gore of the UK Medical Research Council, triggered public concern about the link between bovine spongiform encephalopathy (BSE) in British cattle and Creutzfeldt-Jakob Disease (CJD), the equivalent encephalopathy in humans. On the face of it, the calculation was simple, and its conclusions deeply worrying. Between 1990 and end-1995, four farm-workers had contracted CJD, a rare disease normally expected to produce just 40-50 cases in the whole of the UK each year. As far fewer than 1 in 10 people work on farms, the obvious question is: what is the probability of getting so many cases among farm-workers by fluke alone?

This is the question Gore set out to answer with a straightforward calculation. The expected number of cases of CJD to be seen over 6 years in a farm-worker population of N is K, where

K = 6 x 45 x N / (57 x 10^6) = 4.7 x 10^-6 x N

The probability of getting at least four CJD cases over the six years by fluke alone is thus a simple Poisson sum equivalent to a p-value for the fluke hypothesis:

Prob(4 or more cases) = 1 - \sum_{x=0}^{3} exp(-K) K^x / x!

But before this probability can be calculated, we have to pin down the value of N. Gore herself hedged her bets here, and calculated a whole range of values:


Population                       N          K       p-value

All UK farm-workers              500,000    2.35    1 in 5
Dairy farm-workers               155,000    0.73    1 in 145
Those on BSE-affected farms      105,000    0.49    1 in 500
Males on BSE-affected farms       46,000    0.22    1 in 12,500


Table 1: p-values for 4 or more CJD cases among UK farm-workers
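The table can be reproduced approximately with the Python sketch below. It assumes that the 45 in the expression for K is the mid-point of the 40-50 annual UK cases, and that 57 x 10^6 is the UK population; the function and variable names are illustrative. Because it works from unrounded values of K, its output agrees with Table 1 only approximately.

```python
import math

UK_CASES_PER_YEAR = 45    # assumed mid-point of the 40-50 cases quoted
UK_POPULATION = 57e6      # denominator in the expression for K
YEARS = 6


def expected_cases(n_workers):
    """Expected number of CJD cases K in a population of n_workers over six years."""
    return YEARS * UK_CASES_PER_YEAR * n_workers / UK_POPULATION


def prob_at_least(k_mean, threshold=4):
    """Poisson probability of `threshold` or more cases when the mean is k_mean."""
    below = sum(math.exp(-k_mean) * k_mean ** x / math.factorial(x)
                for x in range(threshold))
    return 1.0 - below


POPULATIONS = {
    "All UK farm-workers": 500_000,
    "Dairy farm-workers": 155_000,
    "Those on BSE-affected farms": 105_000,
    "Males on BSE-affected farms": 46_000,
}

for name, n_workers in POPULATIONS.items():
    k = expected_cases(n_workers)
    p_value = prob_at_least(k)
    print(f"{name:30s} N = {n_workers:7,d}  K = {k:.2f}  p = 1 in {1 / p_value:,.0f}")
```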

Clearly, the choice of population size makes a big difference: one choice leads to an unexceptional p-value of around 1 in 5, while another - males working on BSE-affected farms - generates headline-grabbing odds of 1 in 12,500 against the four CJD cases being a "statistical freak". But how realistic is it to choose the latter population? What are the reasons for focusing on just men - apart from the fact that all four cases were male?

Then comes the question of what the p-value actually means. To many people, a low probability of getting at least the observed number of cases by fluke alone necessarily implies a high probability of the BSE-CJD link being real. But this is, of course, a false inference based on transposition of conditioning: Prob(at least 4 cases | random chance) cannot be simply inverted to answer the real question on everyone's mind, namely Prob(BSE-CJD link is real | observed 4 cases). Bayes's Theorem shows that to make the conversion, one needs some assessment of the prior probability of a link between BSE and CJD in humans. And at the time of writing, the bulk of evidence is against such a link.

This suggests that p-value calculations exaggerate - perhaps substantially - the real "significance" of the 4 cases among UK farm workers. Even a simple Bayesian calculation using a reference prior density proportional to 1/K tends to support this conclusion. The resulting 95 per cent confidence bounds on K extend from 0.71 to 8.0, encompassing the dairy farm-worker value, which thus appears 7 times less "significant" than the p-value of 1 in 145 suggests.
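The article does not spell out how these bounds were obtained. One reading that appears consistent with them, offered here only as an assumption, is that a prior proportional to 1/K combined with 4 observed Poisson cases gives a posterior density proportional to K^3 exp(-K), and that the bounds mark a highest-density interval of that posterior. The Python sketch below checks this reading; all names in it are illustrative.

```python
import math

OBSERVED_CASES = 4


def posterior_density(k, n=OBSERVED_CASES):
    """Density of K given n Poisson cases and a 1/K prior: K^(n-1) exp(-K) / (n-1)!"""
    return k ** (n - 1) * math.exp(-k) / math.factorial(n - 1)


def posterior_cdf(k, n=OBSERVED_CASES):
    """Closed-form cumulative probability for the same posterior (integer n)."""
    return 1.0 - sum(math.exp(-k) * k ** j / math.factorial(j) for j in range(n))


lower, upper = 0.71, 8.0   # bounds quoted in the text

coverage = posterior_cdf(upper) - posterior_cdf(lower)
print(f"Posterior mass between the bounds: {coverage:.3f}")

# For a highest-density interval, the density should be (nearly) equal
# at the two end-points.
print(f"Density at lower bound: {posterior_density(lower):.4f}")
print(f"Density at upper bound: {posterior_density(upper):.4f}")
```

If this reading is right, the interval should hold close to 95 per cent of the posterior probability, with nearly equal density at each end.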

While this confidence limit calculation does little more than the p-value to answer the question on everyone's mind, it does at least serve as a warning against misinterpreting p-values - which certainly bears repeating.

Conclusions

The media are often ridiculed for their gormless handling of probabilistic concepts, from the National Lottery to school league tables. However, their endless fascination with uncertainty, risk and chance makes them a valuable source of examples of how statistics and probability affect real-life events.

Reference

1. The electronic version of Chance magazine also carries (mainly US-based) examples of such applications: http://www.geom.umn.edu/locate/chance.
