Written by Colin+ in statistics.

Don't get me wrong, The Dorset Echo is one of my favourite local newspapers. They have been kind enough to feed my ego on several occasions, and even if their headlines sometimes don't quite reflect the gist of the story, I appreciate that.

This time, though, they've gone too far.[1]

The story, for those of you who can't be bothered to click[2], involves the deadly toll of Dorset's roads[3]: a dramatic 50% increase in fatalities between 2014 and 2015.

Digging down further, the numbers seem to stack up: 19 fatalities on the roads in 2014, and 28 last year, a 47% rise - close enough to 50% that I'd give them a pass on it.

Assuming each fatality is an independent event[4], that's even a statistically significant change. Suppose the number of fatalities, $F$, is drawn from a Poisson distribution; our null hypothesis is that its mean is 19. We can, at a stretch, approximate this by a normal distribution with mean 19 and standard deviation $\sqrt{19} \approx 4.36$. The probability of a reading of 28 or more can then be read from z-score tables: $z = \frac{28-19}{\sqrt{19}} \approx 2.06$, giving a probability of about 0.0197, so we reject the null hypothesis at the 5% level.
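The normal approximation is easy to reproduce. Here is a minimal Python sketch that uses only the standard library. The function names are illustrative. The exact Poisson tail is included only as a cross-check on how much of a stretch the approximation is. Because the code does not round $z$ to 2.06 first, its normal-approximation probability will differ slightly from the table figure.

```python
import math

def normal_approx_tail(observed, mean):
    """One-sided P(F >= observed) for F ~ Poisson(mean), approximated by a
    normal with the same mean and sd sqrt(mean), as in the text
    (no continuity correction)."""
    z = (observed - mean) / math.sqrt(mean)
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return z, 0.5 * math.erfc(z / math.sqrt(2))

def exact_poisson_tail(observed, mean):
    """Exact P(F >= observed) for F ~ Poisson(mean), for comparison."""
    # Sum the pmf for k = 0..observed-1 in log space, then take the complement
    below = sum(
        math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        for k in range(observed)
    )
    return 1.0 - below

z, p_normal = normal_approx_tail(28, 19)
p_exact = exact_poisson_tail(28, 19)
print(f"z = {z:.3f}, normal approx p = {p_normal:.4f}, exact p = {p_exact:.4f}")
```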

The story neatly avoided discussing the casualty rates of the preceding years. Luckily, the DfT makes those available, too.[5] In 2013, there were 28 deaths on Dorset's roads. In 2012, there were 24. If I were less honest, I'd innocently point at the unexpected dip in the graph in 2014, rather than the large increase for 2015. However, I'm not that sort of writer: the figures for 2011 and 2010 were 19 and 18.

A more reasonable estimate for the mean would be the average of the five previous years' figures, which is 21.6. Now the null hypothesis is that $F$ has a mean of 21.6 and a standard deviation of $\sqrt{21.6} \approx 4.65$, and the z-score for 28 is $\frac{28-21.6}{\sqrt{21.6}} \approx 1.377$. The probability of a result at least that extreme is about 0.084 (8.4%), which is not significant at the 5% level.
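The same test with the five-year baseline can be sketched in a few lines of Python. The year-by-year figures are the DfT numbers quoted above. As before, this is a one-sided normal approximation to a Poisson, without continuity correction.

```python
import math
from statistics import mean

# Deaths on Dorset's roads in the five years before 2015, as quoted above
previous_years = {2010: 18, 2011: 19, 2012: 24, 2013: 28, 2014: 19}
observed_2015 = 28

# Null hypothesis: F ~ Poisson(mu), with mu the five-year average
mu = mean(list(previous_years.values()))
z = (observed_2015 - mu) / math.sqrt(mu)   # sd of a Poisson is sqrt(mean)
p = 0.5 * math.erfc(z / math.sqrt(2))      # one-sided upper-tail probability

print(f"mu = {mu:.1f}, z = {z:.3f}, p = {p:.3f}")
print("reject at 5%" if p < 0.05 else "do not reject at 5%")
```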

It would also be interesting to see data on the number of fatal *collisions*, rather than fatalities: I'd expect clustering (several deaths in a single collision) to increase the standard deviation.
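The clustering intuition can be checked with a small simulation. This minimal sketch assumes, hypothetically, that fatal collisions follow a Poisson distribution and that each one kills one, two or three people with the made-up probabilities in `DEATHS_PER_COLLISION`. None of these are DfT figures. The yearly death toll is then a compound Poisson variable, whose variance is $\lambda_c\,\mathbb{E}[K^2]$ rather than its mean $\lambda_c\,\mathbb{E}[K]$, where $K$ is the number of deaths in one collision.

```python
import math
import random
from statistics import fmean, pvariance

# HYPOTHETICAL deaths-per-fatal-collision distribution, for illustration only;
# these are not DfT figures.
DEATHS_PER_COLLISION = {1: 0.80, 2: 0.15, 3: 0.05}
MEAN_DEATHS = 21.6  # five-year average from the text

sizes = list(DEATHS_PER_COLLISION)
weights = list(DEATHS_PER_COLLISION.values())
e_k = sum(k * w for k, w in zip(sizes, weights))       # E[K], about 1.25
e_k2 = sum(k * k * w for k, w in zip(sizes, weights))  # E[K^2], about 1.85
lam_c = MEAN_DEATHS / e_k  # collisions per year giving 21.6 deaths on average

# Compound Poisson: Var(F) = lam_c * E[K^2]; independent deaths would give
# Var(F) = mean
analytic_var = lam_c * e_k2

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam like this."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_year(rng):
    """Deaths in one simulated year: Poisson collisions, each with K deaths."""
    n_collisions = poisson(lam_c, rng)
    return sum(rng.choices(sizes, weights=weights, k=n_collisions))

rng = random.Random(2015)
years = [simulate_year(rng) for _ in range(20_000)]
sim_mean, sim_var = fmean(years), pvariance(years)
print(f"mean {sim_mean:.2f}, variance {sim_var:.2f} "
      f"(analytic {analytic_var:.2f}; independent deaths would give {MEAN_DEATHS})")
print(f"sd {math.sqrt(sim_var):.2f} vs sqrt(mean) {math.sqrt(MEAN_DEATHS):.2f}")
```

With these invented numbers, the analytic standard deviation is $\sqrt{31.968} \approx 5.65$ rather than $\sqrt{21.6} \approx 4.65$. That would make a year with 28 deaths even less surprising.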

In short, the answer to the Echo's question "why the increase in Dorset?" is probably "noise".

1. I expect a "Local tutor's anger at Echo story" story to run in the next few days.
2. Understandably, since the Echo website is almost unusable because of intrusive ads.
3. Dorset, as far as we're concerned, contains Bournemouth and Poole, no matter how much they protest. Unitary authorities, schmunitary authorities.
4. It isn't, of course: many collisions involve several cars.
5. Another small black mark for the Echo story: no link to the publicly available source. Naughty, naughty.

## Mark Excell

Interesting piece. I think the issue of noise, ironically, affects smaller data sets more. But apart from the pure mathematics, in this situation, taking anecdotal evidence from locations with larger data sets might give us some parameters to quantify what we are seeing, and maybe shed light on whether the statistic is actually reasonable. In fact, I would be interested to know whether quantum techniques could be used to analyse the data further.