08 May 2015

Evaluating UK Election Predictions

UPDATE: I have a piece up at the Guardian which draws upon and extends this analysis. See it here. Comments welcome!

Back in March, I posted a survey of thirteen forecasts of the outcome of the UK election and promised to perform an evaluation once the results were in. Well, the results are in. Let's jump right into the evaluation. After that, I'll offer some more general comments about predicting elections, data journalism and democracy.

Here are the forecasts that I am evaluating:
Many of these forecasts were dynamic, meaning that they changed over time based on updated information. The evaluation below is based on this snapshot of the forecasts from the end of March. But the results won't be particularly sensitive to the date of the forecast, given how badly wrong they were.

The methodology is a very simple one that I have used frequently (e.g., to evaluate predictions of Olympic medals, the World Cup, and the NCAA tournament), and it is the basis of a chapter in my new book.

First, I identified a "naive baseline" forecast. In this case I chose to use the composition of the UK Parliament in March 2015. My expectation was that these numbers were (for the most part) determined in 2010, so any forecasters claiming to have skill really ought to improve upon those numbers. Let me emphasize that the March 2015 composition of the UK Parliament is an extremely low threshold for calculating skill.

Second, I calculated the improvement upon or degradation from the naive baseline. This is a very simple calculation: (a) take the difference between the forecasted number of seats for a particular political party and the actual result, (b) square this number, (c) sum these squares across the political parties for each forecast, and (d) take the square root of the resulting sum. The result is a measure, in seats, of how far each forecast was from the actual outcome.
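For readers who want the arithmetic spelled out, here is a minimal sketch in Python. The seat counts are placeholders for illustration only, not the actual 2015 figures or any particular forecast; the two functions simply implement the error score and the comparison against the naive baseline described above.

```python
import math

def seat_error(forecast, actual):
    """Square root of the sum of squared per-party seat errors (in seats)."""
    return math.sqrt(sum((forecast[p] - actual[p]) ** 2 for p in actual))

def skill_vs_naive(forecast, naive, actual):
    """Positive values mean the forecast improved on the naive baseline;
    negative values mean it degraded from it."""
    return seat_error(naive, actual) - seat_error(forecast, actual)

# Placeholder seat counts, for illustration only (not the real 2015 numbers).
actual   = {"CON": 320, "LAB": 240, "LD": 10}   # stand-in for the election result
naive    = {"CON": 300, "LAB": 255, "LD": 55}   # stand-in for the March 2015 Commons
forecast = {"CON": 285, "LAB": 275, "LD": 25}   # stand-in for one forecaster

print(seat_error(forecast, actual))             # error score for the forecast, in seats
print(skill_vs_naive(forecast, naive, actual))  # > 0 beats the baseline, < 0 does not
```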

Let's start with a look at the forecasts for the two biggest parties, the Tories and Labour, the only parties with a realistic chance of forming a government. Here are those results, with RED indicating a worse performance than the naive baseline and BLACK indicating an improvement (no, there is no black on this graph).
It is difficult to describe this graph as anything other than mass carnage for the forecasters. The predictions were off, and not by a small amount. Nate Silver, who visited the UK before the election to opine on the race, explained to the British public, "What we know is that it’s highly likely you won’t have a majority.” Um, no.

Let's bring in the Liberal Democrats and see how that affected the results. (Note: Only 12 of the 13 forecasts included 3 parties.)
Here we have 2 of 12 forecasts outperforming the naive baseline. Stephen Fisher, who ran a great blog during the election at Elections Etc., did the best compared to the naive baseline, but this result is tempered a bit by the fact that his forecast degraded after the March prediction was made, with the election-day forecast performing worse. The naive forecast, Fisher and Murr all did pretty poorly overall, missing between 46 and 60 seats across the three parties.

The other forecast to outperform the naive baseline was produced by Andreas Murr at LSE and used a "wisdom of the crowds" approach. This method was based on asking people who they thought would win their constituency, not who they would vote for. The fact that this method outperformed every other approach, save one, is worth noting.

Overall, the track record of the forecasters for the three-party vote was also pretty dismal.

Let's bring in the SNP and UKIP. (Note: Only 8 of the 13 forecasts included SNP.)
With the SNP revolution occurring in Scotland, we would expect this to improve the forecasts, since the naive baseline had only 6 SNP members in Parliament. (UKIP turns out to be mathematically irrelevant in this exercise.) Even so, adding in the SNP lifts only two more forecasters above the naive baseline. It is worth noting that the worst-performing forecast method (Stegmaier & Williams) had the very best prediction for the number of SNP seats.

Even with advance knowledge that the SNP would gain a large number of seats, only half of the forecasters who predicted SNP seats improved upon the naive baseline.

Overall, if we take the set of forecasts as an ensemble and ask how they did collectively (simply by summing their seat errors and dividing by the number of parties predicted), the picture remains pretty sorry:
  • Two-Party Forecasts (13): degraded from Naive Baseline by ~38 seats per party
  • Three-Party Forecasts (12): degraded from Naive Baseline by ~17 seats per party
  • Five-Party Forecasts (8): degraded from Naive Baseline by ~0.3 seats per party
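For concreteness, here is one plausible reading of that ensemble summary, sketched in Python: average the seat error across the forecasts in a group, subtract the naive baseline's error, and express the difference per party. The exact averaging used in the figures above may differ, and the seat counts below are again placeholders rather than the real numbers.

```python
import math

def seat_error(forecast, actual):
    """Square root of the sum of squared per-party seat errors (as above)."""
    return math.sqrt(sum((forecast[p] - actual[p]) ** 2 for p in actual))

def ensemble_degradation(forecasts, naive, actual):
    """One reading of the collective summary: mean seat error across the
    forecasts, minus the naive baseline's error, expressed per party.
    Positive values mean the ensemble degraded from the baseline."""
    mean_error = sum(seat_error(f, actual) for f in forecasts) / len(forecasts)
    return (mean_error - seat_error(naive, actual)) / len(actual)

# Placeholder seat counts, for illustration only.
actual    = {"CON": 320, "LAB": 240, "LD": 10}
naive     = {"CON": 300, "LAB": 255, "LD": 55}
forecasts = [{"CON": 285, "LAB": 275, "LD": 25},
             {"CON": 295, "LAB": 265, "LD": 20}]

print(round(ensemble_degradation(forecasts, naive, actual), 1))  # seats per party
```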
So what lessons should we take from this exercise?

One lesson is that while predicting elections is interesting and fun from an academic perspective, it may not add much to our democratic practices. Nate Silver at FiveThirtyEight, for better or worse, has become the face of poll-driven "horse-race journalism," in which the politics and policy choices are stripped out and numbers are pretty much all that matters. This is of course ironic, because Silver used to complain about punditry and horse-race journalism. Yet during his recent PR tour of the United Kingdom he was the ultimate pundit weighing in on the horse race. Not discussed by Silver were questions about subjects such as the future of the NHS, recharging UK productivity, or the desirability of Scottish independence or a possible EU referendum.

My criticism of election forecasts goes back a long way. Back in 2004 I wrote:
Rather than trying to see the future, political science might serve us better by helping citizens to create that future by clarifying the choices we face and their possible consequences for policy.
By simply predicting seats and treating politics like a sporting event, we diminish the partisanship, the choices, and the fundamental values that lie at the core of politics. Politics is about people and our collective future. I fear that data journalists have diminished our politics.

A second lesson is that we often forget our ignorance. Back in 2012 Nate Silver wrote very smartly:
Can political scientists “predict winners and losers with amazing accuracy long before the campaigns start”?

The answer to this question, at least since 1992, has been emphatically not. Some of their forecasts have been better than others, but their track record as a whole is very poor.
The 2015 UK General Election reminds us of this fact. Sure, it does seem possible to anticipate US elections, but this may say more about American exceptionalism (e.g., a highly partisan electorate, well-gerrymandered districts, and a relatively simple electoral system that is overwhelmingly well-surveyed) than about the predictability of politics more generally.

I don't mean to pick on Nate Silver (disclaimer: I worked for him briefly in 2014, and admit to sometimes being seduced by horse-race journalism!) but at the same time, his overwhelming presence in the UK elections (and that of other forecasters) was influential enough to warrant critique. I have long had a lot of respect for Nate, not least because in the US at least, he figured out how to systematically integrate and evaluate polls, something that academic political scientists utterly failed to do.

At the same time, here is one example of the overwhelming influence of a dominant "narrative" in popular discourse. One pollster, Survation, conducted a survey before the election that proved remarkably accurate. But they chose not to publish. Why not?
We had flagged that we were conducting this poll to the Daily Mirror as something we might share as an interesting check on our online vs our telephone methodology, but the results seemed so “out of line” with all the polling conducted by ourselves and our peers – what poll commentators would term an “outlier” – that I “chickened out” of publishing the figures – something I’m sure I’ll always regret.
While Survation has to live with the decision not to release their poll, I can understand the pressures that exist not to contradict popular narratives expressed by loud and powerful media bodies. These pressures can produce narrow perspectives that exclude other, inconvenient expertise. Sometimes, the popular narrative is wrong.

The role of data journalists (and their close cousins, the explainer journalists) should not be to limit public discourse, either intentionally or unintentionally by weight of influence, but rather to open it. This means going beyond the numbers and into all the messiness of policy and politics. Data journalism, like our democracies, remains a work in progress.