„Paul the Octopus, monkeys and pollsters – caught in the agony of choice“
Paul (Krake) – Wikipedia, the legacy of the world’s greatest fortune teller.
Many American election poll graphics, even those from German-based data providers such as Statista, derive their numbers from the American provider „RealClearPolitics“ (RCP). RCP doesn’t conduct any polls itself; it aggregates other polls into an average, using its own undisclosed heuristic algorithms. These algorithms are based on trial and error and are not reliable in any mathematical sense.
There is a broad discussion, with numerous arguments, about why election polls are misleading by default because of the way they are evaluated:
- a sufficiently representative sample of voters is difficult to assemble, and deviations are somehow „adjusted“ by the pollsters,
- polls are calculated to 95% of all votes, but real turnout varies and usually covers only 50 to 70 percent of all voters, which may distort the result significantly, as you do not know which part of the electorate will actually vote on election day,
- some pollsters fill in averaged numbers for selected voters they simply couldn’t reach, e.g. in their telephone surveys,
- voting behavior on election day can differ completely from the behavior stated weeks before,
- there is always a big group of voters who have no opinion yet on how they will vote on election day and just give a random guess,
- answers given in a public or even a more private situation will differ from the secret choices made in a polling booth
… and so on, and so on.
The wrong diagram, or how to become president-elect with only 23%
The interesting question before election day is: which candidate has the best chance of becoming president-elect? Before we talk about bad design, we have to state that all newspapers and news sites answer this question with the biggest forecasting failure of all: they offer the wrong diagram!
To better understand why, let’s take a short look at the US electoral system: all eligible citizens vote within their states, and the candidate who wins a state gets all of its electoral votes („the winner takes it all“), with Maine and Nebraska as the only exceptions.
BUT: the distribution of electoral votes is not proportional to the population of a state. States with more residents get fewer electors per resident than small states, and this can lead to very distorted results! One candidate can end up with a 54-point lead in the popular vote while the other one still becomes president. Have a look at the images below and imagine that one candidate gets nearly 100% of the popular vote in 11 of the most populous states, but only 49% in all other states. He will still lose the race for the presidency. Even if he won the blue states with only 51% each, he would carry states that hold 10 percentage points more of the popular vote than the remaining states, and still lose, because those states give him only 49% of all electoral votes!
This is why showing the national popular vote as a forecasting instrument simply doesn’t work. The president is not elected by the popular vote, so why should anyone show the popular vote summed up for the whole US? And why did no one show the electoral college in the forecasts instead? The answer is quite simple: the polls are not designed to cover it!
The 11 blue-colored states cover 55% of the popular vote, but only 49% of the electoral college.
So the candidate who becomes president-elect needs only 23% of the popular vote.
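The arithmetic behind these figures can be checked with a short sketch. It uses only the two shares from the example (the blue states hold 55% of the popular vote and 49% of the electoral college) and assumes, purely for simplicity, equal turnout in every state and a race with only two candidates. All names in the code are illustrative.

```python
# Shares taken from the example above. Equal turnout in all states and a
# two-candidate race are simplifying assumptions, not facts from the polls.
BLUE_POPULAR_SHARE = 0.55    # blue states' share of the popular vote
BLUE_ELECTORAL_SHARE = 0.49  # blue states' share of the electoral college


def national_share(share_in_blue, share_in_other):
    """Nationwide popular-vote share of a candidate."""
    other_popular_share = 1.0 - BLUE_POPULAR_SHARE
    return (share_in_blue * BLUE_POPULAR_SHARE
            + share_in_other * other_popular_share)


# Loser: sweeps the blue states with 100% but gets only 49% elsewhere.
loser = national_share(1.00, 0.49)
# Winner: gets nothing in the blue states and 51% in all other states.
winner = national_share(0.00, 0.51)
# Winner-takes-all: the winner collects every elector outside the blue states.
winner_electoral_share = 1.0 - BLUE_ELECTORAL_SHARE

print(f"loser popular vote:  {loser:.1%}")
print(f"winner popular vote: {winner:.1%}")
print(f"popular-vote lead of the loser: {(loser - winner) * 100:.1f} points")
print(f"winner electoral share: {winner_electoral_share:.0%}")
```

Under these assumptions the loser holds roughly 77% of the popular vote against roughly 23% for the winner, which is where both the 54-point lead and the 23% in the heading come from.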
Additional infographic failures in visualizing poll data
Independently of the major fault mentioned above, ixtract wants to highlight another basic criticism, which turns the results of these polls into pure gut-feeling predictions and is commonly ignored by poll critics: the nonsense of their distorting infographic visualizations!
Let’s take a look at what we get to see and what we are talking about. Here are some examples of polling result graphics from Statista, The New York Times, Spiegel Online and finally RCP itself for the 2016 US presidential election at the national level:
Election poll charts from Statista, The New York Times, Spiegel Online and RCP on Oct 19th, 2016
All of these charts have in common that they create the impression of exact numeric support for one candidate (Donald Trump or Hillary Clinton) at a given time, suggested by the line graph they all chose for visualizing the results.
Exaggerated opinion shift of the electorate on Oct 19th, 2016
First fault: dramatizing the battle for the White House
Because all charts show a truncated value axis by default, the race for the White House appears as an exaggerated up-and-down of seemingly rapidly changing support among the electorate. That’s simply not true. In fact, only about 4-5 percent of all voters still can’t decide how they will vote on election day. And there is a big mass of core voters, a solid basis of support for each candidate, which has not changed for months!
You just can’t see it in the distorting charts.
The hidden 80 percent of stable voting bases on Oct 19th, 2016
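How strongly a cut axis inflates a small shift can be estimated with a few lines. The sketch below compares how much of the chart height the same change occupies on a cut axis and on a full 0 to 100 percent axis. The poll values and the axis range are hypothetical and not read off any of the charts above.

```python
def visual_height(value, axis_min, axis_max):
    """Fraction of the plot height at which `value` is drawn."""
    return (value - axis_min) / (axis_max - axis_min)


def visual_change(old, new, axis_min, axis_max):
    """Share of the plot height covered by a move from `old` to `new`."""
    return abs(visual_height(new, axis_min, axis_max)
               - visual_height(old, axis_min, axis_max))


# Hypothetical poll values in percent, not taken from any real chart.
old_support, new_support = 44.0, 46.0

# Same 2-point shift, drawn once on a cut axis and once on a full axis.
truncated = visual_change(old_support, new_support, 38.0, 48.0)
full = visual_change(old_support, new_support, 0.0, 100.0)

print(f"cut axis (38-48):  {truncated:.0%} of the chart height")
print(f"full axis (0-100): {full:.0%} of the chart height")
print(f"exaggeration:      {truncated / full:.0f}x")
```

With these assumed numbers a shift of two points fills a fifth of the cut chart but only a fiftieth of the full one, so the drawn movement is about ten times larger than the movement in the data.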
Secondly, misleading and hidden basic information
Although the line of the graph appears to show the results of a single poll, no label gives any indication that this line in fact represents a somehow generated average of many different source polls with very different and often even contradictory results!
The mixed-up sources of our poll line on Oct 19th, 2016
Thirdly, fishing in troubled waters
If you think: what a mess, you’re right! But it’s the basis for our prophecies, and it is still wrong!
Every poll has a margin of error, and it is very likely that the true value lies somewhere within this interval, not exactly in the middle of it.
So if the result of a poll is given as e.g. 40% with a ±5% margin of error, then the real score may well be 43% or even 37%. It is simply not possible to make a more accurate statement with the limited number of voters interviewed for this specific poll.
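The link between the number of interviewed voters and the margin of error can be sketched with the textbook formula for a proportion. This assumes a simple random sample and a 95% confidence level (z = 1.96). Real pollsters weight their samples, so their published margins will differ, and the function names here are illustrative.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumption)


def margin_of_error(p, n, z=Z_95):
    """Margin of error for a share p measured in a simple random sample of n."""
    return z * math.sqrt(p * (1.0 - p) / n)


def sample_size_for(p, margin, z=Z_95):
    """Smallest simple random sample that reaches the given margin."""
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


p = 0.40  # the 40% result from the example above
print(f"n for +/-5 points: {sample_size_for(p, 0.05)}")
print(f"n for +/-3 points: {sample_size_for(p, 0.03)}")
print(f"n for +/-1 point:  {sample_size_for(p, 0.01)}")
print(f"margin at n=1000:  {margin_of_error(p, 1000) * 100:.1f} points")
```

Because the margin shrinks only with the square root of the sample size, cutting it from ±5 to ±1 points takes roughly 25 times as many interviews.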
And remember the quickly shifting voters: the margin of error of the poll is of a similar size to the uncertainty caused by quickly shifting voters. Put together, both aspects add up to an increased uncertainty. And remember that you still do not know which part of the electorate will really take part in the ballot, which creates an even bigger uncertainty. With the empirically based algorithms mentioned above, RCP and others try to determine the true distribution of votes right now, despite all these uncertainties, none of which are shown in any visualization.
And what might seem logical is actually no solution for the increased uncertainty: averaging many inaccurate results can smooth out random sampling noise, but it cannot remove the errors the polls have in common, such as unrepresentative samples or unknown turnout, so the averaged score still does not become as accurate as a single clean line suggests! Below you can find a variety of representative polls showing the support for Hillary Clinton and Donald J. Trump within their given margin of error:
Major poll results for Hillary Clinton and Donald J. Trump with their given error margin
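One way to read the claim about averaging is that it reduces random noise but not an error shared by all polls. The simulation below is a minimal sketch of that reading, not RCP’s method: the true support, the shared bias, the sample size and the normal approximation of the sampling noise are all hypothetical assumptions.

```python
import math
import random

rng = random.Random(2016)  # fixed seed so the sketch is reproducible

TRUE_SUPPORT = 0.45   # hypothetical true share of a candidate
SHARED_BIAS = 0.03    # hypothetical error common to all polls
SAMPLE_SIZE = 1000    # hypothetical number of respondents per poll
NUM_POLLS = 20


def simulate_poll():
    """One poll result: truth + shared bias + random sampling noise."""
    p = TRUE_SUPPORT + SHARED_BIAS
    # Normal approximation of the sampling noise (assumption).
    sampling_sd = math.sqrt(p * (1.0 - p) / SAMPLE_SIZE)
    return rng.gauss(p, sampling_sd)


polls = [simulate_poll() for _ in range(NUM_POLLS)]
average = sum(polls) / len(polls)

spread = max(polls) - min(polls)
print(f"spread of single polls: {spread * 100:.1f} points")
print(f"error of the average:   {(average - TRUE_SUPPORT) * 100:.1f} points")
```

In this setup the average settles close to the biased value of 48%, not the true 45%. Adding more polls narrows the scatter around the wrong number but does not move it towards the right one.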
And to be honest, if you compare percent values, you should always relate them to 100 percent. Now it’s up to you what you think of the opinion poll line… At least the Huffington Post (the big competitor of RCP) offers a way to adjust the axes and displays the source polls in the background. But who would make the effort to properly adjust a misleading chart in an article?
All scores related to 100 percent, as we are speaking about per cent, on Oct 19th, 2016
Who stole the votes?
What some readers might completely forget is the fact that in the majority of poll charts the voter support only sums up to about 85 to 90 percent of all votes. So where did the missing 10 percent go? We can have a look at „The New York Times“ and do what presumably no one does: we adjust the chart, as they offer this option too, similar to the Huffington Post. And after we stretch the axis down to the origin, we uncover a forgotten candidate!
So who is Gary Johnson, who stole the missing votes? Never saw him on TV… (charts still from Oct 19th, 2016)