Friday, November 3, 2023

Explaining error in polling and why most polls aren't inaccurate

 A common criticism I hear of polling this election cycle is that it's inaccurate and therefore we should discount it. Uh... just to throw this out there, but polling comes with a certain amount of acceptable error. When you poll a population, you are trying to estimate how that whole population behaves from a sample of it. You ideally want to poll randomly, or semi-randomly, with factors such as likely voters weighted more heavily than nonvoters. You don't want a poll with a selection bias that skews the results. Take the 1936 election, when pollsters (most famously the Literary Digest) predicted Alf Landon would win because they mostly polled people who had telephones, and at the time only relatively rich people had telephones, and they hated FDR. So yeah, you want to poll people randomly, while also capturing the characteristics of actual voters and excluding nonvoters. This is why, for example, likely-voter polls tend to skew older: young people statistically vote less.

On top of that, the accuracy of polls depends on the sample size. If you poll one person, you might find that that one person wants Trump and conclude that he wins the election. But if each person you reach is a 50/50 shot of being a Biden or a Trump voter, one respondent can't give you a representative sample; you'll get 100-0 one way or the other. Now say you poll 10 people. Okay, well, you might get 6 people voting Trump and 4 people voting Biden. That means a 20-point lead for Trump even if the reality is, say, close to 50/50.
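To show how noisy tiny samples are, here's a quick sketch that uses the binomial distribution to compute the chance that an n-person poll shows a lead of at least some number of points for either candidate when the race is truly 50/50. It assumes every respondent is an independent coin flip, which real polls aren't, and prob_lead_at_least is just a name I made up for illustration.

```python
from math import comb

def prob_lead_at_least(n: int, lead_points: int, p: float = 0.5) -> float:
    """Chance that a random n-person poll shows a lead of at least
    lead_points (percentage points) for EITHER candidate, assuming each
    respondent independently backs candidate A with probability p."""
    total = 0.0
    for k in range(n + 1):  # k = number of respondents backing candidate A
        # lead in points is |k - (n - k)| / n * 100; compare in integers
        if abs(2 * k - n) * 100 >= lead_points * n:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(round(prob_lead_at_least(1, 100), 3))  # 1.0  (one person is always 100-0)
print(round(prob_lead_at_least(10, 20), 3))  # 0.754
print(round(prob_lead_at_least(100, 20), 3))
```

In a dead-even race, roughly three out of four 10-person polls show somebody ahead by 20 points or more (everything except an exact 5-5 split). At 100 respondents that kind of lopsided result becomes rare.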

The more people you poll, the more precise polls get. Of course, the more people you poll, the more expensive it gets, so there's a tradeoff. The gold standard for polling is generally a sample size of around 1,000, which translates to roughly a 3-point margin of error. What does that mean? Well, it means this. If you have a poll with 50% going Biden and 50% going Trump, each of those numbers can be off by up to 3 points at 95% confidence. That means we can be 95% confident that Trump will land somewhere between 47 and 53, and so will Biden. So the gap between them could actually be as large as 6 points.
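The 3-point figure comes from the standard formula for a proportion's 95% margin of error, 1.96 × sqrt(p(1 − p)/n), using p = 0.5 as the worst case. Here's a minimal sketch, assuming a simple random sample; real pollsters' published margins can come out somewhat larger because weighting adds its own uncertainty.

```python
from math import sqrt

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for one
    candidate's share in a simple random sample of size n.
    p = 0.5 gives the largest (most conservative) margin."""
    return z * sqrt(p * (1 - p) / n) * 100

for n in (1000, 1600, 700):
    print(f"n={n}: +/- {margin_of_error(n):.2f} points")
# n=1000: +/- 3.10 points
# n=1600: +/- 2.45 points
# n=700: +/- 3.70 points
```

Because the sample size sits under a square root, quadrupling the sample only halves the margin of error, which is why going much past 1,000 respondents gets expensive for fairly small gains.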

Of course, not all polls have a 3% margin of error. Looking at RealClearPolitics, a lot of nationwide polls have sample sizes around 1,600 and a margin of error closer to 2-2.5%, whereas a lot of state-level polls have sample sizes around 700 and a margin of error of about 4%. Heck, most polls I look at range from 3-5% in terms of margin of error. For the sake of simplicity, I assume a 4% margin of error. This is why I treat states with a margin under 8 points as battleground states: at an 8-point margin, I can assume with a confidence level of around 97.7% that the candidate ahead will win. Why 97.7% and not just 95% or whatever? Because the margin of error applies to each candidate's share, and in a two-way race any point one candidate loses, the other gains, so the gap between them moves twice as much. With a 4% margin of error on each share, and 2 standard deviations (the same as the margin of error in this case) being 95% confidence, I can be 95% confident the final margin will fall between 0 and 16 points. You understand why? Say Biden is at 54 and Trump is at 46. Well, Biden can land anywhere within 50-58%, and Trump within 42-50%. So the outcome can be as close as 50-50, or as far apart as 58-42, and still be technically accurate. The remaining 5% or so is split between the two tails. If we get, say, 59-41, well, there's about a 2.3% chance of that happening, but it still means Biden wins. There's also only about a 2.3% chance that Biden does worse than 50% and Trump does better, meaning there's only a 2.3% chance Trump will win.
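Here's a rough sketch of that arithmetic, assuming a simple two-way race and a normal (bell-curve) polling error. The normal_cdf helper is defined right here, not taken from any polling library, and treating the margin of error as exactly 2 standard deviations is the same simplification used in the paragraph above.

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

MOE = 4.0                 # assumed 95% margin of error on each candidate's share
sd_share = MOE / 2        # treat the MOE as about 2 standard deviations
sd_margin = 2 * sd_share  # two-way race: moving one share by x moves the gap by 2x
lead = 8.0                # e.g. Biden 54, Trump 46

low, high = lead - 2 * sd_margin, lead + 2 * sd_margin
print(f"95% range for the margin: {low:.0f} to {high:.0f} points")  # 0 to 16
print(f"Leader wins: {normal_cdf(lead / sd_margin):.1%}")            # 97.7%
print(f"Trailer wins: {1 - normal_cdf(lead / sd_margin):.1%}")       # 2.3%
```

The key step is that the standard deviation of the margin (4 points) is twice that of either share (2 points), so an 8-point lead sits exactly 2 standard deviations from a tie.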

That's why, when I do my predictions, you see me give percentage estimates of where I think states will fall. That doesn't mean they will always be right. I mean, if a state has a 70% chance of a Biden win and it goes Trump, that can happen. It only happens around 3 out of 10 times, though. Outcomes aren't truly strange unless they happen only around 2.3-2.5% of the time or so. And even then... that's 1 in 40. Look at how many predictions we make. I look at around 15-20 swing states per election, so you can expect one state to land in that tail roughly once every 2-3 elections, or once every 8-12 years. Will that always flip the election? No. If the misses all lean the same way it can, but in 2020, for example, North Carolina and Florida went Trump while Georgia went Biden. So the misses don't always line up.
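The "once every 2-3 elections" figure is just expected-value arithmetic. A minimal sketch, assuming each state independently has about a 1 in 40 chance of landing in the upset tail; that independence assumption is a simplification, since polling errors across states tend to move together.

```python
UPSET_TAIL = 1 / 40   # roughly the 2.3-2.5% one-sided tail per state
YEARS_PER_ELECTION = 4

for states in (15, 20):
    expected_per_election = states * UPSET_TAIL  # expected tail outcomes per cycle
    elections_between = 1 / expected_per_election
    print(f"{states} states: {expected_per_election:.3f} per election, "
          f"about one every {elections_between:.1f} elections "
          f"(~{elections_between * YEARS_PER_ELECTION:.0f} years)")
# 15 states: 0.375 per election, about one every 2.7 elections (~11 years)
# 20 states: 0.500 per election, about one every 2.0 elections (~8 years)
```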

Even a Wisconsin 2016 outcome (only a 5% chance of happening) will happen about one time out of 20. So yeah... that's just the nature of polls.

Do you guys understand why polling can be so messy in practice? Nothing in 2016 or 2020 was outside the expected range of outcomes, as I've demonstrated several times before. And that's what polling, properly understood, delivers: a range of outcomes. Polls aren't always going to be 100% dead on. They can be off a few points and still be TECHNICALLY correct.

I'm a dude who looks at elections in terms of numbers. When I do predictions, I tend to go with the outcome that is MOST LIKELY. That is not necessarily the outcome we will get; as I showed recently with my simulations, there can be a significant amount of randomness involved. So yeah, just something to keep in mind. The number I have the most confidence in when I give an election prediction is the tipping point prediction. Most elections seem to be either "blue waves" or "red waves", so the "wave model" of elections seems to have some legitimacy. Even if there is some randomness at the individual state level, the most important number is how far off the polls are overall relative to the outcome. In some elections, like 2008 and 2012, polls were pretty accurate and close to dead on. In 2016 and 2020, the outcomes were a bit more Trump-leaning than expected, although for 2020 I fully acknowledge that removing those right-leaning polls was a bit of a mistake, and the results were still pretty much within that model's predictive range.

And to give you an idea of how big a lead is needed for a given level of confidence (assuming a 4% MOE), this is basically how it breaks down (size of the lead, then the chance the leader wins):

0 points - 50.0%

1 point - 59.9%

2 points - 69.1%

3 points - 77.3%

4 points - 84.1%

5 points - 89.4%

6 points - 93.3%

7 points - 96.0%

8 points - 97.7%

9 points - 98.8%

10 points - 99.4%

11 points - 99.7%

12 points - 99.9%
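Each row of that table comes from the normal curve: with a 4-point margin of error on each share, the gap between the candidates has a standard deviation of about 4 points, so the leader's chance of winning is the normal CDF of (lead / 4). A minimal sketch that computes every row, assuming normally distributed polling error; win_probability is a hypothetical helper name, not from any library.

```python
from math import erf, sqrt

def win_probability(lead: float, moe: float = 4.0) -> float:
    """Chance the poll leader wins, assuming the final margin is normally
    distributed around the polled lead. The MOE is ~2 SD of one share and
    the margin moves twice as fast, so the margin's SD equals the MOE."""
    sd_margin = moe
    return 0.5 * (1 + erf(lead / (sd_margin * sqrt(2))))

for lead in range(13):
    print(f"{lead} points - {win_probability(lead):.1%}")
```

Changing the moe argument shows how sensitive the table is to that 4% assumption: with a 3% margin of error, the same leads translate into noticeably higher win probabilities.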

And you get the idea. Like, if there's a 0-point margin, it's literally 50-50; either guy can win. With a 1-point lead, it very quickly becomes a 60-40 type thing. I mean, the other side can still easily win, but the odds favor the leader. 2 points brings that up to roughly 70-30. And from there it just gets harder and harder for the trailing candidate to win. You can see why I often don't even bother looking at states beyond an 8-10 point lead, and even then I only look at 8-10 point states if they were recently swing states and I have a reason to include them. For example, I added and removed New Hampshire a few election cycles ago depending on how long and how consistently it polled beyond 8. And I know a lot of Rust Belt states like Wisconsin and Michigan often polled in the 8-9 point range in my 2020 estimates, only to end up much closer. So sometimes there is a reason I include such states, but generally, beyond 8 or so points, the trailing candidate's chance of winning any particular state is deemed "not statistically significant". It doesn't mean it can't happen, just that it's a long shot we shouldn't count on. In this model, beyond 8 points we're talking about an outcome of 1 in 40 or worse. 9 points goes to around 1 in 80. By 10 points you're talking about 1 in 160 or so, and by 12 points it's roughly 1 in 740. So yeah. The odds go down very quickly once you're outside the confidence interval, and even once you're past the first couple points of error. Most states that flip seem to be within 2-3 points, I notice. Beyond that, it becomes significantly harder to flip states.
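Converting a probability into "1 in N" odds is just taking the reciprocal of the upset chance. A small sketch under the same assumptions as above (normal error, 4-point standard deviation on the margin); upset_odds is a made-up name for illustration, and math.erfc is used because it stays accurate for small tail probabilities.

```python
from math import erfc, sqrt

def upset_odds(lead: float, sd_margin: float = 4.0) -> float:
    """Return N such that the trailing candidate wins about 1 time in N,
    assuming the final margin is normal around the polled lead."""
    p_upset = 0.5 * erfc(lead / (sd_margin * sqrt(2)))  # upper-tail probability
    return 1 / p_upset

for lead in (8, 9, 10, 12):
    print(f"{lead}-point lead: about 1 in {upset_odds(lead):.0f}")
# 8-point lead: about 1 in 44
# 9-point lead: about 1 in 82
# 10-point lead: about 1 in 161
# 12-point lead: about 1 in 741
```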

And yeah. I just wanted to discuss some statistics and explain why polls are normally accurate... within the fine print of what they're supposed to predict. That doesn't mean they'll be on the nose. Polls predict a range of outcomes, and the headline number is only the most likely one.
