Thursday, July 11, 2024

Understanding my election predictions (methodology and how to read them)

So, as we know, I recently went fully "digital" with my election predictions. Previously, I would painstakingly make a new chart every few weeks, calculating much of my predictions by hand. Now I've gone all in on spreadsheet charts (Google Sheets, since I'm too poor for Excel). I figured now is a good time to explain how my model works, what it does, and, given the ubiquity of election models this cycle, what it DOESN'T do.

First, the election model:

Okay, so I put numbers all over it so I can walk you through this.

1) First, we go to the bottom. Here, I record the national vote like I would at the beginning of my original election predictions. Why is it at the bottom? Because I couldn't make the chart work properly otherwise. The entire chart is held together by the margin column, as I'll get to in #6, and if the national predictions aren't down here, the whole chart ends up getting screwed up.

Normally I just have Biden here, with the 2-way and 5-way matchups, but because I also want to keep tabs on how alternate candidates perform, I currently have Harris, Newsom, and Whitmer listed as well (I'm writing this in July 2024, when there's real talk of replacing Biden).

2) Here I record the winner. I actually have this tied to the margin column in #3, so it automatically writes "Trump" in red if Trump is ahead and "Biden" in blue if Biden is ahead.

3) Here I record the margin for the national vote as per the RealClearPolling average. Democrats get a positive value, Republicans a negative one, and the colors shift based on 1/4/8 margins: <1 is black, 1.1-4 is light red/blue, 4.1-8 is a darker shade, and 8+ is the darkest shade of all. This corresponds to how my electoral maps are shaded.
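If you wanted to replicate that shading rule outside of the spreadsheet's conditional formatting, it's just a threshold check on the absolute margin. A minimal sketch in Python (the tier names are my own shorthand, not labels from the actual sheet):

```python
def shade(margin):
    """Map a polling margin (positive = D, negative = R) onto the 1/4/8 tiers
    used for both the chart colors and the electoral maps."""
    party = "blue" if margin > 0 else "red"
    m = abs(margin)
    if m <= 1:
        return "lightest " + party   # "tilt" (shown as black text in the margin cell)
    elif m <= 4:
        return "light " + party      # "lean"
    elif m <= 8:
        return "darker " + party     # "likely"
    return "darkest " + party        # "safe"

print(shade(-0.9))   # lightest red
print(shade(5.2))    # darker blue
```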

4) Here we go to the state-level forecast. In the leftmost column I list all of the states that I consider "in play". Technically I only consider states below a margin of 8 points "in play", but margins fluctuate, so I'm keeping everything within 12 on my chart so we can see how states move in and out. It's not uncommon for a state like New Hampshire, Wisconsin, or, in this election cycle, Florida, to fluctuate in the 7-9 range, sometimes being "in play" and sometimes not. Another reason I'm going up to +12 this time is to really give viewers a good idea of what's actually going on this cycle. People might wonder, in the red column, what happened to Ohio, Iowa, and Florida. Those used to be swing states, didn't they? Yeah. And now they're not. The electoral map has changed significantly since the Bush era, and those states are now safe red. It's also worth pointing out what's going on in the blue column, since many states that are normally considered safe blue are currently at or near swing-state territory. This includes New York, New Jersey, Washington, and Illinois. Yeah, this is a really bad year for Democrats. It's normally unthinkable for these states to be in play, but yeah...they're in play. We need to stop thinking about just the seven swing states the media focuses on. I mean, look at some of them. They're at Trump +4 to +6. Meanwhile I still get people acting like Minnesota or Virginia being in play is unthinkable. No, it's not. Anything under 8 points, for reasons I'll get to a little later (see #7 and #8), is technically in play and has more than a 2% chance of flipping.

5) Much like the national winner cell below, this column records whether Trump or Biden is expected to win the state in question, based on the value in #6.

6) This is where the magic really happens. This entire chart, minus a couple of things (such as 1, 2, 3, and 13), is held together by this margin column. This is where I go to RealClearPolling, look at what their average is, and record the value for each state. Positive values are Biden wins, negative values are Trump wins. The winner column automatically changes depending on the value of this column, and the rest of the chart also changes based on it in ways I'll get to when discussing the other elements. Then I just sort the entire chart by this column (hence why nothing else can be in it, and why the national stuff can't be on top), and the rest of the chart fills itself out.
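For anyone curious what that sort-and-derive step looks like outside of a spreadsheet, here's a rough Python equivalent (the states and margins here are made-up examples, not my current numbers):

```python
# Each row is a state plus its RealClearPolling average margin
# (positive = Biden leads, negative = Trump leads). Values are illustrative.
rows = [
    {"state": "Virginia",     "margin":  2.5},
    {"state": "Pennsylvania", "margin": -4.5},
    {"state": "Michigan",     "margin": -0.8},
    {"state": "Nevada",       "margin": -5.0},
]

# The winner column follows the sign of the margin, and the whole table is
# sorted by the margin column so everything else "fills itself out".
for row in rows:
    row["winner"] = "Biden" if row["margin"] > 0 else "Trump"
rows.sort(key=lambda r: r["margin"])   # sort by margin; reddest states group at one end

for row in rows:
    print(f'{row["state"]:>13}  {row["margin"]:+5.1f}  {row["winner"]}')
```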

7) So, to understand what a Z score is, we need a little statistical knowledge. Essentially, my model operates on the assumption that polling accuracy exists on a bell curve. 

So this is a stock Adobe bell curve that I modified. Basically, as we know, all polling has a margin of error. Normally, at best, it's 3 points, though that assumes polls have large sample sizes (1000+), and they don't always. Polls can have margins of error as high as 5 points. What a margin of error means is that the pollster can be 95% confident in the results. It just so happens that 95% confidence falls just short of 2 standard deviations (or, in my framework, 2 margins of error) from the stated value. Technically you're at about 95.5% confidence at exactly 2, but yeah, that's what I go by.
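For reference, those 3-to-5-point figures follow from the usual margin-of-error formula, roughly 1.96 standard errors on a 50/50 proportion. A quick back-of-the-envelope (ignoring weighting and design effects):

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in points, for a simple random
    sample of size n and proportion p."""
    return 100 * z * sqrt(p * (1 - p) / n)

for n in (400, 600, 1000, 1500):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1f} points")
# n =  400: +/- 4.9 points
# n =  600: +/- 4.0 points
# n = 1000: +/- 3.1 points
# n = 1500: +/- 2.5 points
```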

Basically, this is why I say anything within 8 points is in play: each candidate's polling value can be off by 4, and with two candidates who have a shot at winning, you generally get 8. Some people might argue that I should use something more like a 3-point margin of error, and some would say I should make it even higher; I think 4 is about right. I haven't seen many polling results end up more than 8 points off, and even crazy election cycles like 2016 and 2020 tended not to miss by more than 8.

However, one thing should be noted: with election predictions, it only counts when you're wrong in one direction. If you guess R+2 and you get R+12, yeah, that's outside the margin of error, but R still won. It only matters if you predicted something like R+10 and it suddenly goes D+1.

Generally speaking, if a result is "tilt", or within 1 point, I can be up to 60% confident in it. If the result is "lean", or up to 4 points, I can be up to about 84% confident. If the result is 8 points, I can be 97.7% confident, just short of 98%. And of course, at 12 points I'm 99.9% confident. Most political scientists use 95-96% confidence in a two-tailed test as the gold standard, so that's my cutoff for what counts as a swing state.

Basically, this column takes the value recorded in the margins column and divides it by 4 to give me the Z score. 

8) I then use the Z score from #7 to calculate the exact probabilities. This is now done entirely in the spreadsheet. The D column is the Democrat's chance of winning a given race, and the R column is the Republican's.
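In spreadsheet terms this is just the margin divided by 4 fed into the standard normal CDF (something like NORM.S.DIST in Sheets/Excel). Here's the same calculation sketched in Python, which also reproduces the confidence numbers from #7:

```python
from statistics import NormalDist

def win_probabilities(margin, moe=4.0):
    """Turn a polling margin (positive = D, negative = R) into win chances,
    treating one margin of error (4 points) as one standard deviation."""
    z = margin / moe                  # the Z score from #7
    p_dem = NormalDist().cdf(z)       # D column
    return p_dem, 1 - p_dem           # (D column, R column)

for margin in (1, 4, 8, 12):
    d, r = win_probabilities(margin)
    print(f"D+{margin:<2}: {d:.1%} D / {r:.1%} R")
# D+1 : 59.9% D / 40.1% R   (~60%, "tilt")
# D+4 : 84.1% D / 15.9% R   (~84%, "lean")
# D+8 : 97.7% D / 2.3% R    ("likely")
# D+12: 99.9% D / 0.1% R    ("safe")
```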

9) I then record the number of electoral votes that each state provides so that I can calculate the cumulative electoral vote count in #10.

10) The D EVs (Democratic electoral votes) and R EVs (Republican electoral votes) columns record the cumulative number of electoral votes a candidate wins if they take that state plus every state that is more favorable to them. In this model I assume 107 safe electoral votes for Democrats not listed in my predictions, and 123 for Republicans. The goal for each candidate is to get to 270 electoral votes; that's a win. It's possible to get 269 electoral votes, in which case both sides tie and the House of Representatives decides the election. Every row at or above 270 electoral votes has its color changed in that column to indicate that the party in question is expected to win by that point.
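The running totals themselves are just a cumulative sum down the sorted table, starting from each side's safe electoral votes. Roughly like this (with an illustrative slice of states, not the full chart):

```python
SAFE_D, SAFE_R = 107, 123   # safe EVs for each side not listed on the chart

# (state, electoral votes), sorted from most Republican to most Democratic.
# Only a slice of the chart, for illustration.
states = [("Florida", 30), ("Nevada", 6), ("Pennsylvania", 19),
          ("Michigan", 15), ("Nebraska CD2", 1), ("Virginia", 13)]

# R EVs accumulate from the red end of the table and D EVs from the blue end.
# On the real chart, every row whose running total hits 270 or more gets
# recolored to show that side is expected to win by that point.
r_running, total = {}, SAFE_R
for state, ev in states:
    total += ev
    r_running[state] = total

d_running, total = {}, SAFE_D
for state, ev in reversed(states):
    total += ev
    d_running[state] = total

for state, _ in states:
    print(f"{state:>13}: R EVs {r_running[state]:>3}   D EVs {d_running[state]:>3}")
```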

11) In order to actually come up with the probabilities for my overall electoral prediction, I need to find the tipping point state. The tipping point state is the state that provides the easiest path to 270 for each candidate. In most of my predictions this year I estimate Pennsylvania as the deciding state. However, in this specific snapshot, Nevada is the deciding state, because PA has gotten very red due to Biden's bad polling as of late.

12) I calculate the overall electoral odds based on the tipping point state's probability of going to each candidate. Yes, it's possible in practice the election won't play out exactly as modeled here. Trump might lose Pennsylvania but win Nevada and Georgia, for example. It's not outside the realm of possibility. Still, such an outcome is less likely. The tipping point represents the path of least resistance for each candidate, and assumes that every state ahead of it also flips for that candidate, because elections tend to happen in waves with most flips going one way, and those states are even more likely to go for the candidate in question. For example, if Biden wins Pennsylvania, I can assume that he's also going to win Michigan, Wisconsin, and probably even Georgia and Nevada in this particular snapshot. If Trump wins Virginia, I can assume all of those states probably went red, as did Nebraska CD2 and probably Maine.
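Putting #10 through #12 together, the tipping-point step can be automated roughly as: walk down the leader's easiest path to 270, and the state whose electoral votes push the running total past 270 is the tipping point; the overall race odds are then that state's win probabilities. A sketch under those assumptions (the margins are placeholders, not my actual chart values, which is why the tipping point here comes out as Pennsylvania rather than Nevada):

```python
from statistics import NormalDist

SAFE_D, SAFE_R = 107, 123   # safe EVs for each side not listed on the chart

# (state, margin, electoral votes); margins are made-up placeholders.
# Positive margin = Biden leads, negative = Trump leads.
STATES = [
    ("Texas",          -9.5, 40), ("Florida",        -9.0, 30),
    ("Ohio",           -8.5, 17), ("Iowa",           -8.0,  6),
    ("North Carolina", -6.5, 16), ("Georgia",        -6.0, 16),
    ("Nevada",         -5.5,  6), ("Pennsylvania",   -5.0, 19),
    ("Arizona",        -4.5, 11), ("Wisconsin",      -2.0, 10),
    ("Michigan",       -0.8, 15), ("Nebraska CD2",    0.5,  1),
    ("Virginia",        2.5, 13), ("Minnesota",       3.0, 10),
    ("New Hampshire",   3.5,  4),
]

def dem_win_prob(margin, moe=4.0):
    """Margin / 4 is the Z score; the normal CDF turns it into a probability."""
    return NormalDist().cdf(margin / moe)

def tipping_point():
    """Walk Trump's path (he leads in this snapshot) from his strongest
    states toward 270; the state that pushes the total past 270 is the
    tipping point, and the race odds are that state's D/R chances."""
    total = SAFE_R
    for state, margin, ev in sorted(STATES, key=lambda s: s[1]):
        total += ev
        if total >= 270:
            return state, dem_win_prob(margin)
    return None, None

state, p_dem = tipping_point()
print(f"Tipping point: {state}")
print(f"Overall odds: Biden {p_dem:.0%}, Trump {1 - p_dem:.0%}")
# -> Pennsylvania, roughly Biden 11% / Trump 89% with these placeholder margins
```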

13) I look at where the rough 50/50 mark is, probability-wise, and I assume that's the most likely electoral outcome. For example, in this chart, Nebraska CD2 is more likely to go Biden, but Wisconsin is more likely to go Trump, so I assume the most likely electoral outcome is 226-312 in favor of Trump. I then color and print a map accordingly based on the 1/4/8 margins discussed above. I currently use yapms.com for my election maps.
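In code terms, the "most likely map" step is just: give every state to whichever side is above 50% (i.e., whoever leads the margin, however narrowly) and add up the electoral votes. A tiny sketch with made-up margins (this is only a slice of the chart, so the two totals here don't add up to 538 the way the full chart does):

```python
SAFE_D, SAFE_R = 107, 123   # EVs already banked as safe for each side

# (margin, electoral votes) for some in-play states; margins are made up.
in_play = [(-5.0, 19), (-4.5, 11), (-2.0, 10), (-0.8, 15),
           (0.5, 1), (2.5, 13), (3.0, 10)]

# Anything with a positive margin (Democrat above 50%) goes blue,
# anything negative goes red; that single assignment is the "most likely" map.
dem = SAFE_D + sum(ev for margin, ev in in_play if margin > 0)
rep = SAFE_R + sum(ev for margin, ev in in_play if margin < 0)
print(f"Most likely outcome with these numbers: Biden {dem}, Trump {rep}")
```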

This is basically what the above chart looks like when mapped out. Michigan is pale red due to being <1 point for Trump, and NE2 is pale blue because it's <1 for Biden. It goes on from there, with dark red/blue for margins over 8 points, the next shade lighter for margins over 4, the next for margins over 1, and finally the lightest shade for anything under 1. And that's basically the most likely electoral map.

14) Here I list the date I last updated my chart and cite the sources I get my data from. I get most of it from RealClearPolling, but I do occasionally use 538 for things like the congressional districts in Maine and Nebraska, whose votes count separately from the statewide results, if they're not available on RCP.

I'll also often post a graphic with my predictions that looks like this:


These are essentially the historical odds for Biden and Trump in this race, as defined by the tipping point discussed in #12. As we can see, Trump has been in the lead the whole time, fluctuating between roughly 70% and 90% this election cycle, with his odds really going up since the debate.

So, that's my model. I also do something similar with the Senate, and you'll see those predictions in my formal prediction posts as well.

Discussing what the model does and doesn't do

So, as you can tell, this is a polls-heavy model. It's not intended to predict how things may change in the future, and it's going to be more volatile than, say, 538's model. However, to take a dig at 538's model, I think a model that is stuck at 50/50 even when the odds for Biden are so bad is a useless model. We get it, you're trying to predict 4 months from now, but you can't do that; there are too many variables to account for, and just assuming 50/50 like they are is a useless non-prediction. The fact is, they don't know what's gonna happen. 50/50 is practically covering their butts. If I gave 50/50 in my own prediction, it's either because the polls are that close, or because I have no data and have no idea what's happening.

This prediction is based on polls and a statistical analysis of them. It calculates what the election outcome would be if it were held today. I understand that things will change a lot between now and Election Day. Things have changed quite a bit already, and I will provide updates when they do. So will 538, and maybe they'll finally start realizing this is very strongly a Trump race once the data starts pouring in and they notice the polls aren't changing by September.

I do wanna note that my election predictions have had a pretty decent record. I normally only get 2-3 states wrong, with most of them being tilts or leans. And that's okay. The probabilities above are just that: probabilities. Due to randomness, I'm never gonna be dead on. As for what this prediction might get wrong, well, it depends on which direction I'm wrong in.

You gotta understand, the probabilities only give me a range of predictions. If I'm 76% certain a state is gonna have a certain outcome, that tells me that in 76 out of 100 possible outcomes I'll be right, but I'll be wrong in the other 24. The higher the margin gets in one direction, the more certain I can become. And right now, I think 90% is actually fair for Trump. Overcoming a 5-point margin like this is not easy. It can happen, but it will only happen 10 out of 100 times, give or take.
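For what it's worth, that ~90% figure is consistent with the model's own math: a 5-point margin over the 4-point margin of error is a Z score of 1.25, and the normal CDF at 1.25 is about 0.89. A quick check, just restating the #7/#8 calculation:

```python
from statistics import NormalDist

margin = 5.0                 # roughly the current tipping-point margin
z = margin / 4.0             # Z score, per #7
print(f"{NormalDist().cdf(z):.1%}")   # ~89.4%, i.e. about 9 times out of 10
```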

When I'm only 60% or 70% certain of an outcome, I'm going to be wrong once in a while.

If I had to guess, based on previous outcomes, what I might get wrong, I'll say this.

If Biden overperforms, I'd expect him to win Wisconsin and Michigan. MAYBE Georgia, but honestly, I don't see margins above +4 flip very often. Despite assuming a bell curve, my actual hit rates based on previous elections are ~70% for anything under 4 points and ~96% for anything above 4 points. That could point to an issue with my methodology being too lax at times and too willing to embrace some level of uncertainty, but I think I'd be wrong A LOT more if I adjusted my margin of error down to 3 points. Then I'd be consistently getting rust belt margins wrong more than I should, so...yeah. I'm sticking with 4. I'd rather lack confidence than be overconfident.

If Trump overperforms, he could start taking NE2, Minnesota, Virginia, Maine, and even New Hampshire. I mean, an outcome where Trump overperforms his polls by, say, 3 points would do that. Whereas for Biden, he'd only get Wisconsin and Michigan as of now. As I said, it's harder to flip the "likely" states in the 4-8 range. Not impossible, but yeah, I think 2-16% is fair there.

But yeah. That's my model. I may adjust it in future election cycles, but I'm largely gonna stick with what I have for now, although I might eventually add extra pages for third-party candidates. I haven't done that yet because the 5-way polling has been sparse and far less reliable than the 2-way polling, to the point that it doesn't tell me anything I don't know and often gives me results I'd consider inaccurate due to the smaller number of polls to draw from.
