Thursday, November 21, 2024

An objective look at my 2024 election model and how accurate it was

So, while there are still a few very close house races up for grabs, especially in a certain west coast state that doesn't seem to know how to count ballots in a timely manner, the 2024 election is now mostly a settled issue, and I feel like it's time to evaluate how my model did.

Presidential

On the presidential prediction, my model got the right outcome. It predicted Trump would win, although it was only 54% sure. It got 48 states correct, with 2 being incorrect. The ones that were incorrect were both off by less than a point, and were themselves tossups. I underestimated Republicans by 25 electoral votes, which wasn't that crazy all things considered.

To break down how things worked:

There were 4 tilt states, and I got 2 of them wrong (50% success rate). I would expect a success rate of 50-60% here, so this is on par.

There were 4 lean states. I got all 4 correct (100% success rate). I would estimate a success rate of 60-84%, which would translate into one being wrong, so I overperformed here.

I guessed 9 likely states and got all 9 correct (100% success rate). I would estimate a probability between 84% and 98%, which would normally translate into about 1 being wrong, so once again, I overperformed here.

I also included 8 solid states, which each have a 98%+ probability of being right. They were all right. So once again, 100% success rate. 

All in all, my election model performed on par, and in presidential forecasts I seem to overperform the stated probabilities. My model isn't 100% right, but it performed in line with expectations, showing once again that polls are a better indicator than "keys" (yes, taking another swipe at Lichtman, and no, Biden wouldn't have won).
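To make the category-by-category comparison concrete, here's a minimal sketch (in Python, with this section's presidential counts hard-coded and my usual probability bands treated as assumptions) of how the observed hit rates stack up against the expected ranges:

```python
# Minimal sketch: tally observed hit rates against the expected probability bands.
# Counts are the presidential numbers from this post; the bands stand in for my rating scale.
categories = {
    # name: (races rated, races correct, expected low, expected high)
    "tilt":   (4, 2, 0.50, 0.60),
    "lean":   (4, 4, 0.60, 0.84),
    "likely": (9, 9, 0.84, 0.98),
    "solid":  (8, 8, 0.98, 1.00),
}

for name, (n, correct, lo, hi) in categories.items():
    rate = correct / n
    if rate < lo:
        verdict = "underperformed"
    elif rate > hi:
        verdict = "overperformed"
    else:
        verdict = "on par"
    print(f"{name:6s} {correct}/{n} = {rate:.0%} (expected {lo:.0%}-{hi:.0%}) -> {verdict}")
```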

Senate 

On the senate projection, I was also fairly accurate. I only got one state wrong, Pennsylvania, where the final result is a dead heat. Some outlets are calling it for McCormick, though a bitter legal battle with recounts is still underway. I don't expect democrats to prevail there, so it looks like McCormick won, and I'm going to stick with that for this evaluation.

Here, there were no true tossups, so 0/0 there. 

I guessed 5 leans and got 4 correct. That's 80%; I would expect a 60-84% success rate, so that's on par.

I guessed 5 likelies and got all 5 correct. That's 100%. I would expect an 84-98% success rate, meaning 0-1 wrong, so that's on par.

I guessed 4 solids, and all 4 were correct (why am I not surprised?). That's 100%.

All in all, I was off on the overall outcome by one seat. I estimated 48-52 and got 47-53, because of the PA dem loss. That's about as good as I would expect from a probabilistic model.

House

 

So, before I begin, I want to emphasize that this house forecast was experimental. I did not have much data, I wasn't always sure which races to track, and for many districts I had so little polling that I ended up combining Cook PVI with the generic congressional vote to give an estimate.
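For anyone curious, here's a rough sketch of that kind of fundamentals estimate. To be clear, this is the general PVI-plus-generic-ballot idea with made-up numbers and placeholder rating cutoffs, not my model's exact formula or inputs:

```python
# Rough sketch of a PVI + generic-ballot fundamentals estimate (illustrative only).
# Cook PVI is the district's partisan lean relative to the nation (R+3 -> -3.0 on a
# Dem-minus-Rep scale); the generic congressional vote (GCV) is the national
# Dem-minus-Rep margin in points.

def fundamentals_margin(pvi: float, gcv_margin: float) -> float:
    """Estimated Dem-minus-Rep margin in the district, in points."""
    return pvi + gcv_margin

def rating(margin: float) -> str:
    """Map an estimated margin onto a rating category (placeholder cutoffs)."""
    m = abs(margin)
    if m < 1:
        return "tossup"
    elif m < 4:
        return "lean"
    elif m < 8:
        return "likely"
    return "solid"

# Hypothetical district: PVI of R+3 (-3.0), generic ballot of R+1 (-1.0).
est = fundamentals_margin(pvi=-3.0, gcv_margin=-1.0)
print(f"Estimated margin: {'D' if est > 0 else 'R'}+{abs(est):.1f} ({rating(est)})")
```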

Despite this, the model seemed to do a good job. Currently, the house is at 213-219 Republican. Three races are still uncalled: IA1, CA13, and CA45. All three are coming down to the wire and are literally like 49.9% vs 50.1% or something. As of now, IA1 and CA13 are trending republican, although some think CA13 is likely to flip democratic. CA45 flipped democratic and I'd expect them to win there. For the sake of discussion, let's assume that IA1 and CA13 go R and CA45 goes blue. 

This would mean we get a 214-221 republican-led house. I estimated a 218-217 democratic-led one.

EDIT: CA13 flipped to the democrats, giving us a final outcome of 215-220.

As such, I'm only off by 3 seats net. This is pretty good: I guessed a narrow outcome, and we got a narrow outcome. This is far better than my 2022 estimate, where I guessed something like 190-245. So...the model worked...mostly.

However, I also want to evaluate things on an individual, race-by-race basis.

First, I made some clerical errors. I somehow made NY11, a safe red district, go blue, but then in my map I made another NY district go red when it should have gone blue. The two errors canceled each other out, and I calculated my safe-seat totals based on the outcome I got.

In my actual prediction, of the 50 races I covered, I got 12 wrong. That means my success rate was about 76%, much lower than my overall model would suggest, and much lower than what I got for the presidency and the senate. Looking at the ones I got wrong, most were based on polling data, with the GCV+Cook PVI ones being correct more often. 8 of my errors were in races with polling, and 4 were in races rated with GCV+Cook PVI. 28 races were estimated with polling, which gives me a 71% success rate for races with polling and an 81% success rate for those using GCV+Cook PVI. The "fundamentals" model actually did better here.

Why is this? Well, to be fair, polling for house districts was a lot scarcer. I had less of a polling average to fall back on, and often only a single poll. A single poll can miss by a wide margin, probably more in line with my normal probability scale. If anything, the reason my predictions overperform the expected probabilities is that several polls averaged together converge on something closer to the truth, while an individual poll can be significantly off, as far off as 6-10 points on the margin depending on the MOE. And that's just for a 95% confidence interval. As such, let's see how this breaks down overall.
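To put rough numbers on that, here's a quick back-of-the-envelope sketch using standard survey math. The sample sizes are assumed for illustration and aren't from any particular poll:

```python
import math

def moe_share(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error on a single candidate's vote share, in points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# The error on the gap between two candidates (Dem% - Rep%) is roughly double the
# error on one candidate's share, since a respondent switching sides moves the
# margin by two points.
for n in (400, 600, 800, 1000):
    share_err = moe_share(n)
    print(f"n={n:4d}: share MOE ~ +/-{share_err:.1f} pts, margin MOE ~ +/-{2 * share_err:.1f} pts")
```

At typical district-poll sample sizes, that puts a single poll's margin error right in that 6-10 point ballpark, which is why one poll is a much weaker anchor than a polling average.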

Of the 50 races I covered, 8 were tossups. I got 2 wrong, so that's a 75% success rate. I would expect a 50-60% success rate, so that's a significant overperformance.

22 races were leans. I got 6 wrong, meaning I got a success rate of 73%. I would expect a rate of 60-84% so this is about on par with my model. 

8 were likelies. I got 3 wrong, so that's a 63% success rate. I would expect 84-98%, so I definitely underperformed here. 

12 races were solids. I got 1 wrong. That's a 92% success rate. I would expect a <2% error rate, so I underperformed. I guess if you count the fact that there's something like 385 more races and I got at least 384 of them right (ignoring NY11), my success rate is more in line with expectations, but of the ones I actually rated, yeah, I did get 1 wrong.

All in all, was my model good enough? Yeah. Good enough. I got a close enough net outcome that I'm happy with it. On individual races, I would say the model slightly underperformed statistical expectations, but given how little data I had to work with, it did a "good enough" job: I was only off by 3 seats net. It's definitely not as strong a showing as my presidential and senatorial models, which have far more polling data to work with, but it held up surprisingly well given the circumstances, and I can't say I'm displeased with its performance. My simulator would have been able to predict the net outcome reasonably well, since it was well within probability, although getting the actual seats to line up would be a much more difficult task. Still, if you ran it a million times, in theory you'd eventually get the right outcome.
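For the curious, here's a bare-bones sketch of the kind of seat simulation I mean. The win probabilities, safe-seat counts, and the size of the shared "wave" term are all made-up placeholders, not my model's actual inputs:

```python
import random

def simulate_house(seat_probs, safe_dem, swing_sd=0.0, trials=20_000):
    """Monte Carlo the chamber: each competitive seat is a draw at its estimated
    Dem win probability, optionally shifted by a shared national swing to capture
    correlated (wave-style) error across districts."""
    dem_majorities = 0
    for _ in range(trials):
        swing = random.gauss(0, swing_sd)  # shared shift applied to every seat
        dem_seats = safe_dem
        for p in seat_probs:
            if random.random() < min(1.0, max(0.0, p + swing)):
                dem_seats += 1
        if dem_seats >= 218:
            dem_majorities += 1
    return dem_majorities / trials

# Toy setup: 50 competitive seats near 50/50, 193 assumed-safe Dem seats, and the
# remaining 192 assumed safely Republican so the chamber sums to 435.
probs = [0.5 + random.uniform(-0.1, 0.1) for _ in range(50)]
print("P(Dem majority), independent seats :", simulate_house(probs, safe_dem=193))
print("P(Dem majority), correlated swing  :", simulate_house(probs, safe_dem=193, swing_sd=0.05))
```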

Governor races

With the governor races, I only did a map. The fact is, I wasn't very interested in them, and there were only 1-3 worth keeping an eye on anyway.

Those being New Hampshire, North Carolina, and maybe Indiana. And yeah, I got all the outcomes right.

I'm only gonna count the one lean here, which was New Hampshire. So 1 for 1, 100%. Yeah. 

Total success rate (2024)

So, of the 17 tossups I guessed, I got 10 right. That's a success rate of 59%. I expected a success rate of 50-60%, so that's about on par.

Of the 32 lean races I guessed, I got 25 right, that's a success rate of 78%. I would expect 60-84% so that's about on par. 

Of the 22 likely races I guessed, I got 19 right, so that's a success rate of 86%. I would expect anywhere between 84 and 98%, so that's within the expected range. 

Of the 24 solid races I guessed, I got 23 right, so that's a success rate of 96%. I would expect an error rate of 2% or less, so I slightly underperformed here. However, I mostly only covered races up to about a 12-point threshold, with the exception of some congressional races which RCP deemed important enough to consider tossups. If we included every race with an 8+ point margin, my model would be performing better.

Overall, though, my model performed as expected, and if not for my experimental house forecast, I would have overperformed across the board. With the house forecast included, I performed about on par.
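As a sanity check on how much these small samples can really tell us, here's a sketch that computes, for each category, the range of correct calls you'd see about 95% of the time if the true hit probability sat at the midpoint of my stated band (the midpoints are assumptions for illustration, not fitted values):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

def typical_range(n: int, p: float, coverage: float = 0.95) -> tuple:
    """Smallest set of outcomes (around the most likely ones) covering ~95% probability."""
    ranked = sorted(((binom_pmf(k, n, p), k) for k in range(n + 1)), reverse=True)
    total, ks = 0.0, []
    for prob, k in ranked:
        ks.append(k)
        total += prob
        if total >= coverage:
            break
    return min(ks), max(ks)

# This year's totals from above, with an assumed midpoint hit probability per band.
for name, n, correct, p in [("tossup", 17, 10, 0.55),
                            ("lean",   32, 25, 0.72),
                            ("likely", 22, 19, 0.91),
                            ("solid",  24, 23, 0.99)]:
    lo, hi = typical_range(n, p)
    print(f"{name:6s}: {correct}/{n} correct; ~95% of the time you'd see {lo}-{hi}")
```

By that rough measure, all four of this year's categories land inside the range the bands themselves would produce, which is consistent with the "about on par" read above.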

Total success rate (historic)

Adding these to my historic predictions, we get a larger sample size:

We're now at 43 tossups with 28 correct guesses. That's 65%, a slight overperformance relative to the 50-60% I would estimate.

We're up to 85 leans, with me getting 64 right, a 75% success rate, and on par with the 60-84% range I'd guess.

We're up to 72 likely races, with me getting 67 right, that's a 93% success rate, which is on par with the 84-98% range that I'd guess. 

I won't bother counting solids, since I don't even include all solids in my forecasts, nor have I counted them consistently in the past, but I can't recall getting any wrong, and the one I got wrong this year is a first. It's bound to happen.

We're at 23/24 there from this year alone, indicating a 96% success rate, or a slight underperformance, but if we counted previous years' races as well, that number would even out to the projected range.

As such, my model is performing as expected. This year it performed quite well in the presidential and senate forecasts, with a somewhat more underwhelming result in the house forecast. Still, it all seems to even out to being within, or just outside of, my expected success rates. All in all, the state of my model is strong, and I will continue to use it as is in future elections. I may try to improve the simulator between now and future elections to better account for the wave aspect of politics and how we get systematic over- and underperformances, but as for the core model itself, I will continue to use it until it is demonstrated to no longer be accurate.

EDIT: Statistics updated to reflect CA13 going democratic after all.
