Saturday, October 31, 2020

Running a model based on 10/30 data

I decided to do something fun. I know 538 has these models where they run simulations estimating the outcome of the election. They run it like 40,000 times to come up with each candidate's chances of winning. I don't have the resources or know-how to run a model like that, so I'm going to do something simplistic. What I'm going to do is simulate an election where I guess the outcome of each state with a random number generator. Here's how it works: if a state has an 87% chance of going to Biden, I will run a random number generator pulling numbers between 1 and 100. If I get 1-87, I will record it as a Biden win; if I get 88-100, I will record it as a Trump win. I will do this for all swing states in accordance with their probabilities, until I've simulated the entire election. Then I will do this 20 times, just to see how it turns out.

There are limitations to doing this. It ignores the trend model, in which a state going one way means several others may also go that way. Each state is going to be an individual trial. This means I might get weird, unrealistic results, but it will still be fun to watch. I considered splitting the states into blocs, but I think that would be simplistic and have its own flaws too, so I'm just going to do it randomly. Without further ado, let's get to it.
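For anyone who wants to try the same thing at home, here is a minimal sketch of the procedure described above. The state list, the win percentages, and the starting totals are all hypothetical placeholders (picked so the totals add up to 538), not the numbers behind the trials below, and the function names are made up for this example.

```python
import random

# Hypothetical swing-state table: name -> (electoral votes, Biden win chance in percent).
# The percentages are invented for illustration; they are not the post's figures.
SWING_STATES = {
    "Pennsylvania": (20, 87),
    "Florida": (29, 60),
    "Arizona": (11, 70),
    "Ohio": (18, 40),
    "Texas": (38, 30),
}

def simulate_election(states, base_biden, base_trump, rng):
    """Run one trial, treating every state as an independent 1-100 roll."""
    biden, trump = base_biden, base_trump
    for ev, biden_pct in states.values():
        roll = rng.randint(1, 100)   # inclusive on both ends
        if roll <= biden_pct:        # e.g. 1-87 is recorded as a Biden win
            biden += ev
        else:                        # e.g. 88-100 is recorded as a Trump win
            trump += ev
    return biden, trump

def run_trials(states, base_biden, base_trump, n_trials, seed=None):
    """Repeat the single-election trial n_trials times."""
    rng = random.Random(seed)
    return [simulate_election(states, base_biden, base_trump, rng)
            for _ in range(n_trials)]

if __name__ == "__main__":
    # 230 and 192 are hypothetical "safe" totals for the states not listed above.
    for i, (biden, trump) in enumerate(run_trials(SWING_STATES, 230, 192, 20, seed=1), 1):
        winner = "Biden" if biden > trump else "Trump"
        print(f"Trial {i}: {winner} {max(biden, trump)}-{min(biden, trump)}")
```

Passing a `seed` makes a run repeatable; leaving it out gives fresh rolls each time, which is closer to rolling by hand.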

Trial 1: Biden 280-258

This first outcome was weird. A lot of states, including states I deemed improbable to go Republican, went Republican. Still, Biden actually won the election. Just goes to show how hard it is for Trump to actually win. Trump swept most of the board, even taking Minnesota, but as long as Biden keeps Pennsylvania, Michigan, and Wisconsin, as well as taking a couple of other states, Biden wins. I expect that in a trend model, much like 2016, Trump would have a far easier job winning, but treating all states separately shows how hard it is for him to actually pull it off.

Trial 2: Biden 379-159

This one ended up being much more predictable and was a clean sweep for Biden. He even managed to take Texas! At the same time Trump held on to Georgia and Ohio which is a distinct possibility. Still, a clean Biden win here.

Trial 3: Biden 295-243

The neolibs will like this one. Biden did abysmally in the rust belt, losing Minnesota and Michigan, while the democrats took Arizona, Georgia, and North Carolina. I can hear it now. Going on and on about centrism being so wonderful and suburbanites giving the democrats the election while rust belt working class voters are a bunch of racists for voting Trump. Ugh...

Trial 4: Biden 364-174

A pretty respectable, run of the mill, Biden win.

Trial 5: Biden 327-211

Pretty normal other than Alaska randomly going blue.

Trial 6: Biden 388-150

Another sweeping win for Biden.

Trial 7: Biden 340-198

A bizarro world win for Biden where he wins Texas only to lose Arizona, Ohio, North Carolina, and Florida. 

Trial 8: Biden 291-247

A fairly conservative win where Biden loses Pennsylvania but picks up a few southern states to compensate for it. 

Trial 9: Biden 326-212

Trial 10: Biden 350-188

Trial 11: Biden 373-165

Trial 12: Biden 330-208

Trump hasn't won a single one so far...

Trial 13: Biden 329-209

Trial 14: Biden 343-195

Trial 15: Biden 297-241

Trial 16: Biden 321-217

This one started out good for Trump with him taking Pennsylvania, Florida, and North Carolina, but then he lost Ohio, Arizona, Georgia, and even Missouri. Just goes to show that even if Trump can take some of the states he needs decisively, if Biden can win a few back elsewhere, Trump can't win.

Trial 17: Biden 392-146

This one was Murphy's Law for Trump. Biden swept the board, even getting a god roll with Alaska. The only state outside of some very unlikely ones that Trump won here was Ohio. This is darned near the worst-case scenario for Trump.

Trial 18: Biden 310-228

Trial 19: Biden 354-184

Another neoliberal wet dream in which several rust belt states go red but Biden still wins due to a commanding win in the sun belt.

Trial 20: Biden 364-174

Another solid win for Biden.

Overall results: Biden: 20, Trump: 0

Hmm, Trump didn't win a single one. Is this because Trump really stands that little of a chance? Or is it because the random flip model is that unfair to the underdog relative to the trend model? After all, I've posited there's an 18% chance or so Trump can win if everything goes right for him. But here, even if he manages to win several rust belt states, the statistics even out and Biden wins elsewhere. What gives?
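One quick sanity check on that question: if Trump really had an 18% chance of winning each simulated election, how often would 20 trials in a row all go to Biden? This is just a back-of-the-envelope sketch that assumes the trials are independent; the function name is made up for the example.

```python
def prob_zero_wins(p_win, n_trials):
    """Chance that an underdog with per-trial win chance p_win loses every trial."""
    return (1 - p_win) ** n_trials

# 18% is the rough Trump estimate quoted above; 20 is the number of trials run.
p_shutout = prob_zero_wins(0.18, 20)
print(f"{p_shutout:.3f}")  # about 0.019, i.e. roughly a 2% chance
```

So a 20-0 shutout would be pretty unlikely if the random flip model actually gave Trump 18% per trial, which suggests the model itself rates his chances lower than that.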

To answer this question, I think I'll try this again, but using 2016 data with Clinton vs Trump. Let's see how it turns out.

2016 simulations

Trial 1: Clinton 275-263

Hmm, one thing I notice right away is that Clinton starts off with 176 electoral votes to Trump's 155. This time around Trump starts with 100 to Biden's 216. So Biden starts out way ahead of Trump, whereas Clinton was far more evenly matched with Trump. In practice, this meant a much more modest win for Clinton, much in line with my 56% chance on election day where Clinton was expected to win 272 electoral votes. Here, Clinton kept the rust belt, won a few southern states, but also lost some states too. It's much more narrow than all of Biden's wins. It IS harder for Trump to win this time than in 2016. Let's see how the rest of the simulations go.

Trial 2: Clinton 309-229

A slightly more solid Clinton win.

Trial 3: Trump 282-256

Trump wins narrowly due to taking Michigan. Essentially this was fairly close to the 2016 actual and it shows the inherent weakness in the blue wall for Clinton.

Trial 4: Clinton 282-256

2016 seems a lot more consistent. While a lot more states seemed to be in play, Clinton did still have a large advantage in a lot of them, and seemed to hold a statistical advantage. However, as we can see, her wins are narrow. I'm playing Russian Roulette with a lot of these rust belt states, and she keeps winning, but as we can see, if she loses even one more good state in a lot of these simulations, she loses the election. We don't see that in 2020 with Biden, where he wins in a blowout almost every time. Clinton's average win so far is Biden's worst, with many things going wrong.

Trial 5: Trump 278-260

Trump wins again! If he manages to get a hold of a few rust belt states and do fairly decently elsewhere he wins the election, which is what happened in 2016. We didn't see this in 2020 at all.

Trial 6: Clinton 302-236

A decent Clinton win. She lost Michigan but picked up North Carolina and Florida to compensate.

Trial 7: Clinton 312-226

Trial 8: Trump 288-250

While there haven't been any simulations with Trump winning Wisconsin, Michigan, and Pennsylvania, this simulation model is showing that there is a distinct weakness in the Rust belt where in a minority of the simulations, Trump wins at least one of those states, and the election. 

Trial 9: Clinton 283-255

Clinton wins again. She lost Pennsylvania but picked up Florida to compensate. While that wouldn't help in the real scenario as three rust belt states went red, in simulation land it is possible.

Trial 10: Clinton 298-240

Trump picked up PA, but lost Arizona, North Carolina, and Florida. 

Overall results: Clinton: 7, Trump: 3

There you have it folks. In 2016, this simulation model did predict a crack in the rust belt and the possibility of a Trump win. Here, Trump won 3 of the 10 simulations, which is in line with Nate Silver's 70% Clinton prediction. Clinton was still favored, but if she happened to do bad on the coin flip states and Trump managed to flip a rust belt state or two, it was very possible for Trump to win. And that's what happened. It should be noted none of my simulations expected Trump to win as much as he did. He did not turn Wisconsin at all, as that state was highly improbable to flip. And I can't recall Trump winning both Michigan and Pennsylvania in this simulation. This is what happens when you treat every state separately instead of in trends. This brings the outcomes toward the mean, with extreme outcomes not occurring in practice. If a candidate gets lucky in a state or two, statistically, the other candidate will make it up elsewhere. That said, a weakness in this model is that due to the states being interconnected in real life, more extreme outcomes are possible, and as such, Trump winning might be more likely than indicated here. Regardless, the data is quite clear. The rust belt victory strategy was predicted here and remained a distinct possibility, whereas in 2020, every simulation was a win for Biden. Trump literally would need a statistically unlikely "murphy's law" scenario for Biden in order to actually pull it off this time. 
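To put some numbers on the "separate states versus trends" point, here is a rough sketch that runs the same swing states two ways: once with each state missing its polls on its own, and once with part of the miss shared across every state. Everything in it is hypothetical (the margins, the 4-point error size, the split between shared and state-level error); it is not the method used for the trials above, just an illustration of why linked states produce more extreme outcomes.

```python
import random
import statistics

# Hypothetical inputs: name -> (electoral votes, leader's polling margin in points).
# The margins are invented for illustration.
STATES = {
    "Pennsylvania": (20, 4.0), "Michigan": (16, 6.0), "Wisconsin": (10, 6.0),
    "Florida": (29, 1.5), "Arizona": (11, 2.0), "North Carolina": (15, 1.0),
    "Georgia": (16, 0.5), "Ohio": (18, -1.0), "Texas": (38, -2.0),
}

def simulate_totals(states, national_sd, state_sd, n_trials, seed=0):
    """Return the leader's swing-state electoral votes for each trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        # One polling miss shared by every state in this trial (the "trend").
        national_error = rng.gauss(0.0, national_sd)
        won = 0
        for ev, margin in states.values():
            state_error = rng.gauss(0.0, state_sd)  # this state's own miss
            if margin + national_error + state_error > 0:
                won += ev
        totals.append(won)
    return totals

# Both setups give each state the same total error (3.2**2 + 2.4**2 == 4.0**2),
# so each state's individual odds match; only the linkage between states differs.
independent = simulate_totals(STATES, national_sd=0.0, state_sd=4.0, n_trials=5000)
correlated = simulate_totals(STATES, national_sd=3.2, state_sd=2.4, n_trials=5000)

for label, totals in (("independent", independent), ("correlated", correlated)):
    print(label, "mean:", round(statistics.mean(totals), 1),
          "spread:", round(statistics.pstdev(totals), 1))
```

The averages should come out about the same, but the correlated run should show a noticeably wider spread, meaning more blowouts and more upsets than the one-state-at-a-time approach allows.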

2012 simulations

Let's do some 2012 simulations just for fun.

2012 was....weird. I crunched the data real quick but won't post the chart. Here's an electoral map based on RCP's projections to give you an idea.

Obama 303-235

This is much narrower than I remember, and the deciding state ultimately would've come down to Nevada, which was only 2.8 points for Obama...meaning that Obama only had about a 76% chance of winning. Meanwhile I gave Clinton a 56% chance in 2016, and Biden is currently at an 82% chance. Fun fact: the data was dead on, except for Florida, which went for Obama instead of Romney. So in that election there wasn't much divergence from the data, unlike 2016. I remember 2008 was like that too but can't find their data any more. Another thing I notice is that the electoral college, much like 2016, feels a lot more even. Obama started out with 184 to Romney's 180 here. However, most swing states did lean Obama, meaning he had the advantage. Let's see how it plays out.
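For the curious, one common way to turn a polling lead into a win chance is to treat the polling error as a bell curve and ask how often the error would be big enough to erase the lead. The sketch below assumes a standard deviation of 4 points for that error. That figure is an assumption for illustration (it happens to land near the 76% quoted above for a 2.8-point lead), not a documented part of the method used here.

```python
from statistics import NormalDist

# Assumed size of a typical polling miss, in points. Hypothetical value.
ASSUMED_POLL_ERROR_SD = 4.0

def win_probability(margin_points, error_sd=ASSUMED_POLL_ERROR_SD):
    """Chance the leader still wins, given a normally distributed polling error."""
    return NormalDist(mu=0.0, sigma=error_sd).cdf(margin_points)

print(f"{win_probability(2.8):.0%}")  # prints 76%
```

A tied state comes out at 50% under this setup, and the chance climbs toward 100% as the lead grows relative to the assumed error.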

Trial 1: Obama 326-212

Trial 2: Obama 300-238

Even with Romney stealing a few states Obama just held up too well here.

Trial 3: Obama 302-236

Trial 4: Obama 312-226

Trial 5: Obama 298-240

Trial 6: Obama 271-267

Yikes, Obama ALMOST lost that one.

Trial 7: Obama 286-252

Trial 8: Obama 286-252

Trial 9: Obama 272-266

Another near loss for Obama.

Trial 10: Romney 276-262

Murphy's law for Obama 

Overall results: Obama: 9, Romney: 1

Hmm, you know what? I'm seeing a trend here. A bunch of random outcomes that are worse than the actual outcome. You see, in 2012, the data got it almost right, outside of Florida. What happened? Well, the trend model won out. Obama won all of the states on the little 538-esque "snake" and then overperformed, causing him to win Florida. This is similar to what happened with Trump in 2016. Trump got all the states he was supposed to, but then overperformed. And despite my 2016 simulations predicting rust belt weakness, they didn't see three rust belt states falling in tandem. That said, this simulation exercise may not be as effective at predicting results as I thought. I mean, it is, in the sense that it accurately predicted 2012's winner, and also predicted in 2016 that there was a possibility of a rust belt victory for Trump, but it's limited in its application in practice. The fact is, following the data and the trend model seems to do a much better job of showing what's going on overall, and then if you expect an overperformance either way, you get an idea of what COULD happen. In 2012, an overperformance for Obama won him Florida, giving him 332 electoral votes. In 2016, an overperformance caused Trump to win the rust belt. While some of the states won and lost seemed a bit more random for Trump, they overall follow a trend model more than the randomness I'm attempting to follow here. That said, this model may not be able to accurately predict the 2020 election.

However, I will say this. This model did, to a certain degree of accuracy, give us insight into what COULD happen in various elections. In 2012, we saw a boring, stable election where Obama held a fairly narrow lead over Romney across a wide swath of states, and the data predicted what would happen almost perfectly. We also saw what could have happened had the trends gone the opposite way, where Obama's lead would have narrowed and Romney could have even narrowly won. In 2016, we saw a few Trump wins with him taking Pennsylvania or Michigan; it can happen. However, here, I see absolutely no chance of a 2020 win for Trump. Biden's got this. It's more solid than Obama's 2012 win, which as far as I'm concerned is a slam dunk.

What really matters is that the data is accurate and the actual trend is fairly close. If Trump overperforms by a point or two, we could see a narrow win for Biden. If Trump sees more like a 3-4 point overperformance, Biden could be in hot water. On the flip side, a 1-2 point overperformance for Biden won't mean much, but a 3-4 point one will mean taking Texas. We'll just have to see.

All things considered though this exercise makes me fairly confident in a Biden win. He's way above where Obama was in 2012, and where Clinton was in 2016. Obama's win in 2012 was decisive in practice, and Trump's win in 2016 was narrow to the point of being almost a statistical fluke. It could happen. I had a 30% success rate of predicting a Trump win in 2016 here, but it wasn't a sure thing. 2020 seems to be a sure thing for Biden by comparison.

https://www.270towin.com/maps/Zm39o
