Wednesday, June 21, 2023

A brief discussion of the methodology behind my political forecasts

So, I thought I had an article about this, but I guess I don't. A friend who sometimes reads my blog asked me about this, and I wanted to share some articles on it, but I realized I never actually wrote anything outlining exactly how I do my forecasts, so I figured I would give a basic rundown.

Basically, my forecasts are derived from polling averages. I look up the polling averages on realclearpolitics.com and sort by the races I'm interested in. If RCP does a bad job I'll sometimes use 538's data, which draws on much of the same data set, but I try to stick with RCP for consistency unless they drop the ball.

From there, I assume a margin of error of 4% and calculate the probabilities on a one-tailed statistical bell curve. Why do I use 4% as my margin of error? Well, ideally political polls aim for 3%, but in practice a lot of polls never meet that tight a standard. A 3% MOE implies a sample size of about 1,000, while many polls are closer to the ~500 range. More people polled is more representative of the population, and generally speaking error goes down as sample size goes up. On a statistical bell curve, roughly 95% of all results will occur within two standard deviations of the polling average (I treat the margin of error as one standard deviation here). With a 3% MOE, that means about 95% of results fall within 6 points, whereas with a 4% MOE, which is a bit more lenient and in line with most polling (which ranges from 3 to 5, with 4 being a decent estimate of the average), about 95% of results will occur within 8 points of the polling average.
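For the curious, the link between sample size and margin of error comes from the standard formula for polling a proportion. Here's a quick Python sketch (using the worst case of a 50/50 split, which is what pollsters usually assume):

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p with sample size n."""
    return z * sqrt(p * (1 - p) / n)

# A sample of ~1,000 gives about a 3-point MOE,
# while a sample of ~500 gives closer to 4.4 points.
print(round(margin_of_error(1000) * 100, 1))  # 3.1
print(round(margin_of_error(500) * 100, 1))   # 4.4
```

This is why the 4% assumption is a reasonable middle ground: plenty of real polls sit in the 500-1,000 sample range.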

Now, in practice, we don't need to worry about a two-tailed test; it only matters if results fall outside those two standard deviations in one direction. If a candidate who is already winning overperforms by more than 8 points, they still win, even if the result is atypical. So in reality, there's only about a 2% chance of the wrong outcome here.

From there, I look at the polling average, see how much a candidate is leading by, convert that lead into a Z score (the number of standard deviations away from a tied race), and get a probability of winning or losing from that.
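If you want to play with this yourself, the whole conversion is just a couple of lines of Python (treating the 4-point MOE as one standard deviation, per the assumption above):

```python
from statistics import NormalDist

MOE = 4.0  # assumed margin of error, treated as one standard deviation

def win_probability(lead):
    """Convert a polling lead (in points) into a win probability
    via the one-tailed normal distribution."""
    z = lead / MOE
    return NormalDist().cdf(z)

# Example: Clinton's 1.9-point Pennsylvania lead in 2016
print(round(win_probability(1.9), 2))  # 0.68
```

A tied race gives 0.5, and a 4-point lead gives about 0.84, matching the numbers below.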

Generally speaking, a 0% lead means a Z score of 0, which means the outcome is 50/50.

A 1% lead means a Z score of 0.25, which means the outcome is roughly 60/40. I consider anything less than this to be a very close race that could go either way. These more or less come down to a coin flip, and we really don't know which direction they're going until the results come in.

A 4% lead means a Z score of 1, which means the outcome is about 84/16. Generally speaking, results in the 0.25-1 Z score range are in play, but leaning toward the person in the lead. Still, there can be anything from a 16% (roughly 1 in 6) to a 40% (roughly 2 in 5) chance of the result flipping, and we have seen quite a few of these. Pennsylvania was in this category in 2016: Clinton was 1.9% ahead, meaning a Z score of 0.48, which translated to around a 68% chance of Clinton winning, or roughly 2 in 3. But... the other result came in. It happens. Michigan was another one, with a 3.4% lead, which translated to a 0.85 Z score, which translated to an 80% chance (4 out of 5) that Clinton would win. Again, the other result came in. It happens. In 2020, Florida was a lot like this. Biden had a 2.8% lead, leading to a 0.7 Z score, which led to a 76% chance (3 out of 4) that Biden would win. Trump flipped it, although other states like Arizona and Nevada went for Biden.

An 8% lead means a Z score of 2, which means the outcome is about 98/2. Generally speaking, results in the 1-2 Z score range are technically in play, but the probability is rather low. It can still happen; the flip chances range from about 16% (1 in 6) down to around 2.3% (roughly 1 in 40). In political science, any result beyond the 95% confidence level on a two-tailed bell curve (or 97.5% on a one-tailed curve) is considered statistically significant, which in a social science context means the result is so unlikely to have occurred by chance that it's effectively ruled out. It CAN still happen, but for the sake of my political forecasts, it's only going to happen once in every 40 elections, or once every 160 years in terms of presidential cycles. So once or twice in American history. I generally exclude states with more than an 8-point lead in either candidate's direction unless there's another reason to cover them.

Now, in practice, I've noticed that political cycles happen in waves. In other words, when one result is off, the rest of them are likely off too. If Trump overperforms in one place, he's likely going to overperform in places with similar polling numbers.

For example, in 2016, there were five states within one point, including New Hampshire, which was the deciding state in my model: whoever won it, assuming my model was followed exactly, would win the presidency. Clinton had a 56% chance of winning it, so I gave her a 56% chance of winning the presidency, much lower than most other forecasters. I ended up flipping all of the others to Trump as the exit polls came in on election day. I figured it would come down to those five states, and Trump did win 4 of the 5, but New Hampshire went for Clinton. But then a few OTHER states with lower probabilities of flipping went for Trump. Pennsylvania, Michigan, and Wisconsin all flipped. PA and MI were both in the "60-84%" zone, so not impossible to flip, but it happened, and given the tight margins, that cost Clinton the election itself. But then Wisconsin also flipped, and that had a 95% chance of going for Clinton. That was A LOT less likely, but given the electoral similarities between the three states, it did make sense that if one flipped, the others were at risk of flipping too.

In general, PA, WI, and MI all followed the same red shift Ohio took that year, going from bellwether status in the 2000s to being more reliably red in the 2016 and 2020 elections. Even Minnesota, which is more reliable for Democrats, followed the same trend. So something was up with the rust belt in general in 2016, and similar trends followed in 2020. I personally think it's the crappy post-industrial economy combined with social conservatism on immigration and the like.

But yeah, Trump overperformed here. The rust belt swung harder for him than it should have otherwise; it should have been a moment for the Democrats to look in the mirror and reflect on what they were doing wrong. In my opinion they haven't done a good job of getting to the root causes and correcting for them, so expect that region to be politically volatile for the foreseeable future. It's always been kind of swingy, but in a way that was generally favorable to the Democrats. Now it's more in play, and at risk of going to the Republicans.

Contrast this with 2020. Here, Biden was in a MUCH better position than Clinton was in 2016. Pennsylvania was the deciding state in my model, and it was up 4.8% for Biden, which translates to a 1.2 Z score, which translates to an 89% chance of a win. So for 2020, I estimated an 89% chance.

But then as the results came in on election day, things started looking dicey. I had three states in the swing category (North Carolina, Ohio, and Georgia), and two of them went for Trump that night. I had five in the semi-flippable 1-4% category, and Florida also went for Trump pretty early. The others were on the table and took days to call. Georgia, Arizona, Nevada, and Pennsylvania, which was a semi-firm blue state this time around, were all so close that it took a while to figure them out. In the end, they did hold for Biden, and Biden won the election, but there was a moment where the country was collectively staring into the abyss wondering "did we really just elect this psychopath again?"

And the results in the rust belt did follow 2016. States in the south and southwest like Georgia, Arizona, and Nevada held firm for Biden, performing roughly where they were expected to, but in the rust belt it was a different story. Pennsylvania only narrowly went for Biden, as did Michigan and Wisconsin, which were also A LOT closer than they should have been. Seriously, I had them up by over 8, but kept including them because of their status as flippers that defied the odds in 2016. And those states ALMOST flipped again. The rust belt, at least the more blue-leaning parts of it, did end up turning out for the Democrats, but those polling leads were misleading. Those 5-9 point leads very quickly turned into something like a 1% lead in practice, so the numbers were off up there just as in 2016; it just so happened that the Democrats had enough of a nominal lead that Trump couldn't overcome their sheer statistical advantage even while overperforming.

So, again: in 2024, keep an eye on the rust belt. Pay particular attention to Pennsylvania, Michigan, and Wisconsin. 2020 showed me that 2016 was NOT a fluke; those areas are extremely volatile poll-wise and could flip hard for the GOP if the Democrats aren't careful. Honestly, I think this could be avoided if the Democrats had a more progressive economic platform and toned down the identity politics somewhat, but they don't seem to care. They seem more interested in trying to turn Georgia, Arizona, and even states like Texas and North Carolina blue. Admittedly they are making inroads in Georgia and Arizona, but Texas and North Carolina seem to be remaining reliably red, while states like Florida and Ohio, once the swingiest of the swing states, are becoming more reliably red as well.

I don't know how 2024 is going to turn out. It's still early. I will admit the polls are abysmal for the Democrats right now, but they were before the midterms too, and the Democrats outperformed their 2020 numbers in the 2022 midterms, so who knows. There are so many factors going every which way that who knows what will actually happen. I'll start following it more closely next year.

But yeah, this should give you at least a basic idea of how I do election forecasts. I don't know whether the person I made this for will be able to use it for their own purposes, but it's nice to discuss methodology.
