I was planning on doing this during my usual post tomorrow, but I decided to do it now, since it was getting a little long.
The first major change: I’m no longer using the pollster.com average and am instead calculating my own. Polls that I use must satisfy three conditions:
- It must be a phone-conducted poll
- It must have at least 400 respondents
- It must have been conducted in 2008
Of course, the last one is pretty much moot except when going back and inputting old polls. I’m not sure how pollster.com worked, but in my calculations all polls count to some extent. However, each poll is weighted based on how much older it is than the most recent poll in that state.
This has two side effects. The first is that if the most recent poll in a state was conducted on May 12th, then the “age” of every poll in that state is measured from May 12th, not the current date. The second effect is that, since polls lose impact as they age, even though all polls are counted towards the average, very old polls may count so little that they don’t actually impact the average.
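Here’s a rough Python sketch of the filtering and the age calculation described above. The `Poll` class, its field names, and the sample polls are all made up for illustration; this is not the actual setup behind my averages.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Poll:
    # Hypothetical record layout, assumed for this sketch.
    state: str
    end_date: date     # last day of polling
    respondents: int
    by_phone: bool


def is_eligible(poll: Poll) -> bool:
    """Phone poll, at least 400 respondents, conducted in 2008."""
    return poll.by_phone and poll.respondents >= 400 and poll.end_date.year == 2008


def poll_ages(polls: list[Poll]) -> list[int]:
    """Age in days of each poll, measured from the newest poll in the list."""
    newest = max(p.end_date for p in polls)
    return [(newest - p.end_date).days for p in polls]


# Invented polls for one placeholder state ("XX").
polls = [
    Poll("XX", date(2008, 5, 12), 600, True),
    Poll("XX", date(2008, 5, 2), 500, True),
    Poll("XX", date(2008, 4, 28), 350, True),   # dropped: too few respondents
    Poll("XX", date(2008, 5, 10), 800, False),  # dropped: not a phone poll
]
eligible = [p for p in polls if is_eligible(p)]
ages = poll_ages(eligible)
print(ages)  # [0, 10]
```

Note that the ages are relative to May 12th (the newest eligible poll), not to whatever day the script happens to run.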
The most recent poll in a state is given a weight of 1, meaning that it is counted at 100%. Each older poll is then added to the average in proportion to its weight. Polls are weighted roughly as follows:
- As I said, the most recent poll is weighted at x1
- A poll 3 days older is weighted at about x0.92
- A poll 1 week older is weighted at about x0.67 (or 2/3 the weight of the most recent poll)
- A poll 10 days older is weighted at about x0.5 (or 1/2 the weight)
- A poll 2 weeks older is weighted at about x0.33 (or 1/3 the weight)
- A poll 1 month older is weighted at about x0.1
- A poll about 100 days older is weighted at about x0.01
I then divide the sum of the weighted poll totals by the sum of the weights and I get the poll average.
The date of a poll is determined by the last day that polling was done for the poll, not the day the poll is released. So if a poll is conducted from June 15th to June 18th, the date of that poll is set at June 18th. Obviously, if multiple polls finish polling on the same day, they have the same weight.
The change in methodology also resulted in a mixed bag as to which candidate the changes favored. Obama had 29 of his Electoral votes drop a category (New Jersey from Strong to Weak, Montana from Weak to Lean, Indiana from Lean Obama to Lean McCain). It should be noted that New Jersey also had a closer than average poll released, which could account for the drop as well.
On the other hand, Obama had 12 EVs increase as well (Iowa and New Mexico, both from Lean to Weak).
McCain had 10 Electoral votes drop (Arizona). North Dakota also dropped, but that is clearly because of the new poll. McCain had 6 EVs rise a level (Arkansas).
As for whether I use leaners or not, I use whatever pollster.com reports, which I believe is generally without leaners.
The Math
While Karl Rove may think he has the math, I have the absolute real math.
The average of a candidate’s percentage in any given state is calculated thus (I hope I’m remembering how to do this right):
$$Z = \frac{\sum_p R_p W_p}{\sum_p W_p}$$
Or, in other words, I take the sum, over all polls p, of the result of Poll p times the Weight of Poll p, and divide it by the sum of the weights of all Polls p.
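To make that concrete, here’s a minimal Python sketch of the weighted average. The function name `weighted_average` and the two sample polls are invented for illustration, and the weights are simply handed in rather than computed.

```python
def weighted_average(results, weights):
    """Sum of result * weight over all polls, divided by the sum of the weights."""
    if len(results) != len(weights):
        raise ValueError("need exactly one weight per poll result")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(r * w for r, w in zip(results, weights)) / total_weight


# Invented example: a candidate at 48% in the newest poll (weight 1)
# and at 44% in an older poll that carries half the weight.
z = weighted_average([48.0, 44.0], [1.0, 0.5])
print(round(z, 2))  # 46.67
```

The older poll pulls the average down from 48, but only by about a third of the 4-point gap, because it carries a third of the total weight (0.5 out of 1.5).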
How do I figure out the weight for any given poll? Here’s how:
$$W_p = \frac{1}{\frac{(\Delta D)^2}{100} + 1}$$
Or, in other words, take the age of the poll squared, divide it by 100, add 1, and then divide 1 by the result.
Don’t ask me why this equation works. I mostly got it by trial and error, but it gives the effect of a gentler slope for the most recent polls, then a steeper slope as polls get older, then it levels out again when polls get really old.
As for the ΔD, it’s simply the age of the poll in days, measured from the most recent poll in that state.
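If you want to check the weights yourself, here’s a small Python sketch of the weight formula. The function name `poll_weight` is made up, and I’m assuming “1 month” means 30 days.

```python
def poll_weight(age_days: float) -> float:
    """Weight of a poll that is age_days older than the newest poll in its state."""
    return 1.0 / (age_days ** 2 / 100.0 + 1.0)


# Ages in days: newest poll, 3 days, 1 week, 10 days, 2 weeks,
# 1 month (assumed to be 30 days), and 100 days.
for age in (0, 3, 7, 10, 14, 30, 100):
    print(f"{age:>3} days older: x{poll_weight(age):.3f}")
```

A poll exactly 10 days older comes out at exactly half weight, since 10 squared divided by 100 is 1, and 1 / (1 + 1) = 0.5.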
Isn’t math fun?