Sunday, October 5, 2008

Poll Position

Dear Doctor Math,
I saw a poll today that said Obama is up 7 percentage points in Ohio with a margin of error of 4%. So does that mean he could actually be losing there? Also, how can they come up with these numbers just by asking a few hundred people?
A Concerned Citizen

Those are good questions, ACC, and they're related. To answer them, I should talk a little about how polling works and what the various numbers mean. First off, polling is an imperfect attempt at predicting the future. No one knows for sure what's going to happen on election day, and sometimes (see Florida in 2000) it's hard to figure out what did happen even after the election. But polls are our best guess, and usually they do a pretty good job.

To conduct a poll, a news agency like CBS or Reuters or a public opinion firm like Rasmussen gets a staff of questioners to each call a handful of people and ask them their opinion on things, like how they're planning to vote. Since it takes time and money to make the calls, the pollsters typically limit themselves to something like a few thousand people. Of course, a lot of people (like young people who don't have landlines and probably vote Democratic, but never mind...) don't answer, so by the time the pollsters compile all their data, they've got maybe 1000 quality responses to go on. From here they try to figure out what the remaining millions of people in the state or country are thinking, and then they report that information (thereby influencing the way people think, but that's another story).
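
To make that concrete, here's a minimal sketch in Python of what a pollster is doing statistically: drawing a random sample from a much larger population and using the sample proportion as an estimate. The "true" support level here is made up purely for illustration, since the whole point is that a real pollster never gets to see it.

    import random

    # Hypothetical number, purely for illustration: the "true" fraction
    # of Obama voters, which a real pollster can't observe directly.
    TRUE_SUPPORT = 0.49
    SAMPLE_SIZE = 1000  # quality responses the pollster ends up with

    # Simulate one poll: each respondent supports Obama with
    # probability TRUE_SUPPORT.
    responses = [random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE)]
    estimate = sum(responses) / SAMPLE_SIZE

    print(f"true support: {TRUE_SUPPORT:.1%}, poll estimate: {estimate:.1%}")

Run it a few times and you'll see the estimate bounce around the true value; that wiggle is exactly what the rest of this answer is about.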

So, the first question is, how do they know they didn't just ask all the wrong people? And the answer is they don't know for sure, of course, but if their methods are sound they can say with a reasonable degree of certainty that their polling numbers reflect the larger population. Think of Mario Batali tasting a single spoonful out of a pot of marinara sauce to see if it needs more oregano. Of course, it's possible he just got the most oreganoed spoonful in the whole pot, but if he's done a good job of stirring it up beforehand, he can be reasonably sure that his sample was representative of the distribution of the whole. But it would still be embarrassing for him to be wrong, so it would be nice to have at least some idea of how much of a risk he was taking, and whether he ought to taste it again.

That's where the "margin of error" comes in. The error that pollsters give is an indication of how sure they are that the sample they chose is a reasonable reflection of the population at large. For reasons I hope to get into someday (involving the Central Limit Theorem), the pollsters assume that the "true" value of the thing they're estimating follows a bell-shaped curve centered around their estimate. So, if they're trying to figure out how many people in Ohio are going to vote for Obama, they take the results from their poll (49% in the latest Columbus Dispatch poll) and say that the actual percentage of people planning to vote for Obama has a probability distribution forming a bell curve centered around 49%. That means they can actually quantify the probability that their estimate is off by any given amount. The margin of error is the amount of deviation it takes before the pollsters can say with 95% probability that the true value is within that much of the estimate. They pick 95% mostly out of convention and the fact that it's easy to compute. Here, the key factor is the number of respondents: a rough formula for the margin of error (at the 95% level) for a sample of N people is 1/√N, which for 1000 people comes out to be about 0.03, or 3%. They might occasionally bump it up to 4% just to be extra sure.
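
If you like, you can check that back-of-the-envelope number yourself. Here's a sketch using the rough 1/√N rule quoted above (not any particular pollster's exact method):

    from math import sqrt

    def margin_of_error(n):
        """Rough 95% margin of error for a sample of n people: 1/sqrt(n)."""
        return 1 / sqrt(n)

    print(f"{margin_of_error(1000):.3f}")  # about 0.032, i.e. roughly 3%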

Now, to answer your question, does this mean that McCain could actually be ahead? After all, the 7 point difference is less than twice the margin of error, so if we add that much to McCain and take it away from Obama, it does put McCain on top. It's possible, as I mentioned above, that the pollsters just asked enough of the wrong people to skew the numbers. In fact, if you've been paying close attention to what I said about the true value having a bell-curve distribution, you might have noticed that the situation is, in a sense, even worse than that. That 4% number is just the cutoff for the 95% confidence interval; it could be (with about 5% probability) that the poll is off by even more than 4%. Should we just throw up our hands and quit?
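
If you want to see that 5% failure rate with your own eyes, here's a small simulation sketch (again with a made-up true value): run lots of hypothetical polls and count how often the estimate misses the truth by more than the margin of error.

    import random
    from math import sqrt

    TRUE_SUPPORT = 0.49  # made up for illustration
    N = 1000             # respondents per poll
    TRIALS = 10_000      # number of simulated polls
    moe = 1 / sqrt(N)    # rough 95% margin of error from above

    misses = 0
    for _ in range(TRIALS):
        estimate = sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N
        if abs(estimate - TRUE_SUPPORT) > moe:
            misses += 1

    print(f"missed by more than the margin of error in {misses / TRIALS:.1%} of polls")
    # Typically a bit under 5%, since 1/sqrt(N) is slightly wider than the
    # exact 95% interval when support is near 50%.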

The important thing to remember here is that the margin of error isn't the end of the story. The bell-shaped curve which gave us the error calculation also shows us that it's more likely than not that our estimate is close to the truth. So again, we can quantify the degree of certainty we have in estimating the difference between Obama's percentage and McCain's. Using a formula (which I admit I had to look up), we can compute the standard error, a measure of how spread out the distribution is, for that estimated 7-point Obama-McCain gap in Ohio: it comes out to about 0.03. That means the bell curve is pretty narrowly concentrated around the guess, and according to that distribution, there's a probability of about 99% that Obama is "truly" ahead in Ohio.
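
Here's that calculation spelled out as a sketch. The standard error formula below is the textbook one for the difference of two proportions measured in the same poll; the 49%/42% split and the sample size of 1000 are read off from the numbers above.

    from math import sqrt, erf

    N = 1000          # sample size, as above
    p_obama = 0.49    # Obama's share in the poll
    p_mccain = 0.42   # McCain's share (49% minus the 7-point gap)

    gap = p_obama - p_mccain  # 0.07

    # Standard error of the difference of two proportions from one poll.
    # (The two shares come from the same respondents, so they're negatively
    # correlated, which is why this isn't just two independent errors added.)
    se = sqrt((p_obama + p_mccain - gap**2) / N)

    # Probability that the "true" gap is positive, under the bell-curve
    # assumption: the standard normal CDF evaluated at gap / se.
    z = gap / se
    prob_ahead = 0.5 * (1 + erf(z / sqrt(2)))

    print(f"standard error: {se:.3f}")                # about 0.030
    print(f"P(Obama truly ahead): {prob_ahead:.1%}")  # about 99%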

So, yes, even if those polling numbers are correct, it is possible that McCain's ahead, but I wouldn't bet on it (unless someone was giving me greater than 100-to-1 odds).

-DrM
