Ask Doctor Math: central limit theorem

I've gotten a lot of questions all related to the idea of averages, so I'm going to devote the next 3 posts to discussing different facets of averaging. It'll be like a trilogy, but hopefully one more like Lord of the Rings than Jurassic Park. Stay tuned to find out!

Dear Dr. Math,
Is there a difference in roulette between betting on black versus betting on a number, like 6? What about betting both at the same time? I usually have more fun betting on black, because my guess is that by betting on black I lose money more slowly, but I'm not sure which is actually better.
Sean

Dear Sean,

My first rule of gambling is don't gamble. (The second rule of gamb.... OK, you get the idea.) But I wouldn't be much of an advice columnist if I told you to do something just because I said so, without helping you understand why. So, maybe after we dissect this gambling question we can talk about why gambling is generally a bad idea. Then we'll talk about why you should sit up straight and why you didn't call me on my birthday.

OK, first: why the bets are fundamentally the same, and second: why they're different.

The most rudimentary way of analyzing the quality of a bet is computing what's called its expected value. This is the number you get by multiplying each possible payoff by its probability and adding them all together. In roulette, and most other casino games, both the payoffs and the probabilities are well-known, so the expected value is easy to compute. As a convention, we always compute the expected value for a bet of $1 (my kind of bet), but for you high-rollers who bet $n, you can just multiply the end result by n. In the first example, your bet on black, there are 18 ways to win (the 18 black pockets) and 20 ways to lose (the 18 red pockets and the 2 green ones), and we're assuming that every pocket is equally likely, so the probability of winning is \(\frac{18}{38}\). If you win, you get $2--your original $1 back plus another one from the house. If you lose, which happens with probability \(\frac{20}{38}\), you get nothing but my condolences, which have no cash value. So altogether your expected value is \(\frac{18}{38} \cdot \$2 + \frac{20}{38} \cdot \$0 = \$\frac{36}{38}\), or about $0.947. Note that the \(\frac{20}{38} \cdot \$0\) wasn't really doing anything in that calculation, so for simplicity we can skip that part from now on.

Now, the bet on 6 has a lower probability of winning but a higher payoff, and as we'll see, the two effects cancel each other out exactly. The payoff for winning a bet on 6 is $36, including your $1 plus $35 of the house's, and the probability of winning is \(\frac{1}{38}\), since there's only the one 6 on the wheel, as in life. Hence, the expected value of the bet is \(\frac{1}{38} \cdot \$36 = \$\frac{36}{38}\) again. Those clever French guys in the 18th century managed to design the game of roulette so that almost every bet has that same expected value of about $0.947, or 94.7¢. Interestingly enough, in America, there is actually one bet which is worse--the "5 number" bet on 0, 00, 1, 2, and 3, which has a payoff of $7 and a probability of \(\frac{5}{38}\), for an expected value of \(\frac{5}{38} \cdot \$7 = \$\frac{35}{38}\), or about $0.921--but since they don't use a 00 in other places, we Americans have that unique opportunity of actually making a worse decision when playing roulette than just playing roulette in the first place.
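For anyone who'd rather let a computer do the multiplying, here is a small sketch of the three expected values above. The function and variable names are mine, and it assumes an American wheel with 38 equally likely pockets and payoffs quoted with your $1 stake included, as in the text.

```python
from fractions import Fraction

POCKETS = 38  # American wheel: 18 black, 18 red, plus 0 and 00


def expected_value(winning_pockets, payoff):
    """Expected return on a $1 bet that pays `payoff` dollars (stake included)
    when any of `winning_pockets` pockets comes up, and nothing otherwise."""
    return Fraction(winning_pockets, POCKETS) * payoff


ev_black = expected_value(18, 2)   # bet on black
ev_six = expected_value(1, 36)     # bet on the single number 6
ev_five = expected_value(5, 7)     # the "5 number" bet on 0, 00, 1, 2, 3

for name, ev in [("black", ev_black), ("6", ev_six), ("5 number", ev_five)]:
    print(f"{name}: {ev} = {float(ev):.4f}")
```

Using exact fractions makes it plain that black and 6 come out to the very same number, while the 5 number bet falls a little short of it.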

The reason the expected value matters so much has to do with something I'm sure I'll be talking a lot about in future entries, The Law of Large Numbers, sometimes mistakenly called the "Law of Averages." Essentially (and pay attention to the words I emphasize for clues about how people misuse it), the LLN says that if you keep making the same bet over and over again, in the long run, the total payoff divided by the total number of bets will converge to the expected value of the bet. So, both your bet on black and your bet on 6 will pay you off about $0.947 per bet on average, assuming you have enough chips to hang around and keep betting. Notice that this is a bad thing for you and a good thing for the house, because the expected value is less than the price to play the game, $1. I remember being surprised when I went to Las Vegas that so many casinos advertise things like "99% payoffs guaranteed," meaning they're guaranteeing that you lose money?
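If you'd like to watch the Law of Large Numbers at work without losing any actual chips, a quick simulation will do. This is only a sketch: the function name, the seed, and the bet counts are arbitrary choices of mine, and it assumes each spin is an independent, uniformly random pick among 38 pockets.

```python
import random


def average_payoff(winning_pockets, payoff, n_bets, rng):
    """Average return per $1 bet over n_bets independent simulated spins."""
    total = 0
    for _ in range(n_bets):
        # Pockets are numbered 0..37; the first `winning_pockets` of them win.
        if rng.randrange(38) < winning_pockets:
            total += payoff
    return total / n_bets


rng = random.Random(2009)  # fixed seed so the run is repeatable
for n in (100, 10_000, 200_000):
    black = average_payoff(18, 2, n, rng)
    six = average_payoff(1, 36, n, rng)
    print(f"{n:>7} bets: black {black:.3f}, six {six:.3f}")
```

With only 100 bets the averages can land almost anywhere, especially for the bet on 6. As the number of bets grows, both should drift toward 36/38.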

Also, it's worth mentioning that you can't improve the situation by sneakily combining multiple bets or betting different amounts or any of the other so-called "systems". Expected value has the property that you can compute the expected values of each part of an overlapping bet, like 6 and black, separately and then just add them together. In the end, a bet of $1 gives you back an average of about $0.947 until you simply run out of money and go home.
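To see that adding-up property on the overlapping bet of $1 on black plus $1 on 6, you can simply list all 38 pockets and average the payoffs. In this sketch the helper names are mine, and the set of black numbers is the standard wheel layout (on which 6 happens to be black).

```python
from fractions import Fraction

# Black numbers on a standard roulette wheel; note that 6 is one of them.
BLACK = {2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35}
POCKETS = list(range(1, 37)) + ["0", "00"]  # 38 equally likely outcomes


def payoff_black(pocket):
    """Return on $1 on black: $2 if the pocket is black, else nothing."""
    return 2 if pocket in BLACK else 0


def payoff_six(pocket):
    """Return on $1 on the number 6: $36 if it hits, else nothing."""
    return 36 if pocket == 6 else 0


def expected(payoff):
    """Average of a payoff function over all 38 pockets."""
    return sum(Fraction(payoff(p)) for p in POCKETS) / len(POCKETS)


combined = expected(lambda p: payoff_black(p) + payoff_six(p))
separate = expected(payoff_black) + expected(payoff_six)
print(combined, separate, combined / 2)
```

The combined bet costs $2 and returns 36/19 dollars on average, which is 18/19 per dollar staked: exactly what each bet returns on its own.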

The difference, then, between the two kinds of bets has to do with something called their variance, or its square root which goes by the name standard deviation. I'll spare you the formulas (for now), but essentially variance is a measurement of how "spread out" the payoffs of a bet are. So, among all the bets with the same expected value, the bet with the lowest possible variance, which is 0, is the bet where you just hand over your money. The variance gets higher as the payoff gets higher (and the probability of winning gets lower, in order to keep the expected value the same). In the examples above, the bet on black has a variance of about 1.0 and the bet on 6 has a variance of about 33.2, which is substantially larger.
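For the curious, the spared formula is short enough to sketch: the variance is the expected value of the squared payoff minus the square of the expected payoff. The function name below is mine, and the payoffs are the ones used earlier ($2 for black, $36 for 6, stake included).

```python
from fractions import Fraction
from math import sqrt


def bet_stats(winning_pockets, payoff, pockets=38):
    """Mean and variance of the return on a $1 bet that pays `payoff`
    with probability winning_pockets/pockets and 0 otherwise."""
    p = Fraction(winning_pockets, pockets)
    mean = p * payoff
    variance = p * payoff**2 - mean**2  # E[X^2] - (E[X])^2
    return mean, variance


mean_black, var_black = bet_stats(18, 2)
mean_six, var_six = bet_stats(1, 36)

for name, var in [("black", var_black), ("6", var_six)]:
    print(f"{name}: variance {float(var):.3f}, "
          f"standard deviation {sqrt(var):.3f}")
```

Same mean, very different spread: roughly 1.0 for black against roughly 33.2 for the bet on 6.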

What's going on kind of behind-the-curtain is that over time, if you keep placing the same bet over and over, the distribution of your accumulated winnings takes the shape of a bell curve (by the Central Limit Theorem). The mean of that curve is determined by the expected value of each bet (on account of the LLN); how spread out it is depends on the variance. And if you don't want a lot of risk (of losing or winning), you should try to reduce that spread as much as possible. The casino, for example, would prefer that you just hand over the 5.26 cents you were going to lose on average and repeat. Since that wouldn't be much fun for you, they offer a little variance to keep you entertained. But as you rightly point out, too much variance isn't fun either, because you don't get to "win" very often, and so you might not come back to play again. So it's a delicate balance. Personally, I like the strategy of betting the table minimum on black and red simultaneously and drinking as many free cocktails as possible while my chip stack gradually diminishes. But these are personal decisions.
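Here is one rough way to put numbers on that bell curve: approximate your net winnings after n bets as normal, with mean n times (expected value minus $1) and variance n times the per-bet variance, and ask how likely you are to be ahead. Treat this as a sketch under that assumption. The function names and the choice of 1000 bets are mine, and the approximation is crude for long-shot bets like 6 unless n is large.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_ahead(mean, variance, n_bets):
    """Normal approximation to the chance that the total payoff exceeds
    the total staked after n_bets independent $1 bets."""
    net_mean = n_bets * (mean - 1)      # expected net winnings (negative here)
    net_sd = sqrt(n_bets * variance)    # spread of net winnings
    return 1 - normal_cdf((0 - net_mean) / net_sd)


MEAN = 36 / 38       # expected return per $1, same for both bets
VAR_BLACK = 1.0      # approximate per-bet variances quoted in the text
VAR_SIX = 33.2

n = 1000
p_black = prob_ahead(MEAN, VAR_BLACK, n)
p_six = prob_ahead(MEAN, VAR_SIX, n)
print(f"after {n} bets: P(ahead) black ~ {p_black:.3f}, six ~ {p_six:.3f}")
```

Both curves are centered on the same loss, but the wider one for the bet on 6 leaves more of its area on the winning side, and more on the losing-badly side too.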

Bottom line: you can't beat the house; all you can do is maybe make it take a little longer for them to beat you.

-DrM

Dear Doctor Math,
I saw a poll today that said Obama is up 7 percentage points in Ohio with a margin of error of 4%. So does that mean he could actually be losing there? Also, how can they come up with these numbers just by asking a few hundred people?
A Concerned Citizen

Those are good questions, ACC, and they're related. To answer them, I should talk a little about how polling works and what the various numbers mean. First off, polling is an imperfect attempt at predicting the future. No one knows for sure what's going to happen on election day, and sometimes (see Florida in 2000) it's hard to figure out what did happen even after the election. But polls are our best guess, and usually they do a pretty good job.

To conduct a poll, a news agency like CBS or Reuters or a public opinion firm like Rasmussen gets a staff of questioners to each call a handful of people and ask them their opinion on things, like how they're planning to vote. Since it takes time and money to make the calls, the pollsters typically limit themselves to something like a few thousand people. Of course, a lot of people (like young people who don't have landlines and probably vote Democratic, but never mind...) don't answer, so by the time the pollsters compile all their data together, they've got maybe 1000 quality responses to go on. From here they try to figure out what the remaining millions of people in the state or country are thinking, and then they report that information, thereby influencing the way people think, but that's another story.

So, the first question is, how do they know they didn't just ask all the wrong people? And the answer is they don't know for sure, of course, but if their methods are sound they can say with a reasonable degree of certainty that their polling numbers reflect the larger population. Think of Mario Batali tasting a single spoonful out of a pot of marinara sauce to see if it needs more oregano. Of course, it's possible he just got the most oreganoed spoonful in the whole pot, but if he's done a good job of stirring it up beforehand, he can be reasonably sure that his sample was representative of the distribution of the whole. But it would still be embarrassing for him to be wrong, so it would be nice to at least have some idea how much of a risk he was taking or maybe if he should taste it again.

That's where the "margin of error" comes in. The error that pollsters give is an indication of how sure they are that the sample they chose is a reasonable reflection of the population at large. For reasons I hope to get into someday (involving The Central Limit Theorem), the pollsters assume that the "true" value of the thing they're estimating follows a bell-shaped curve centered around their estimate. So, if they're trying to figure out how many people in Ohio are going to vote for Obama, they take the results from their poll (49% in the latest Columbus Dispatch poll) and say that the actual percentage of people planning to vote for Obama has a probability distribution forming a bell-curve centered around 49%. That means they can actually quantify the probability that their estimate is off by any given amount. The margin of error is the amount of deviation it takes before the pollsters can say with 95% probability that the true value is within that much of the estimate. They pick 95% mostly out of convention and the fact that it's easy to compute. Here, the key factor is the number of respondents--a rough formula for the margin of error (at the 95% level) for a sample of N people is \(\frac{1}{\sqrt{N}}\), which for 1000 people comes out to be about 0.03, or 3%. They might occasionally bump it up to 4% just to be extra sure.
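To get a feel for how the margin shrinks as the sample grows, here is the rough formula next to a slightly more careful one. The second function is the standard textbook expression 1.96 times the square root of p(1 - p)/N, taken at its worst case p = 0.5; that comparison is my addition, as are the function names, and I'm not claiming it is what any particular pollster actually uses.

```python
from math import sqrt


def rough_margin_of_error(n_respondents):
    """The rule of thumb from the text: about 1 / sqrt(N) at the 95% level."""
    return 1 / sqrt(n_respondents)


def textbook_margin_of_error(n_respondents, p=0.5):
    """1.96 standard errors of a sample proportion; p = 0.5 is the worst case."""
    return 1.96 * sqrt(p * (1 - p) / n_respondents)


for n in (100, 400, 1000, 2500, 10_000):
    print(f"N = {n:>6}: rough {rough_margin_of_error(n):.3f}, "
          f"textbook {textbook_margin_of_error(n):.3f}")
```

Notice the square root: to cut the margin of error in half, you need four times as many respondents, which is part of why pollsters tend to stop at around a thousand.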

Now, to answer your question, does this mean that McCain could actually be ahead? After all, the 7 point difference is less than twice the margin of error, so if we add that much to McCain and take it away from Obama, it does put McCain on top. It's possible, as I mentioned above, that the pollsters just asked enough of the wrong people to skew the numbers. In fact, if you've been paying close attention to what I said about the true value having a bell-curve distribution, you might have noticed that actually the situation is in some sense worse than just that. That 4% number is just the cutoff for the 95% confidence interval; it could be (with about 5% probability) that the poll is off by even more than 4%. Should we just throw up our hands and quit?

The important thing to remember here is that the margin of error isn't the end of the story. The bell-shaped curve which gave us the error calculation also shows us that it's more likely than not that our estimate is close to the truth. So again, we can quantify the degree of certainty that we have in estimating the difference between Obama's percentage and McCain's. Using a formula (that I admit I had to look up), we can compute that the standard error, a measure of how spread out the distribution is, in that estimation of the 7% Obama-McCain gap in Ohio is about 0.03. That means the bell-curve is pretty narrowly distributed around the guess. According to the bell-curve distribution, this gives a probability of about 99% that Obama is "truly" ahead in Ohio.
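That last step can be sketched in a few lines. It takes the 7-point gap and the standard error of about 0.03 straight from the paragraph above and assumes, as the text does, that the true gap follows a bell curve centered on the estimate; the names are mine.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


gap = 0.07             # Obama's lead in the poll
standard_error = 0.03  # standard error of that gap, as quoted in the text

# Probability that the true gap is above zero, i.e. that Obama is really ahead.
z = gap / standard_error
p_ahead = normal_cdf(z)
print(f"z = {z:.2f}, P(Obama ahead) ~ {p_ahead:.3f}")
```

A lead of a bit more than two standard errors puts about 99% of the bell curve on Obama's side of zero.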

So, yes, even if those polling numbers are correct, it is possible that McCain's ahead, but I wouldn't bet on it (unless someone was giving me greater than 100-to-1 odds).

-DrM

Ask Doctor Math

Friday, February 13, 2009

Unexpected Values, part 1

Sunday, October 5, 2008

Poll Position
