Showing posts with label polling. Show all posts

Tuesday, February 24, 2009

80085

With Valentine's Day just passed and Ash Wednesday lurking around the corner, I know the topics of sex and pregnancy are on a lot of people's filthy guilt-ridden minds these days. To help people understand their risks, and to show I'm not a prude, I'm hosting a little get-together (an orgy, if you will) of questions all about sex. So turn the lights down low, put on some soft music, and enjoy this special "adults only" post about what we in the math business call "multiplication."*


Dear Dr. Math,
I read in an article that "Normally fertile couples have a 25 percent chance of getting pregnant each cycle, and a cumulative pregnancy rate of 75 to 85 percent over the course of one year." How do you go from 25% to 85? I don't see the connection between those two numbers.
Name Withheld


As is often the case, Name, the way to understand the probability of getting pregnant over some number of time intervals (I almost wrote "periods" there but then reconsidered) is instead to think about the probability of not getting pregnant during any of those intervals. We can use the fact that the chance of something happening is always 1 minus the chance of it not happening. This turns out to be a generally useful technique whenever you're interested in the occurrence of an event over multiple trials. To take my favorite over-simplified example of flipping a coin, if we wanted to find the chance of flipping an H (almost wrote "getting heads"--geez, this is har.., er, difficult) in the first 3 flips, we could go through all of the possible 3-flip sequences and count how many of them had at least one H, or we could just observe that only one sequence doesn't contain an H (namely, TTT). Since the probability of flipping T ("getting tails") is $1/2$ on each flip, the chance of "doing it three times" is $(1/2)^3 = 1/8$. Thus, the probability of at least one H is $1 - 1/8 = 7/8$. Phew.
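
The coin example is small enough to check by brute force. The sketch below (variable names are mine, not from the original post) lists all 3-flip sequences, counts the ones with at least one H, and compares that with the "1 minus all tails" shortcut:

```python
from itertools import product

# Enumerate every possible sequence of 3 fair coin flips.
sequences = ["".join(s) for s in product("HT", repeat=3)]

# Count directly: sequences containing at least one H.
with_heads = [s for s in sequences if "H" in s]
direct = len(with_heads) / len(sequences)

# Complement shortcut: 1 minus the chance of all tails.
shortcut = 1 - (1 / 2) ** 3

print(direct, shortcut)  # 0.875 0.875
```

Both routes give 7 out of 8, but the shortcut keeps working when the number of flips gets too large to list by hand.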

Similarly here, there are lots of different ways to get pregnant over the course of a year (believe me), but only one way to not get pregnant. If we take the first statistic as correct, that the chance of a normally fertile couple getting pregnant in each cycle is 25%, then we could assume that the chance of not getting pregnant in each cycle was 75%, or 0.75. Assuming a "cycle" is 28 days long, there would be 13 cycles per year, so by the same reasoning as above, we could say that the chance of not getting pregnant in a year is $(0.75)^{13} \approx 0.024$, about 2.4%. So, the chance of "being in the family way" at some point during the year would be $1 - 0.024 = 0.976$, or 97.6%.
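
Here is the same arithmetic as a minimal sketch, assuming 13 independent cycles at 25% each, exactly as in the paragraph above (the variable names are only for illustration):

```python
p_cycle = 0.25          # per-cycle chance quoted in the article
cycles_per_year = 13    # assumes 28-day cycles, as in the text

# Chance of 13 "misses" in a row, treating cycles as independent.
p_none = (1 - p_cycle) ** cycles_per_year
p_at_least_once = 1 - p_none

print(f"{p_none:.3f} {p_at_least_once:.3f}")  # 0.024 0.976
```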

Now, that doesn't match up with the observed number you quoted, 85%. In the study, of course, all they do is assemble some group of "normally fertile" couples and count the number of times they get pregnant in a year. We were trying to solve the problem "top down" whereas the data is observed from the "bottom up." What's going on? Well, the problem was our assumption that the different cycles were independent from each other, in the sense that knowing what happened in one cycle doesn't affect our estimation of what will happen in the next. For coin-flipping, this is a reasonable assumption, but for copulation, not so much. It makes sense that there should be some correlation between the different cycles, because the possible causes for infertility one month might continue to be true the next. For example, it could be that either or both partners have some kind of medical condition that makes conception less likely. Or maybe the guy's underwear is too tight, I don't know. But it seems that the assumption of independence probably doesn't hold. Also, it's not entirely clear what's meant by "normally fertile" here, since (as far as I know) it's only really possible to know if a couple is "fertile" if they've succeeded in having a baby. So, it's possible that the data includes some number of couples who were just less fertile and perhaps didn't know it.

The correct way to understand these compound probabilities is to consider the probability of not conceiving in one cycle conditional on the event that you had not conceived in any of the previous cycles. Unfortunately, I don't have access to that information from personal experience, nor a good mental model for what numbers would be reasonable. However, it seems like the probability of not conceiving should be higher than ordinary if you know already that you've gone some number of months without conceiving. As a result, the chance of getting pregnant in a year should be lower than our estimate assuming independence, which does in fact agree with the data.
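
To see how much this effect can matter, here is a toy sketch. The two groups and their per-cycle chances are entirely made up for illustration (they are not from any study); they are chosen only so that the population average is still 25% per cycle:

```python
# Hypothetical population: two equally sized groups of couples.
# The per-cycle chances below are invented for illustration and
# picked only so that the overall average is still 25%.
groups = [(0.5, 0.40), (0.5, 0.10)]  # (share of couples, per-cycle chance)
cycles = 13

average_rate = sum(share * p for share, p in groups)

# Within each group the cycles are treated as independent, but the
# population as a whole no longer behaves like a single 25% coin.
p_none = sum(share * (1 - p) ** cycles for share, p in groups)
cumulative = 1 - p_none

print(f"{average_rate:.2f} {cumulative:.3f}")  # 0.25 0.872
```

With these invented numbers the yearly rate comes out around 87% rather than 97.6%, because the couples still trying late in the year are mostly the ones with the lower per-cycle chance. The real mix of couples is unknown, so treat this as a picture of the mechanism, not an estimate.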


Dear Dr. Math,
Planned Parenthood's web site says, "Each year, 2 out of 100 women whose partners use condoms will become pregnant if they always use condoms correctly." Is that the same as saying that condoms are 98% effective? If so, does that mean that if you have sex 100 times, you'll likely get somebody pregnant twice? (I mean, if you're a man. If you're a woman I imagine the rate of impregnating your partner will probably slip in the direction of zero.) Yours always,
Name Withheld


Oh, you freaky Name Withheld, you've asked the question backwards! In fact, the statistic you give of 2 women out of 100 becoming pregnant in a year is how the effectiveness of condoms is defined. That is, in the birth control industry, specifically, when someone claims that a particular method is "x% effective," it means that if a group of women use that method, over the course of the year about (100-x)% of them will get pregnant. Now, there are a number of assumptions being made here, not the least of which is that those women (and their partners) used the method correctly. Without actually going into people's bedrooms (or living rooms, or kitchens?) and tallying up on a clipboard whether their condom use was "incorrect", it's impossible to know for sure. Instead, people who do surveys of this kind have to rely almost exclusively on what people say they did. And let me ask you something: If you accidentally impregnated someone/got impregnated by someone while nominally using some birth control method, would you say, when asked, that you had been using it "incorrectly"? Or would you, as all good carpenters do, blame your tools?

Another implicit assumption is that the respondents reflect a typical number of sexual encounters in a year. Again, I don't know how they decide what participants to include in this kind of study or how they verify the claims they get, but according to some studies I was able to find, the average "coital frequency", as it's romantically known, for both married and single people in the U.S. is somewhere around 7 encounters per month. Therefore, if we treated the experiences as being independent (with the same caveat as in the previous question), we could estimate the probability of unintended pregnancy in a single sexual encounter:

Let's call the probability p. So the chance of not getting pregnant during a given sex act is (1-p). We'll accept the 7 times/month figure and assume a total of $7 \times 12 = 84$ sexual encounters per year, all including correct condom usage. As in the coin example, we've assumed independence, so the probability of not getting pregnant over the course of 84 trials is $(1-p)^{84}$, which we're assuming is equal to the stated number of 98%. Therefore, we have:

$$(1-p)^{84} = 0.98$$

And so $1 - p = 0.98^{1/84} \approx 0.99976$, meaning that p is very small, about 0.02%. Therefore, if you had sex 100 times, as you say (and congrats, btw), you could expect to make an average of 0.02 babies.
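
For anyone who wants to redo the calculation with a different frequency, here is a small sketch of the same steps. It assumes 84 independent encounters per year and a 98% chance of getting through the year without a pregnancy; the variable names are mine:

```python
encounters = 7 * 12          # 7 per month, the figure cited above
p_year_no_pregnancy = 0.98   # "98% effective" over one year

# Solve (1 - p) ** encounters == 0.98 for p.
p = 1 - p_year_no_pregnancy ** (1 / encounters)

# Expected number of pregnancies in 100 encounters is 100 * p.
expected_in_100 = 100 * p

print(f"{p:.5f} {expected_in_100:.3f}")  # 0.00024 0.024
```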

Some important notes:
1) Our assumption of independence here may be more reasonable than in the previous example, because it's possible that whatever factors contribute to a birth control method failing despite proper use may be due more to chance than any kind of recurring trends.
2) Also, these numbers don't account for the fact that (as we saw above) the chance of getting pregnant in a year even without any protection is something like 85%. So, in a sense, condoms "only" reduce the risk of pregnancy from 85% to 2%.
3) We've only been talking about pregnancy here, not the risks of other things like STDs or panic attacks.
4) Wear a condom, people!


Dear Dr. Math,
Mathematically speaking, what number makes for the best sexual position?
Name Withheld

You seem to be asking a lot of questions, NW.

Personally, I've always enjoyed the ln(2π).

-DrM

*Also acceptable: or "integration by parts".

Friday, February 13, 2009

Unexpected Values, part 3

And now, the exciting conclusion!


Dear Dr. Math,
My college administration says that the student-teacher ratio is 5:1. But all my classes have like 30 people in them. What gives? Are they just lying?
Sincerely,
A Student at a Local College

Well, ASAALC, that depends on what you mean by "lying." If you mean, Are they miscalculating their student-faculty ratio?, then the answer's probably "no." These things are public information, and anyone with a calculator can divide the number of students by the number of teachers. (Whether they include non-teaching faculty, like that weird guy who lives in the basement and hasn't taught a class since the 60s, is kind of an ethical gray-area.) However, if the question is, Are they misleading people by reporting a somewhat meaningless statistic?, then "yes." Here's how the magic trick works:

It all comes back to the idea of average, and the different things the word "average" means to people. Many people think of "average" as meaning "typical" or "to be expected," a misperception that isn't helped any by the synonym "expected value." So, if a college reports that it has a student-teacher ratio of 5:1, or equivalently, that it has an average class size of 5, people leap to the conclusion that the typical class they'll encounter at the college will have 5 people in it (and how awesome is that?!). However, that may not be the typical experience, and it may even be impossible. Let's take a look at a simple example:

If I told you that my girlfriend and I were going to flip a coin and decide whether to have a baby based on the result,* the average number of babies we would have, i.e., our expected number of babies, would be 1/2. But of course, we would be surprised, even shocked, if we actually had half a baby. In this case, the "expected" value is anything but. The value 1/2 just represents the plausibility we associate to the chance of having one baby. If we repeated the experiment many times, the ratio of babies to attempts would converge to 1/2.

Similarly, suppose that a tiny school had 10 total students and 2 teachers. So, the student-teacher ratio is 5:1. Now, let's say one of the teachers is that awesome guy who lets you call him by his first name, and the other has really bad dandruff or something, so 9 people register for Class #1 and only 1 person registers for Class #2. The average class size is the average of 9 and 1, which is 5--equal to the ratio of students to teachers, as it should be. However, if I picked a random student and asked him (it's an all boys' school) how many people were in his class, 9 times out of 10 he would say 9, and 1 time out of 10 he would say 1. So the average response I would get would be $\frac{9}{10}(9) + \frac{1}{10}(1) = 8.2$, for a difference of 3.2! In fact, the only way the two numbers can actually be the same is if all the classes are exactly the same size.
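
The two kinds of average are easy to compare for any list of class sizes. A minimal sketch, using the 9-and-1 school from above (the variable names are just for illustration):

```python
class_sizes = [9, 1]  # the two classes from the example

# Average from the school's point of view: one vote per class.
per_class_average = sum(class_sizes) / len(class_sizes)

# Average from the students' point of view: one vote per student,
# so a class of size n gets reported n times.
total_students = sum(class_sizes)
per_student_average = sum(n * n for n in class_sizes) / total_students

print(per_class_average, per_student_average)  # 5.0 8.2
```

Swap in `[5, 5]` and both averages come out to 5, which is the equal-sized case where the two numbers agree.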

If you think about it, it makes perfect sense that I'd get someone in a larger class more often in a random poll, since the larger classes have more people in them to get polled. The question, then, is which of these numbers actually represents the "typical" student experience. Since the student-teacher ratio is generally smaller, the schools are happy to just report that and hope you don't notice the difference. What they really should be reporting is something more like the distribution of class sizes. As it is now, if you want to know what the classes are like, you've just got to go see for yourself.

Also, what's the deal with the vending machine only having Pepsi?

-DrM

*For the record, that's not how we do it. We roll a 20-sided die.

Sunday, October 5, 2008

Poll Position

Dear Doctor Math,
I saw a poll today that said Obama is up 7 percentage points in Ohio with a margin of error of 4%. So does that mean he could actually be losing there? Also, how can they come up with these numbers just by asking a few hundred people?
A Concerned Citizen

Those are good questions, ACC, and they're related. To answer them, I should talk a little about how polling works and what the various numbers mean. First off, polling is an imperfect attempt at predicting the future. No one knows for sure what's going to happen on election day, and sometimes (see Florida in 2000) it's hard to figure out what did happen even after the election. But polls are our best guess, and usually they do a pretty good job.

To conduct a poll, a news agency like CBS or Reuters or a public opinion firm like Rasmussen gets a staff of questioners to each call a handful of people and ask them their opinion on things, like how they're planning to vote. Since it takes time and money to make the calls, the pollsters typically limit themselves to something like a few thousand people. Of course, a lot of people (like young people who don't have landlines and probably vote Democratic, but never mind...) don't answer, so by the time the pollsters compile all their data together, they've got maybe 1000 quality responses to go on. From here they try to figure out what the remaining millions of people in the state or country are thinking, and then they report that information, thereby influencing the way people think, but that's another story.

So, the first question is, how do they know they didn't just ask all the wrong people? And the answer is they don't know for sure, of course, but if their methods are sound they can say with a reasonable degree of certainty that their polling numbers reflect the larger population. Think of Mario Batali tasting a single spoonful out of a pot of marinara sauce to see if it needs more oregano. Of course, it's possible he just got the most oreganoed spoonful in the whole pot, but if he's done a good job of stirring it up beforehand, he can be reasonably sure that his sample was representative of the distribution of the whole. But it would still be embarrassing for him to be wrong, so it would be nice to at least have some idea how much of a risk he was taking or maybe if he should taste it again.

That's where the "margin of error" comes in. The error that pollsters give is an indication of how sure they are that the sample they chose is a reasonable reflection of the population at large. For reasons I hope to get into someday (involving The Central Limit Theorem), the pollsters assume that the "true" value of the thing they're estimating follows a bell-shaped curve centered around their estimate. So, if they're trying to figure out how many people in Ohio are going to vote for Obama, they take the results from their poll (49% in the latest Columbus Dispatch poll) and say that the actual percentage of people planning to vote for Obama has a probability distribution forming a bell-curve centered around 49%. That means they can actually quantify the probability that their estimate is off by any given amount. The margin of error is the amount of deviation it takes before the pollsters can say with 95% probability that the true value is within that much of the estimate. They pick 95% mostly out of convention and the fact that it's easy to compute. Here, the key factor is the number of respondents--a rough formula for the margin of error (at the 95% level) for a sample of N people is $1/\sqrt{N}$, which for 1000 people comes out to be about 0.03, or 3%. They might occasionally bump it up to 4% just to be extra sure.
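
To get a feel for how the margin shrinks with sample size, here is a short sketch. It assumes the common rule of thumb of 1 over the square root of N, which matches the 3% figure for 1000 people; the function name is mine:

```python
import math

def rough_margin_of_error(n):
    """Rule-of-thumb 95% margin of error for a poll of n people."""
    return 1 / math.sqrt(n)

for n in (100, 1000, 4000):
    print(n, round(rough_margin_of_error(n), 3))
```

Going from 1000 to 4000 respondents only cuts the margin in half, which is a big part of why pollsters usually stop at around a thousand.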

Now, to answer your question, does this mean that McCain could actually be ahead? After all, the 7 point difference is less than twice the margin of error, so if we add that much to McCain and take it away from Obama, it does put McCain on top. It's possible, as I mentioned above, that the pollsters just asked enough of the wrong people to skew the numbers. In fact, if you've been paying close attention to what I said about the true value having a bell-curve distribution, you might have noticed that actually the situation is in some sense worse than just that. That 4% number is just the cutoff for the 95% confidence interval; it could be (with about 5% probability) that the poll is off by even more than 4%. Should we just throw up our hands and quit?

The important thing to remember here is that the margin of error isn't the end of the story. The bell-shaped curve which gave us the error calculation also shows us that it's more likely than not that our estimate is close to the truth. So again, we can quantify the degree of certainty that we have in estimating the difference between Obama's percentage and McCain's. Using a formula (that I admit I had to look up), we can compute that the standard error, a measure of how spread out the distribution is, in that estimation of the 7% Obama-McCain gap in Ohio is about 0.03. That means the bell-curve is pretty narrowly distributed around the guess. According to the bell-curve distribution, this gives a probability of about 99% that Obama is "truly" ahead in Ohio.
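
Here is a rough sketch of that calculation. It rests on assumptions that are mine rather than the poll's: a sample of 1000 respondents (the actual sample size isn't given here), McCain at 42% (implied by Obama's 49% and the 7-point gap), and one standard formula for the standard error of a difference between two shares from the same sample, which may or may not be the one I looked up at the time:

```python
import math

n = 1000               # assumed sample size; the poll's real size is not given
obama = 0.49           # from the poll cited above
mccain = obama - 0.07  # implied by the 7-point gap

gap = obama - mccain
# Standard error of the difference of two shares from the same sample.
se = math.sqrt((obama + mccain - gap ** 2) / n)

# Probability that a bell curve centered at `gap` lies above zero.
z = gap / se
p_ahead = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"{se:.3f} {p_ahead:.2f}")  # 0.030 0.99
```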

So, yes, even if those polling numbers are correct, it is possible that McCain's ahead, but I wouldn't bet on it (unless someone was giving me greater than 100-to-1 odds).

-DrM