Sunday, March 8, 2009

Prob(Rain and not(Pour))=0

Dear Doctor Math,
What does 40% chance of precipitation mean?
Dr. Anonymous, Ph.D., J.D.

Dear Dr. Anonymous,

This is a tricky one, and it confuses a lot of people. First, some wrong answers, courtesy of the Internet:

1) "It will be expected to rain, but the significant chance of rain occurs in only 40 percent of the studied geographical area."

2) "It will definitely rain, but only for 40% of the day."

3) "4 out of 10 meteorologists think it will rain."

4) "40% or below means it won't rain; 70% and above means it will."

5) "There is a 40% chance of precipitation somewhere in the forecast area at some point during the day."

Nope; nope; nope; definitely not; almost, but no. Who would have thought there could be so many different incorrect ways to interpret such a simple statement? We can dismiss numbers 3 and 4 right away--meteorologists aren't generally in the business of taking surveys of each other, and they wouldn't bother with the whole percentage thing if they had a definite opinion either way. The others are tempting, but none get it exactly right. Don't fret if you found yourself agreeing with one of them, though; according to this study in the journal Risk Analysis, respondents from five major cities all over the world overwhelmingly preferred numbers 1, 2 and 5 to the correct answer, which, according to The National Weather Service and the Canadian Weather Office, is this:

"The probability of precipitation (POP) is the chance that measurable precipitation (0.2 mm of rain or 0.2 cm of snow) will fall on any point of the forecast region during the forecast period."

In your example, a 40% chance of rain means the forecaster has determined that the probability is 40% that you will get rained on at some point during the forecast period, usually a day in length (some forecasts include hour-to-hour predictions, but the same idea applies--either way, interpretation 2 is out). But even that's kind of a circular definition--a 40% chance means the chance is 40%--and it doesn't explain where the numbers come from in the first place. To unpack things a little more, we should talk about how weather prediction works and why uncertainty is necessarily involved.

To begin with, meteorologists are always collecting data, tons and tons of it, using extremely sensitive instruments both on the ground and in the upper atmosphere, via weather balloons. These instruments measure a whole array of weather conditions including temperature, humidity, wind speed and direction, pressure, dew point (whatever that is), and others. Then, the meteorologists feed all of that information into huge computer simulations, which use a combination of historical data and physical models (for example, equations of fluid dynamics) to predict the course of events over the next few days. Presumably, the models include the movements of butterflies, because, as I understand it, they are the source of basically all weather phenomena. Why the uncertainty, then? With all that super technology, why can't they just predict the future?

Well, as I discussed in my previous post about the Monty Hall problem, there are many complex systems in the world which perhaps could be described exactly by massive systems of equations but which we as humans lack the computational power to fully comprehend. And weather is right up there with the most complex systems around. Part of what makes it so difficult to predict is its susceptibility to non-linear feedback, meaning the evolution of the system depends very sensitively on its current state. As a result, even minuscule differences in initial conditions will, with enough time, accumulate into larger differences and eventually produce radically different outcomes. To give you some perspective, scientists have known since the 1960s that even a difference of 0.000001 in initial measurements could lead to vastly different weather conditions after a period of only a week. After a few weeks, you can basically forget about it--in order to predict the weather in any meaningful way, you'd need a complete understanding of the position of every atom in the universe. In the lingo of the 1990s, weather systems are a classic example of chaos, due to their sensitive dependence on initial conditions. Jeff Goldblum explained it all before almost being eaten by a T-Rex.
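
For the curious, here is a minimal sketch of that sensitivity. It uses the logistic map, a standard toy example of chaos, as a stand-in; it is not a weather model, and the function name, starting value, and number of steps are just choices made for illustration. Two runs start 0.000001 apart, and we watch the gap between them grow.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two "measurements" of the same starting state, differing by one millionth.
run_a = logistic_trajectory(0.2, 50)
run_b = logistic_trajectory(0.2 + 1e-6, 50)

# Gap between the two runs at each step.
gaps = [abs(a - b) for a, b in zip(run_a, run_b)]

print("initial gap:", gaps[0])
print("largest gap over 50 steps:", max(gaps))
```

The initial gap is tiny, but within a few dozen steps the two runs bear no resemblance to each other, which is the toy version of a forecast falling apart after a week or two.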

What makes this sensitive-dependence business such a bummer is that even the most expensive weather-measurement instruments necessarily have some amount of error. For example, a device that measures humidity might detect the presence of water, maybe even down to the last molecule, but that doesn't mean that other patches of nearby air have that same water content. The instrument is kind of like a pollster, asking the air how it's planning to "vote" in the upcoming "election," but it can't "poll" everyone, because there are something like 10^24 "registered voters" in every cubic foot of air. And let's face it, some of them are just "crazy." Failing to account for even a small number of these leads to enough initial uncertainty that prediction becomes more or less impossible. One of the more disturbing things I came across in the course of researching this post was the advice to meteorologists that they should speak in certainties, because "People need to plan for what they need to do. People do not like to be unsure of what will happen." Well, I'm sorry, people, but that's just life.

The output of all these models, then, is just an estimate of the plausibility of the occurrence of rain, just as the statement "the chance of rolling a fair die and getting a 3 is 1/6" is an estimate of that event's plausibility. With enough identical trials, one could perhaps judge whether the estimate had been correct, since the frequency of occurrence should converge to its probability (according to the Law of Large Numbers). However, that whole idea is kind of inapplicable here, because it's impossible to observe multiple independent instances of the same exact weather conditions for the same place at the same time. What would it even mean to say, "Out of 100 instances when conditions were exactly like this in New York on March 8, 2009, it rained in about 40"?* One of the persistent fallacies regarding weather prediction is that it is frequently "wrong." But how can an estimate of uncertainty be wrong? Even unlikely events do occasionally occur--consider 10 flips of a coin; any 10-flip sequence has probability 1/1024 of occurrence, but one of them has to happen. It doesn't mean the probability estimate was wrong. Bottom line, I'll take the National Weather Service over my uncle's "trick knee" any day.
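
To check the coin-flip arithmetic, here is a short sketch (the variable names are mine) that lists every possible sequence of 10 fair flips and confirms that each one is a 1-in-1024 long shot even though the probabilities add up to 1.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of 10 flips, e.g. ('H', 'T', 'H', ...).
sequences = list(product("HT", repeat=10))

# Each specific sequence of 10 fair flips has probability (1/2)^10.
p_each = Fraction(1, 2) ** 10

# Summed over all sequences, the probabilities must total 1.
total = p_each * len(sequences)

print(len(sequences), p_each, total)  # 1024 1/1024 1
```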

So, the complexity of weather and the imprecision of measurement are one source of uncertainty, but there's actually another one: the weatherman/weatherwoman doesn't know where you are in his/her forecast area. See, the weather forecast covers a fairly large area (like a whole ZIP code), and inside that area there are lots of measurement stations, each of which could detect precipitation during the day. Even if a meteorologist knew for a fact that it was going to rain in exactly 40% of the area (and even knew where, too), he/she still would tell you the chance of it raining on you was 40%, since as far as he/she knows, you could be anywhere in town. In a way, interpretation 1 above is a possibility, although it's not by any means the only one. For example, on the other extreme, it could be that the weatherperson thinks that there's a 40% chance it's going to rain everywhere and a 60% chance it's not going to rain anywhere, in which case it doesn't matter at all where you are--your chance of getting rained on is 40%. I can't climb inside Al Roker's head (unfortunately) nor do I have access to his computer models, but most likely, I think the truth is some mixture of these ideas--the 40% figure accounts for both the probability of rain at all and the distribution of rain if it occurs.
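
Here is a rough simulation of that mixture, under a deliberately crude model that I am assuming purely for illustration: rain happens at all with probability `confidence`, and when it does it soaks a fixed fraction `coverage` of a one-dimensional "town" in which you are standing at a uniformly random spot. The function name, both parameters, and the model itself are made up for this sketch; this is not how any forecaster actually computes the number.

```python
import random


def simulate_pop(confidence, coverage, trials=100_000, seed=0):
    """Estimate the chance that a randomly placed person gets rained on."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wet = 0
    for _ in range(trials):
        rains_at_all = rng.random() < confidence
        your_position = rng.random()  # uniform over the town, [0, 1)
        # If it rains, it falls on the stretch [0, coverage) of town.
        if rains_at_all and your_position < coverage:
            wet += 1
    return wet / trials


# Three different weather stories that all lead to roughly the same forecast.
certain_but_patchy = simulate_pop(confidence=1.0, coverage=0.4)
all_or_nothing = simulate_pop(confidence=0.4, coverage=1.0)
a_bit_of_both = simulate_pop(confidence=0.8, coverage=0.5)

print(certain_but_patchy, all_or_nothing, a_bit_of_both)
```

All three estimates land near 0.4, which is the point: from the forecast number alone, you can't tell which story the forecaster had in mind.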

This, by the way, is what's wrong with interpretation 5 above, in case you were wondering; there's a subtle difference between saying "the probability of rain at any given point" versus "the probability of rain at some point," assuming that it's possible that it could rain in some places and not rain in others. Think of it this way: if I randomly chose to send an Applebee's gift card to one of my parents for Purim, the probability of any given parent (either Mom Math or Dad Math) getting the card would be 50%, but the probability of some parent getting the card would be 100% (I'm definitely sending it to someone). Another way to phrase it would be to say that if I picked a parent at random after giving out the card, the chance that I would pick the person with the card would be 50%.
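
The same distinction can be checked by brute force. The sketch below assumes a toy setup of my own invention: the forecast area is chopped into `n_cells` equal cells, and exactly `n_wet` of them get rained on, with every choice of wet cells equally likely. It then counts how often one particular cell is wet versus how often at least one cell is wet.

```python
from fractions import Fraction
from itertools import combinations


def point_vs_somewhere(n_cells, n_wet):
    """Return (P(a given cell is wet), P(some cell is wet)) by enumeration."""
    # Every equally likely way of choosing which cells get rained on.
    scenarios = list(combinations(range(n_cells), n_wet))
    # "Any given point": how often is cell 0, specifically, among the wet ones?
    given = Fraction(sum(0 in s for s in scenarios), len(scenarios))
    # "Some point": how often is at least one cell wet?
    some = Fraction(sum(len(s) > 0 for s in scenarios), len(scenarios))
    return given, some


# The gift-card example: two parents, exactly one card.
print(point_vs_somewhere(2, 1))
# A town of 10 cells where exactly 4 get rained on.
print(point_vs_somewhere(10, 4))
```

In the two-parent case the pair comes out to 1/2 and 1, and in the ten-cell case to 2/5 and 1: rain somewhere is certain, but rain on you is only 40%.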

In the end then, a more precise definition of what it means to say, "There's a 40% chance of rain" would be something like:

"Given all available meteorological/lepidopteric information, subject to measurement error and uncertainty, I estimate the chance of a location selected uniformly at random from the forecast area receiving any measurable precipitation during the day at 40%."

Maybe a little too wordy for morning drive-time, though.


*Side note: it did actually rain today, and Dr. Math chose not to bring an umbrella despite a forecast of "80% chance of rain." Sometimes you have to roll the hard 6.
