Wednesday, February 11, 2009

"in a hole in the grou31;aadn,m vnatoh424..."

Dear Dr. Math,
Suppose you have a computer randomly generating all the characters making up the text of
The Hobbit. Suppose the computer generates 200 characters per second. Is there a time period after which it becomes probable that the computer has produced the text of The Hobbit? My gut tells me, "No, it would never happen."
Consider this. If a sufficiently powerful being wanted to write a 20,000 page history of tea-drinking, it could either
(a) produce the book
(b) produce all possible 20,000 page long sequences of keyboard characters
TolkienFan

Dear TolkienFan,

Your question is related to the famous Infinite Monkey Theorem, about a room full of monkeys randomly banging on typewriters for all eternity eventually producing the complete works of Shakespeare. (Digression #1: an English major I knew once joked that the monkeys could reproduce the complete works of D.H. Lawrence in a surprisingly short time.) It turns out that the answer is "yes" in some abstract sense, as long as we're careful about what we mean, but the amount of time it would take is far beyond our comprehension--many many orders of magnitude more than the estimated age of the universe. Here's a quick back-of-the-envelope calculation:

1.) Let's only consider the characters on a standard typewriter. Let's say there are 50 possible characters (including numbers and punctuation, but ignoring capitalization, say), although it turns out not to matter very much whether there are 50 keys or 50,000.

2.) Approximate the length of The Hobbit as 360,000 characters--it's a 320 page book, times a standard 250 words per page, times an average 4.5 letters per word in English.

3.) Assume all characters are equally likely to be output by the random generator, be it computer or monkey. (Digression #2: someone actually did this experiment once with real live monkeys and found that the monkeys were extremely overly fond of the letter "s" for some reason. Also, they [the monkeys] really enjoyed urinating on the keyboard.) Assume the characters are probabilistically independent of each other, i.e., knowing what one character is doesn't inform our knowledge of any other one.

4.) Now, break up the output of the generator into chunks of length 360,000. For each chunk, the chance of the first character being equal to the first character of The Hobbit, which is "i", is $\frac{1}{50}$. By the assumption of independence, the chance of the first two characters being correct, "in", is $\frac{1}{50} * \frac{1}{50}$, or $\frac{1}{50^2}$, and so on. So the chance of the whole block of 360,000 characters reproducing the entire book is $\frac{1}{50^{360,000}}$, which is about $10^{-611629}$, an astronomically small number. Let's call that number p. The probability of failing to produce The Hobbit in each chunk is (1-p).

5.) If we imagine doing this process a second time, the probability that we'd fail both times is $(1-p)^2$, because of the independence property again. In general if we repeat it n times, the probability of failing all n times is $\inline (1-p)^n$. Now, and this is the key, (1-p) is extremely close to 1 but isn't actually equal to 1. So if we take n large enough, the probability of failure will eventually converge to 0; meaning that with high probability we will have at some point succeeded in producing The Hobbit. It's not clear what's meant by an event being "probable," but if, say, we wanted there to be a 95% probability of success, meaning a 5% chance of failure, we would need $(1-p)^n=1-.95=.05$. So to solve for n, we can take logarithms of both sides and divide by $log(1-p)$, to get that $n=\frac{log(.05)}{log(1-p)}$, which is on the order of $\frac{1}{p}$, or about $10^{611629}$ blocks of characters.

6.) To give you a sense of how incredibly large this number is, if we generated 200 characters per second, as you say, then 360,000 characters would take 30 minutes, so we could produce about 48 new blocks every day, or 17650 per year, on the order of $10^{4}$. This would mean that our $10^{611629}$ blocks would take about $10^{611625}$ years. For comparison, the universe is estimated to be around 13 billion years old, approximately $10^{10}$, so you would need $10^{611615}$ ages of the universe to have completed the task with 95% probability. Even if you had every atom in the universe working in parallel for the lifetime of the universe, you'd barely make a dent. Surely all our protons would have decayed into nothingness long before you even got to the part in the book where the trolls get turned to stone.

I suppose the moral of all of this, if you believe in that sort of thing, is that sometimes the mathematical consequences of our assumptions can far outstrip our intuition, especially when it comes to events that are exceedingly rare or numbers that are exceedingly large. So in a sense your "gut feeling" is right as far as any practical considerations go. Jorge Luis Borges wrote a story about this called The Library of Babel, which deals exactly with your scenario of a powerful being producing every possible book of a certain length. It's true that among such a vast library, there would be a comprehensive volume of the history of tea-drinking. However, there would also by necessity be an incredible number of false histories, every possible one in fact, as well as fraudulent cures for cancer and incorrect predictions for the next 100 World Series winners, and, importantly, there would be no way to distinguish the fake from the genuine. So it's back to option (a) I'm afraid--just write the book.

-DrM

Anonymous said...

Great explanation.

But I have to wonder, how long would it take them to write "hey hey we're the monkees" ?

Anonymous said...

Well, using the math Dr. Math has provided for us, we can calculate that!

"hey hey we're the monkees" has 25 characters, so using the idea that the probability of the hypothetical 50-key keyboard being right on the first keystroke is 1/50, we get (1/50)^25. Our chance of all 25 characters being typed in the exactly correct order is 3.35x10^-43.

Using the same math to calculate a 95% probability of success, that means we'd have to make 2.69x10^42 attempts ( log(0.05) / log(1 - 3.35x10^-43) ).

At a rate of 200 characters per second, our monkeys can produce 8 blocks of 25 characters a second, which means 691,200 blocks per day, and 252,288,000 per year.

2.69x10^42 attempts, with 252 million attempts per year, works out to about 10^33 years, significantly less than what it would take to write The Hobbit, but still 10^22 ages of the universe.

I guess it's a good thing we have 1960s television producers to do these things for us!

captnkurt said...

But if there were an infinite number of monkeys, all typing at the same rate of 200 characters/second, (man, those guys are fast!) then wouldn't it be correct to say that you would definitely have The Hobbit in 30 minutes?

Either way, that's a lot of monkey pee to mop up.

drmath said...

Good calculations, Anonymous! Exactly what I was hoping for--now that you understand the idea, you can apply it yourself.

Captnkurt, the problem here is what you mean by "an infinite number of monkeys." Maybe the most logical way to phrase it would be to say, "If you had n monkeys working simultaneously, the probability of producing the Hobbit in 30 minutes (that is, nailing on the first try) would converge to 0 as n converged to infinity." However, it's statistically the same as having 1 monkey repeat the process n times, so you can already see that the number of monkeys would have to be huge, more than the number of atoms in the universe by many orders of magnitude.

Of course, they would pee everywhere, shortly before they all collapsed into a massive black hole and destroyed the universe.

Anonymous said...

Unless I misunderstand (and that's quite possible), I think you've introduced a major flaw here...

The "second chunk" begins at character number 2, not character number 360,001. There is no reason why these should be considered discrete chunks and so just because the first character isn't "I" doesn't not affect the fact that the second and subsequent characters may spell out the work.

Thusly, your monkeys are producing over 17 million "blocks" a day, not just 48...

Obviously, the numbers are still so large that this makes very little difference to the outcome.

drmath said...

Excellent question! I'll have to devote a few paragraphs in a future post to answering it, though. Stay tuned for the Infinite Monkey Theorem, redux.