Ask Doctor Math: infinite monkey theorem

I've gotten some very interesting responses to my post about the Infinite Monkey Theorem, concerning the likelihood of a monkey accidentally reproducing The Hobbit by randomly generating letters. So I thought I'd write a follow-up to address some of them; also it gives me another chance to imagine a monkey typing on a typewriter. (I tried letting a monkey type this up for me, but he wrote something much more interesting than I had planned.)

Dear Dr. Math,
I believe the monkey problem makes the simplifying assumption that all the letters in the text are independent, which they're not in real English (or any other language). How does this affect the results? Real texts are sampled very narrowly from the space of possible letter sequences.
CN

Excellent question, CN; I'm glad you brought it up. In fact, the distribution of the letters in the text is irrelevant to the problem. The important assumption is that the letters being output by the monkey are equally likely and probabilistically independent of each other. Under this assumption, it doesn't matter if the text the monkey's trying to match is The Hobbit or the phone book or a sequence of all "7"s--the probability of matching any sequence of 360,000 characters will be $\frac{1}{50^{360,000}}$ . If you need convincing, consider the simpler example of flipping a fair coin 5 times. Any particular sequence of 5 flips, for example HHTHT, has the same probability, $\frac{1}{2^5}$ , of coming up. So, if we were trying to match any flip sequence, we'd have the same chance. The same is true here, just on a much larger scale (and with monkeys).

Now, many people object to this idea, because they think that a letter sequence like "a;4atg 9hviidp" is somehow more "random" than a sequence like "i heart hanson". Therefore, they reason, the first sequence would be more likely to occur by chance. But actually the two sequences have exactly the same probability of occurrence, under our assumptions. Really, the only difference between the two is what we could infer about the message source (beyond musical tastes) based on receiving such an output. I hope to discuss this in detail someday in the context of elementary hypothesis testing, but if, say, there were some doubt in our minds as to whether the source of these characters was, in fact, uniformly random, the latter message would give us considerable evidence to help support that doubt. The reason is that we could provide an alternative hypothesis that would make the observed data much more likely. For the monkey problem, however, we were assuming that we knew how the characters were being generated, so there's no doubt.

Dear Dr. Math,
Along the lines of your previous questions on large numbers and randomly generating the book The Hobbit, I'd like to ask about randomly generating images. A low res PC display is normally 640 x 480 pixels. If you randomly generated every combination of color pixels, wouldn't you have created every image imaginable at that resolution? That is, one of the screens would be the Mona Lisa, one would be your Ask Doctor Math page, one would be a picture of the Andromeda galaxy from close up, one would be a picture of you!, etc. If you only wanted to look at black & white images, you'd have a much smaller collection, but once again wouldn't you generate every B&W screen image possible?

With feature recognition software getting better all the time, one could "mine" these images for recognizable features. Similar to the way the pharmaceutical companies sequence through millions of plants to find new substances, one could sequence through these images to extract unknown info.
Mike

Dear Mike,

Absolutely, we could apply the same techniques to any form of information that can be reduced to a sequence of numbers or letters, like images, CDs, chess games, DNA sequences, etc. In fact, we needn't generate them randomly, either. As in The Library of Babel example, one could imagine a vast collection of all possible sequences, generated systematically one at a time with no repeats. Unfortunately, for any interesting form of information, the number of possibilities is simply too great to make it practical.

In your example of 640x480 pixel images, even assuming the images were 1-bit monochrome, there would still be 2 possibilities ("on" or "off") for each of the 640*480 = 307,200 pixels. Therefore, the number of possible images would be $2^{307,200}$ , which is about $10^{92,476}$ . Remember how big a googol is? Well, this number is about $(1 \text{ googol})^{1000}$ . So, not even all the crappy low-res monitors in the universe could possibly display them, even at their lousy maximum refresh rate of 60 images/second. And even worse, we'd have no reason to believe any of the images we did see, because they'd be indistinguishable from all the many other conflicting images.

Your comparison to pharmaceutical companies is interesting, but remember those companies are starting with a large (but manageable) collection of plants that actually exist, not searching through the space of all possible arrangements of plant cells or something. It's OK to search for a needle in a haystack sometimes, but not when the haystack is larger than the known universe.

Dear Dr. Math,
Unless I misunderstand (and that's quite possible), I think you've introduced a major flaw here... The "second chunk" begins at character number 2, not character number 360,001. There is no reason why these should be considered discrete chunks and so just because the first character isn't "I" doesn't affect the fact that the second and subsequent characters may spell out the work. Thusly, your monkeys are producing over 17 million "blocks" a day, not just 48...
A. Nonymous

Well, A, that all depends on how we set up our assumptions. The way I had pictured things, the monkey was typing out a whole manuscript of 360,000 characters at a time and then having someone (perhaps J.R.R. Tolkien himself!) check it over and see if it was exactly the same as The Hobbit. If not, the monkey tries again and, with very high probability, fails.

However, your idea is more interesting and perhaps more "realistic". That is, we could have Prof. Tolkien just watch over the monkey's shoulder as it typed and see if any string of 360,000 consecutive characters were The Hobbit. So, if the monkey started by typing a preamble of gibberish and then typed the correct text in its entirety, we'd still count it as correct. As you say, this means that the possible "chunks" we'd need to consider have a lot of overlap to them--we might find the text in characters 1 through 360,000 or 2 through 360,001, etc. But unfortunately, it's not just the number of chunks being produced we need to reconsider; because of the way they overlap, we've now introduced relationships between the chunks that mean our assumption of independence no longer holds. For example, if we knew the first block of characters was incorrect, we could determine whether it was even possible for the second block to be correct based on the particular way the first block was wrong. In fact, we'd know it was impossible unless the first block was something like "xin a hole in the ground there lived a hobbit...".

Actually, if we thought about things in this way, then CN's question above would be relevant, because the codependency of the overlapping chunks would depend heavily on the particular text we were trying to match. Consider the example of coin-flipping again: assume we were flipping the coin until we got the string TH. There are 3 possible ways we could fail on the first pair of flips, all equally likely: TT, HT, and HH. If we got TT or HT, then we could succeed on the third try by flipping an H. If we started with HH, there's no way we could get TH on the third flip. The number of ways of succeeding would be 2, out of a possible 6. So the probability of succeeding in the second block given that we failed in the first would be $\frac{2}{6}=\frac{1}{3}$ .

Now, if we were trying to match HH and we knew we failed on the first 2 flips, there would still be 3 equally likely possibilities. Either we flipped TT, TH, or HT. If we started off with TT or HT, we can't possibly win on the third flip. But if we got TH first, we'd have a $\frac{1}{2}$ chance of flipping H on the third flip and matching. Thus, our probability of matching in the second block given that we failed in the first would only be $\frac{1}{6}$ . Here's a chart showing all of the possibilities:

The two probabilities are different because TH can overlap in more ways with the wrong texts, whereas HH can only overlap with TH.

Therefore, our previous strategy of multiplying probabilities, which rested on the assumption of independence, won't work here. In order to explain how long it would take the monkey to produce The Hobbit with high probability under your scheme, I'd have to go into some fairly heavy-duty math involving Markov chains and their transition probabilities. The relevant probabilities can be found by raising a 360,000 x 360,000 matrix to the nth power--not generally an easy thing to do. But it turns out that the expected (i.e., average) number of characters the monkey would have to type before finishing would still be on the order of $50^{360,000}$ , similar to the the previous setup.

Either way, you and J.R.R. would have probably given up by that point.

-DrM

Dear Dr. Math,
Suppose you have a computer randomly generating all the characters making up the text of The Hobbit. Suppose the computer generates 200 characters per second. Is there a time period after which it becomes probable that the computer has produced the text of The Hobbit? My gut tells me, "No, it would never happen."
Consider this. If a sufficiently powerful being wanted to write a 20,000 page history of tea-drinking, it could either
(a) produce the book
(b) produce all possible 20,000 page long sequences of keyboard characters
TolkienFan

Dear TolkienFan,

Your question is related to the famous Infinite Monkey Theorem, about a room full of monkeys randomly banging on typewriters for all eternity eventually producing the complete works of Shakespeare. (Digression #1: an English major I knew once joked that the monkeys could reproduce the complete works of D.H. Lawrence in a surprisingly short time.) It turns out that the answer is "yes" in some abstract sense, as long as we're careful about what we mean, but the amount of time it would take is far beyond our comprehension--many many orders of magnitude more than the estimated age of the universe. Here's a quick back-of-the-envelope calculation:

1.) Let's only consider the characters on a standard typewriter. Let's say there are 50 possible characters (including numbers and punctuation, but ignoring capitalization, say), although it turns out not to matter very much whether there are 50 keys or 50,000.

2.) Approximate the length of The Hobbit as 360,000 characters--it's a 320 page book, times a standard 250 words per page, times an average 4.5 letters per word in English.

3.) Assume all characters are equally likely to be output by the random generator, be it computer or monkey. (Digression #2: someone actually did this experiment once with real live monkeys and found that the monkeys were extremely overly fond of the letter "s" for some reason. Also, they [the monkeys] really enjoyed urinating on the keyboard.) Assume the characters are probabilistically independent of each other, i.e., knowing what one character is doesn't inform our knowledge of any other one.

4.) Now, break up the output of the generator into chunks of length 360,000. For each chunk, the chance of the first character being equal to the first character of The Hobbit, which is "i", is $\frac{1}{50}$ . By the assumption of independence, the chance of the first two characters being correct, "in", is $\frac{1}{50} * \frac{1}{50}$ , or $\frac{1}{50^2}$ , and so on. So the chance of the whole block of 360,000 characters reproducing the entire book is $\frac{1}{50^{360,000}}$ , which is about $10^{-611629}$ , an astronomically small number. Let's call that number p. The probability of failing to produce The Hobbit in each chunk is (1-p).

5.) If we imagine doing this process a second time, the probability that we'd fail both times is $(1-p)^2$ , because of the independence property again. In general if we repeat it n times, the probability of failing all n times is $(1-p)^n$ . Now, and this is the key, (1-p) is extremely close to 1 but isn't actually equal to 1. So if we take n large enough, the probability of failure will eventually converge to 0; meaning that with high probability we will have at some point succeeded in producing The Hobbit. It's not clear what's meant by an event being "probable," but if, say, we wanted there to be a 95% probability of success, meaning a 5% chance of failure, we would need $(1-p)^n=1-.95=.05$ . So to solve for n, we can take logarithms of both sides and divide by $log(1-p)$ , to get that $n=\frac{log(.05)}{log(1-p)}$ , which is on the order of $\frac{1}{p}$ , or about $10^{611629}$ blocks of characters.

6.) To give you a sense of how incredibly large this number is, if we generated 200 characters per second, as you say, then 360,000 characters would take 30 minutes, so we could produce about 48 new blocks every day, or 17650 per year, on the order of $10^{4}$ . This would mean that our $10^{611629}$ blocks would take about $10^{611625}$ years. For comparison, the universe is estimated to be around 13 billion years old, approximately $10^{10}$ , so you would need $10^{611615}$ ages of the universe to have completed the task with 95% probability. Even if you had every atom in the universe working in parallel for the lifetime of the universe, you'd barely make a dent. Surely all our protons would have decayed into nothingness long before you even got to the part in the book where the trolls get turned to stone.

I suppose the moral of all of this, if you believe in that sort of thing, is that sometimes the mathematical consequences of our assumptions can far outstrip our intuition, especially when it comes to events that are exceedingly rare or numbers that are exceedingly large. So in a sense your "gut feeling" is right as far as any practical considerations go. Jorge Luis Borges wrote a story about this called The Library of Babel, which deals exactly with your scenario of a powerful being producing every possible book of a certain length. It's true that among such a vast library, there would be a comprehensive volume of the history of tea-drinking. However, there would also by necessity be an incredible number of false histories, every possible one in fact, as well as fraudulent cures for cancer and incorrect predictions for the next 100 World Series winners, and, importantly, there would be no way to distinguish the fake from the genuine. So it's back to option (a) I'm afraid--just write the book.

-DrM

Ask Doctor Math

Monday, February 23, 2009

The Infinite Monkey Strikes Back

Wednesday, February 11, 2009

"in a hole in the grou31;aadn,m vnatoh424..."

About Me

Subscribe Now: Feed Icon

Followers

Blog Archive