MathJax

Wednesday, September 17, 2014

Entropy and Bits - Computing for Everyone

Previous: Entropy and Passwords

In information theory, entropy is measured in bits. For each bit of entropy in your set of rules for generating passwords, the number of possible passwords doubles. The famous correct horse battery staple comic does a great job of illustrating these bits as boxes:

Each square represents one bit of information. You know bits already as "binary digits"--something that can be either 1 or 0. This is still the case in information theory, but an important concept when talking about bits and entropy is that there is a 50/50 probability for each bit. The "caps?", "order unknown", and "common substitution" bits are self-explanatory in this context. But why 11 bits for the "uncommon (non-gibberish) base word", 4 bits for "punctuation", or 3 for "numeral"?

To answer this, we model each choice as if it were a random string of 1s and 0s and ask how long that string would have to be to have about as many possibilities as the number of possibilities that have an equal chance of selection (you may remember this from your Algebra 2 class as the base-2 logarithm).

Example: Why is a numeral 3 bits?
Solution: There are 10 possibilities for a base-10 numeral. Three bits yields 23 = 8 possibilities, which is closer to 10 than 24 = 16 possibilities, so we'll round to 3 bits. The exact number of bits would be lg 10 ≈ 3.32 bits (I find it convenient to abbreviate log2n as lg n). I believe it was the correct artistic choice for Mr. Munroe not to depict .32 of a bit.

Using similar reasoning, we see the comic estimates 16-32 punctuation symbols and in the ballpark of 2048 common words.

There's a password generator based on this comic online. Taking a look at its UI and word list, we can calculate the entropy of the default settings for the web site at about 48 bits--4 bits higher than the 44 bits advertised by the comic (3.32 of those come from appending a digit by default, which wasn't in the comic).

Next: Entropy by Another Alphabet

No comments:

Post a Comment