
Monday, September 29, 2014

Your MOM's a determinant

To my sister and her classmates, so that their math homework can be just a little less depressing.

Here are a few perfectly reasonable questions from high school sophomores about finding the determinant of a matrix:

  • What are they good for?
  • Who cares?
  • This is stupid and I hate you.
Let's address these questions. The first stop in the 21st century is, of course, Wikipedia:
The determinant provides important information about a matrix of coefficients of a system of linear equations, or about a matrix that corresponds to a linear transformation of a vector space. In the first case the system has a unique solution exactly when the determinant is nonzero; when the determinant is zero there are either no solutions or many solutions. In the second case the transformation has an inverse operation exactly when the determinant is nonzero. A geometric interpretation can be given to the value of the determinant of a square matrix with real entries: the absolute value of the determinant gives the scale factor by which area or volume (or a higher-dimensional analogue) is multiplied under the associated linear transformation, while its sign indicates whether the transformation preserves orientation. Thus a 2 × 2 matrix with determinant −2, when applied to a region of the plane with finite area, will transform that region into one with twice the area, while reversing its orientation.
Determinants occur throughout mathematics. The use of determinants in calculus includes the Jacobian determinant in the substitution rule for integrals of functions of several variables. They are used to define the characteristic polynomial of a matrix that is an essential tool in eigenvalue problems in linear algebra. In some cases they are used just as a compact notation for expressions that would otherwise be unwieldy to write down.
Ugh, I think I see the problem. These are, actually, really good reasons to be familiar with determinants, but most of these concepts are held back until a college-level Linear Algebra course. On that day when these sophomores take Linear Algebra, they will realize the brilliance of determinants--but they're depressed now. Let's try to do better than that.

A matrix's determinant tells you
  • whether a matrix is invertible, which tells you
    • whether you can undo a matrix multiplication operation
    • whether you can solve a system of equations based on that matrix
  • what happens to a vector when it's multiplied by the matrix:
    • How much does it stretch?
    • Does it flip inside out?
    • There could also be some rotation of the vector, but if we only care about its size and inside-outness, we can save the trouble of matrix multiplication if we know the determinant.
This, of course, presupposes that you care about matrix multiplication, which is itself a worthy topic for a similar post.

In addition to what a determinant tells you about a matrix, there are a few mathematical formulas that can be succinctly represented as computing the determinant of a matrix. This could save space on a cheat sheet for a future physics test. This is a nice collection of determinants doing cool things.

Look, this determinant is doing volume!

The procedure for computing the determinant of a 3x3 or larger matrix illustrates a key idea in mathematics: recursion. You find the determinant of a large matrix by finding the determinants of smaller submatrices and combining the results. This pattern--a procedure that solves a problem by running itself against smaller versions of the same problem--is called recursion, and it pops up everywhere once you start looking for it.
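To make that concrete, here's a minimal sketch of the recursive procedure in Ruby (cofactor expansion along the first row; the function name and the matrix-as-array-of-arrays representation are my own choices for illustration):

def determinant(m)
  return m[0][0] if m.size == 1                  # base case: a 1x1 matrix is its own determinant
  (0...m.size).sum do |col|
    sign = col.even? ? 1 : -1                    # alternating +/- signs across the first row
    # the minor: the submatrix left after deleting row 0 and column `col`
    minor = m[1..-1].map { |row| row[0...col] + row[(col + 1)..-1] }
    sign * m[0][col] * determinant(minor)        # recurse on the smaller matrix
  end
end

puts determinant([[1, 2], [3, 4]])                       # => -2
# The "determinant doing volume" trick from above: the absolute value of the
# determinant of three edge vectors is the volume of the parallelepiped they span.
puts determinant([[2, 0, 0], [0, 3, 0], [0, 0, 4]]).abs  # => 24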

The most important thing you get out of studying determinants in high school...


Drumroll, please.

It's tedious and awful.

You're insane.
Let me explain. Why is it important for high school students to be tortured with tedious, awful plug-n-chug work? Because it's a strong motivator for the self-study of computer programming.
That came out of nowhere.
Let me explain.

If you're in high school today, you've never known a planet without personal computers. You probably spend more time on tablets than you do watching TV. More and more data about what you do every day are winding up in computer systems. More and more jobs require the ability to analyze and manipulate data. Literacy was to the 20th century as digital literacy is to the 21st. Until computer programming is part of a normal curriculum, you're on your own to learn this stuff.

Computing the determinant of a large matrix is the sort of "hard" that mankind invented computers to solve: lots of simple steps strung together. You won't be hired to find the determinant of a matrix because we have software that does that. There are lots of other computational activities taken over by computers as well--pretty much any well-defined useful task that can be broken down into simple steps. This is a good thing because it lets people focus their time and energy on the next problem (there's always a next problem). The lessons you learn writing software to help you cheat on your math homework will be applied again and again throughout your career.

Knowing all this won't make finding determinants less tedious, but hopefully it won't seem like entirely random torture. Good luck!

Monday, September 22, 2014

Choosing not to have a choice

The highest expression of freedom is self-discipline in pursuit of a worthy goal.

The secret is to make it easy. The secret to making it easy is choosing not to have a choice.

When you have a choice, you can procrastinate. You can put off the exercise. You can have another cookie. Maybe just one more drink. It adds up. If these things are important to you--if you truly want to get work done, get in shape, lose weight, or kick the booze--choose not to have a choice.

You can procrastinate after the work's done.
You can contemplate the futility of working out after your workout.
Cookies are gasoline, not food.
Same with alcohol.

This is the nature of freedom for excellence. This is what it takes to live the life you meant to live. This is how you rise above your lot in life. This is how you get from what you need to accomplish to what you want to accomplish to what you were born to accomplish. You have no choice. Then it's easy.


Friday, September 19, 2014

Entropy by Another Alphabet - Computing for Everyone

Previous: Entropy and Bits

All right, enough about passwords and entropy. This stuff is supposed to show up everywhere--how about another example?

Here are some Wheel of Fortune Google Image results with upper bounds on the corresponding entropy, for fun.

40.9 bits of entropy--luckily, that's a strict upper bound. I'm sure this guy's got something up his sleeve....
Good luck!
Oh no, an upper bound of 30.7 bits of entropy!
Luckily, as someone who was in an English-speaking country as a 5-year-old, you have the extra information needed to solve the puzzle.
An upper bound of 3.8 bits, but as an English speaker you've already solved the puzzle.
Only one possibility--exactly zero bits!
The bounds are based on any remaining letter being equally viable for each spot. The calculation uses knowledge of Wheel of Fortune's rules but no knowledge of English. With knowledge that these solutions are comprehensible in English, the entropy is significantly less--otherwise we'd never fit a show into 22 minutes!

40.9 bits of entropy was an upper bound, remember ;-)
The entropy of each puzzle is way less than each captioned upper bound because not all guesses are equally plausible. My previous post mentioned that when we use bits to measure entropy, we mean bits that have a 50/50 chance of being either 0 or 1. This equal probability is crucial.

This last one was an amazing performance by the contestant, but clearly my entropy upper bound calculations don't reflect the actual difficulty of the puzzle. Why? Because 41 bits of entropy would mean there were 2^41 equally plausible solutions to the puzzle. This guy's good, but not trillions-of-guesses good.

The upper-bound assumption--any of 17 letters in each of the 10 open spaces--is clearly too large a universe of possibilities. The contestant, knowing English, could work out that the first word was probably "new," especially since missing "t" and "o" means it isn't "net" or "neo."
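For the curious, here's roughly how I computed the captioned upper bounds--a quick Ruby sketch, where the space counts and remaining-letter counts are my own reconstruction rather than anything official from the show:

# Upper bound: every open space could hold any of the letters still in play,
# so the entropy is (number of open spaces) * lg(remaining letters).
def entropy_upper_bound(spaces, letters_remaining)
  spaces * Math.log2(letters_remaining)
end

puts entropy_upper_bound(10, 17).round(1)  # => 40.9, the first puzzle above
puts entropy_upper_bound(7, 21).round(1)   # => 30.7, plausibly the second (my guess at its numbers)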

This leaves us with the number of 4-letter English words that can be made from the remaining 17 letters, the number of 5-letter English words from those same letters, and the triplets that make sense in combination (unfortunately for the contestant, you could have a "new" pretty much anything--"new baby buggy" is as plausible as new any-other-possible-phrase).

His situation is bad, but it's bad on the order of thousands of possibilities rather than quadrillions. Since the category is "thing," one of those words should be a noun, which whittles down the possibilities even further, etc. And lucky for our hero, the solution was near the beginning of the alphabet. An amazing performance.

But still, the second example--30.7 bits of entropy as an upper bound. Surely there aren't well over a billion equally plausible solutions.

The same thing happens in computing--in guessing passwords, in back-solving logic puzzles, and in communication.

If any remaining letter were equally probable on a Wheel of Fortune board, the game couldn't exist. Not only would it be tremendously boring, but equally-probable letters mean that there's nothing you can do to guess the solution other than guessing every word. Thankfully, English doesn't work like that. This feature of the language is called redundancy. Redundancy is a measure of how many symbols, on average, can be missing from a message while still remaining comprehensible. The "on average" is an important qualifier--as an exercise, think of some humorous situations where changing one letter in a word or one word in a phrase yields comprehensible English with an intent far different from what was originally intended! From our new perspective we can look at Wheel of Fortune as a game where contestants press their luck to select letters that give them enough information to solve a puzzle before their opponents can.

Redundancy has implications for computing. You've probably heard of file compression, i.e. zipping files. This is the essence of what's happening when you zip a file: the compressor is measuring the statistics of what's in the file and coming up with an encoding scheme whereby any bit in the compressed file has about a 50/50 chance of being 0 or 1. And this is lossless compression, meaning when you unzip the compressed file on the other side, you get back exactly what you had before compression.
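As a quick sketch of that idea (using Ruby's standard Zlib library; the sample strings are just made up for illustration), redundant text shrinks a lot under lossless compression, while already-random bytes barely shrink at all:

require 'zlib'

redundant = 'new baby buggy ' * 100                 # highly redundant English-ish text
random    = Random.new.bytes(redundant.bytesize)    # the same number of random bytes

puts redundant.bytesize                             # 1500
puts Zlib::Deflate.deflate(redundant).bytesize      # far smaller--the redundancy is squeezed out
puts Zlib::Deflate.deflate(random).bytesize         # roughly the original size--nothing left to squeeze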

Lossless compression of text files contrasts with the lossy compression used for pictures, audio, and video. In that case, the uncompressed form isn't recoverable from the compressed form. We lose information, hence JPEG, MP3, and H.264 are all lossy. The trick with lossy compression of media files is to discard information in a way that humans don't notice or care about the loss of fidelity. When encoding with lossy compression, the software often exposes a bitrate setting that determines how aggressively details get smoothed over. The rabbit hole of lossy compression schemes is extremely deep, but it all comes back to entropy and making every bit transmitted over the internet count toward what you actually see and hear on the other end.

Wednesday, September 17, 2014

Entropy and Bits - Computing for Everyone

Previous: Entropy and Passwords

In information theory, entropy is measured in bits. For each bit of entropy in your set of rules for generating passwords, the number of possible passwords doubles. The famous correct horse battery staple comic does a great job of illustrating these bits as boxes:

Each square represents one bit of information. You know bits already as "binary digits"--something that can be either 1 or 0. This is still the case in information theory, but an important concept when talking about bits and entropy is that there is a 50/50 probability for each bit. The "caps?", "order unknown", and "common substitution" bits are self-explanatory in this context. But why 11 bits for the "uncommon (non-gibberish) base word", 4 bits for "punctuation", or 3 for "numeral"?

To answer this, we model each choice as if it were a random string of 1s and 0s, and we ask how long that string would have to be to allow about as many equally likely possibilities as the choice we're modeling (you may remember this from your Algebra 2 class as the base-2 logarithm).

Example: Why is a numeral 3 bits?
Solution: There are 10 possibilities for a base-10 numeral. Three bits yields 2^3 = 8 possibilities, which is closer to 10 than 2^4 = 16 possibilities, so we'll round to 3 bits. The exact number of bits would be lg 10 ≈ 3.32 bits (I find it convenient to abbreviate log₂ n as lg n). I believe it was the correct artistic choice for Mr. Munroe not to depict .32 of a bit.

Using similar reasoning, we see the comic estimates 16-32 punctuation symbols and in the ballpark of 2048 common words.
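If you want to reproduce those estimates yourself, the calculation is just a base-2 logarithm of the number of equally likely choices--a tiny sketch:

# bits of entropy for a choice among `choices` equally likely options
def bits(choices)
  Math.log2(choices)
end

puts bits(10).round(2)   # numeral: ~3.32 bits, rounded to 3 in the comic
puts bits(16)            # punctuation drawn from a set of 16 symbols: 4.0 bits
puts bits(2048)          # word drawn from a 2048-word list: 11.0 bits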

There's a password generator based on this comic online. Taking a look at its UI and word list, we can calculate the entropy of the default settings for the web site at about 48 bits--4 bits higher than the 44 bits advertised by the comic (3.32 of those come from appending a digit by default, which wasn't in the comic).
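Here's how you'd arrive at that kind of ballpark figure--a sketch where word_list_size stands in for however many words the site's list actually contains (I'm not hard-coding the real count):

def passphrase_bits(word_list_size, words: 4, append_digit: true)
  total = words * Math.log2(word_list_size)
  total += Math.log2(10) if append_digit   # the site appends a 0-9 digit by default
  total
end

puts passphrase_bits(2048).round(1)                        # => 47.3, the comic's word list plus a digit
puts passphrase_bits(2048, append_digit: false).round(1)   # => 44.0, the comic's advertised entropy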

Next: Entropy by Another Alphabet

Monday, September 15, 2014

Entropy and Passwords - Computing for Everyone

Which is a better password (neither is good): "password" or "ospawdr"?
Intuitively, the second is better, because it's harder to guess--even though they both use the same letters, and the second collection of letters is shorter (7 letters instead of 8).

This is the information theory concept of entropy[1]. Once you're familiar with entropy, you'll see it pop up everywhere in computing and everywhere in life. Intuitively, a set of rules that generates passwords that are harder to guess has higher entropy than a set of rules that generates passwords that are easier to guess.

If your set of rules for generating passwords has high entropy, that means it will take more attempts to guess your password. High entropy is crucial because an attacker might be able to make guesses very, very, very quickly. Thankfully, you can use rules that can outrun the guess rate of an attacker. This is why some web sites have obnoxious rules about including a mix of characters that makes your passwords hard to remember: the rules make them hard to guess as well.

So why is "ospawdr" better than "password"? Let's look at the rules that generated each password.
"password" is a word in the dictionary. Worse, it's a common default password. As a common default password, a manual attacker might try it within the first half-dozen attempts (and laugh hysterically when it works). But let's be generous and say the rules you used to arrive at the password "password" are just that it's a common English word. How many English words are there? Let's say there are a million. But "password" is a common English word--a clever attacker would surely try common words first, no? The second edition of the Oxford English Dictionary contains under 175,000 entries--most of which you probably haven't heard of. But rest assured, a computer can make 175,000 guesses within seconds. So with a generous estimate, we'll say we're guaranteed to guess "password" within the first 200,000 guesses.

Now let's look at "ospawdr". My rules for making this password were to choose an arbitrary 7 letters from the word "password". "password" has 7 different letters, so my rules generate 7^7 = 823,543 different passwords. Much better!

There's a problem, though: my "arbitrary" 7 letters still don't have great entropy! Why? Because it turns out that humans are bad at making random-looking choices--what looks random to a human isn't as "random" as it should be! For example, I chose these characters in my head while looking at the word, and it turns out that I chose 7 distinct letters in a 7-character password from a 7-character alphabet. The fraction of truly random passwords that share this characteristic is 7 • 6 • 5 • 4 • 3 • 2 • 1 = 7! = 5,040 out of a space of 823,543. My "arbitrary" rules picked a password that was actually in a subset of about 0.6% of the space I thought I was using. What a disaster![2]
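To put rough numbers on all of that--a quick back-of-the-envelope sketch using the same figures as above:

# entropy (in bits) of the password-generating rule sets discussed above
puts Math.log2(200_000).round(1)   # "a common English word": ~17.6 bits
puts Math.log2(7**7).round(1)      # 7 characters from 7 letters, repeats allowed: ~19.7 bits
puts Math.log2(5040).round(1)      # a permutation of the 7 distinct letters: ~12.3 bits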

Next: Entropy and Bits

[1] There's a related physics version of entropy.
[2] For comparison, here are 10 passwords I generated with a Ruby script using the rules I thought I was using:

dddrdrr
wpswoow
oaprrrr
drpaawo
sraddsd
warsosr
psdasso
rdpdwao
ooapwaw
apsppwo
And the script:
# Sample 7 characters, with replacement, from the 7 distinct letters of "password".
alphabet = 'pasword'
10.times do
  password = ''
  7.times do
    password << alphabet.chars.sample
  end
  puts password
end

These script-generated passwords don't "look" as random, but they're actually harder to guess than the original. This is why I recommend using a password manager and generator.

Here are 10 sample "pasword" permutations:
posadwr
owpdsar
prowads
podwasr
sdawpro
opadrsw
rowsdpa
wsaodpr
rswdaop
drsoapw

And the Ruby code:
# Shuffle all 7 distinct letters: a random permutation rather than a random string.
alphabet = 'pasword'
10.times do
  puts alphabet.chars.shuffle.join
end

As an example of how the human mind is bad at picking out "random" data, here are the two next to each other.

Random characters (entropy ≈ 20 bits)    Random permutation (entropy ≈ 12 bits)
dddrdrr                                  posadwr
wpswoow                                  owpdsar
oaprrrr                                  prowads
drpaawo                                  podwasr
sraddsd                                  sdawpro
warsosr                                  opadrsw
psdasso                                  rowsdpa
rdpdwao                                  wsaodpr
ooapwaw                                  rswdaop
apsppwo                                  drsoapw
For more on passwords, see