Thursday, December 30, 2010

Cracking a Playfair Cipher

When I was a kid I used to take code books out from the library. I'd spend days writing secret messages to myself and trying to get friends to play along but no one else is insane enough to spend too much time encrypting and decrypting inane messages. As part of my ongoing treasure hunt I now have an encrypted message to decrypt using one of those old ciphers I used to mess around with. It's a Playfair Cipher, where essentially you build a 5x5 matrix from a keyword followed by the rest of the alphabet in order. You take your plaintext that you want to encrypt, split it into pairs of letters, and then encrypt the letters using the matrix. (Note a 5x5 matrix has only 25 spaces so you need to drop a letter. Generally this is done by eliminating Q or combining I and J together.) In this way you have 600 different 'characters' (or digraphs) instead of just 26 with most normal ciphers. This really ramps up the complexity when it comes to decrypting it without the key. (A normal substitution cipher is easy enough it gets included in standard puzzle magazines as Cryptograms or Cryptoquotes.)

At any rate, I'm working on one of the chapters in The World's Greatest Treasure Hunt. Each chapter has 20 questions to answer and I've worked out 18 of them. The two that remain are "What is the password given in this chapter?" (the real crux of the treasure hunt is figuring out what in the world these passwords might be I think) and a Playfair Cipher where they've encrypted a trivia question about Pirates of the Caribbean. They didn't give me the key, unfortunately, though they did include the note "the password is the key". I'm operating under the assumption that the answer to the first question is therefore the key to the code in the second question. They have a website up where you can enter in answers and check if they're right (that's how I know I have the other 18) and there is a fixed number of blanks for each answer. So, I believe the password has 6 characters.

Now, the ciphertext of the trivia question is pretty short so a lot of common codebreaking tricks won't work. I can't check for commonly occuring digraphs, for example, since there are 600 possible digraphs and only 25 in the ciphertext. My first thought was I should just write a program to brute force it, but there are 244 million possible 6 letter keys with 25 letters in my alphabet. Assuming 2 seconds per test I'd be looking at more than 15 years of processing time. My guess for how long each test might take could be off, but I doubt it's off enough to make up that gap. I could narrow things down more by only using common words in the hope the key is a real word, but that's not going to save enough time.

So I got to thinking... This is a trivia question, right? Most questions start with one of 5 words: who, what, when, where, or why. All 5 of those words start with the same two letters. So, what if I restricted my matrixes to ones where the first digraph (Ye) decrypts to Wh. I messed around a bit on the bus this morning and established 19 possible ways to have a 6 letter key which results in Wh turning into Ye. (That there is only 1 letter between W and Y and 2 between E and H narrow things down a lot and also make it pretty plausible that this is the right decryption of Ye.) I'm going to quantify things a little more when I get a chance but I'm pretty sure this will drastically reduce my search space for the key.

On top of that, 4 of the letters in the ciphertext are capitalized. The first letter, of course, but then 3 more later on. Most likely these are refering to a name or place from the movie and I may be able to find more restrictions on my key if I work from the capital letters. A digraph actually repeats itself shortly after two of the capitals as well, so the capitalized words share letters. I have a hunch I need to flesh out more on the bus, but I think I may be able to lock down a couple more conversions. The second digraph of the whole ciphertext (which I suspect is o_, at, en, er, or y_ if the first one is really Wh) is reversed and found in one of the capitalized words which now that I'm looking at it more really gives me an idea. Time will tell!

1 comment:

Anonymous said...

I've been trying to crack that one as well.

Try a bacon cipher ;)