views:

96

answers:

4

I have been trying to solve this question for a while, and I am having trouble because some of it seems somewhat more ambiguous than previous problems. As background, the problem says that the given text file is encrypted text with the ASCII saved as numbers. The encryption method is to XOR 3 lowercase letters cyclically with the plaintext (so it is reversible). The problem asks for the key that decrypts the file to English text. How should I restrict the character set of my output to get the answer without trying to sift through all possible plaintexts (26^3)?

I have tried restricting to letters, spaces, and punctuation, and that did not work.

To clarify: I want to determine, out of all printable ASCII characters, which ones I can probably discard and which ones I can expect to be in the plaintext string.

+1  A: 

Have you tried two of the most "basic" and common tools in analyzing the algorithm used?

  1. Analyze the frequency of the characters and try to match it against English letter frequency
  2. Bruteforce using keys from a wordlist, most often common words are used as keys by "dumb" users

To analyze the frequency for this particular problem you would have to split the string every third element since the key is of length 3, you should now be able to produce three columns:

79  59  12
2   79  35
8   28  20
2   3   68
...

you have to analyse the frequency for each column, since now they are independent of the key.

Ok, actually took my time and constructed the 3 complete columns and counted the frequency for each of the columns and got the two most frequent item or each column:

Col1  Col2  Col3
71    79    68
2     1     1

Now if you check for instance: http://en.wikipedia.org/wiki/Letter_frequency You have the most frequent letters, and don't forget you have spaces and other characters which is not present on that page, but I think you can assume that space is the most frequent character.

So now it is just a matter of xor:ing the most frequent characters in the table I provided with the most frequent characters in English language, and see if you get any lowercase characters, I found a three letter word which I think is the answer with only this data.

Good luck and by the way, it was a nice problem!

Daniel Persson
I did not think of the frequency analysis, and it might help, but I am not sure if the text is long enough for the results to be significant (it's about the size of a small paragraph). I will definately try it though.The wordlist seems like it would be a good idea, but knowing Project Euler, they probably picked a random key.
murgatroid99
Yeah I would not count on the wordlist getting you anywhere
JGord
One should never "not count" on anything, specially when it is rather easy to try all three letter words from a wordlist, there are some common passwords such as dog, god etc. who knows
Daniel Persson
On the contrary one should always "not count" on things. Sure try them as options, but I would not bank the key being a 3 letter word.Wasn't saying you couldn't try it...
JGord
Thanks. I got the answer using frequency analysis.
murgatroid99
+1  A: 

I'll admit upfront I'm not familiar with an XOR cipher.

However, it seems very similar to the concept of the vigenere cipher. Escpecially in the line where they mention for unbreakable encryption the keylength equals the message length. That screams Vernam Cipher.

As mentioned in the other answer, the strategical approach to breaking a vigenere cipher involves a probabilistic approach. I will not go into detail because most of the theory I learned was relatively complicated, but it can be found here keeping in mind that vignere is a series of caesar ciphers.

The problem makes it easy for you though because you already know the keylength. Because of that, as you mentioned, you can simply bruteforce by trying every single 3 letter combination.

Here's what I would do: take a reasonably sized chunk of the ciphertext, say maybe 10-20 characters, and try the brute force approach on that. Keep track of all the keys that seem to create understandable sequences of letters and then use those on the whole ciphertext. That way we can employ the obvious brute forcing method, but without bruteforcing the entire problem, so I don't think you'll have to worry about limiting your output.

That said, I agree that as you're creating the output, if you ever get a non printable character, you could probably break your loop and move on to the next key. I wouldn't try anything more specific than that because who knows what the original message could have, never make assumptions about the data you're dealing with. Short circuiting logic like that is always a good idea, especially when implementing a brute force solution.

JGord
I like your idea, but it still seems to come with the same problem I had before: I have over 17000 outputs, and I need the computer to discard the unintelligible outputs somehow.
murgatroid99
You have 17,000 outputs after restricting your output to only printable english characters? Yikes.Alright, how about downloading a dictionary file from the intertubes and using it on the first 20 characters of output you get. So once you have your ~15 character decryptions, check to see if they contain an english word. Better yet, if you aren't worried about speed, do substrings on each 20 character output and check to see if the beginning is an english word. IE, if the first letter forms a word? No, if the first 2 form a word? No, check first 3 etc.
JGord
I am sorry, it appears that I may not have made my original problem completely clear. I will edit my question.
murgatroid99
The main problem is that simply eliminating non-printable characters is not enough to sufficiently narrow down the solution (most ascii characters are printable), so the problem is to determine what other characters are not wanted in english text.
murgatroid99
A: 

Split the ciphertext into 3.

Ciphertext1 comprises the 1st, 4th, 7th, 10th...numbers Ciphertext2 comprises the 2nd, 5th, 8th, 11th...numbers Ciphertext3 comprises the 3rd, 6th, 9th, 12th...numbers

Now you know that each cyphertext is encrypted with the same key letter. Now do a standard frequency analysis on it. That should give you enough clues as to what the letter is.

Visage
+1  A: 

A possible solution is to simply assume the presence of a given three-character sequence in the encrypted text. You can use a three-letter word, or a three letter sequence which is likely to appear in English text (e.g. " a ": the letter 'a' enclosed between two spaces). Then simply try all possible positions of that sequence in the encrypted text. Each position allows you to simply recompute the key, then decrypt the whole text into a file.

Since the original text has length 1201, you get 1199 files to skim through. At that point it is only a matter of patience, but you can make it much faster by using a simple text search utility on another frequent sequence in English (e.g. "are"), for instance with the Unix tool grep.

I did just that, and got the decrypted text in less than five minutes.

Thomas Pornin