views:

399

answers:

3

Hello,
"Programming Challenges (The Programming Contest Training Manual)" is probably one of the nicest exercises book on algorithms. I've resolved the first 11 exercises, but now I am stuck with "Crypt Kicker" problem:

Crypt Kicker
A common but insecure method of encrypting text is to permute the letters of the alphabet. In other words, each letter of the alphabet is consistently replaced in the text by some other letter. To ensure that the encryption is reversible, no two letters are replaced by the same letter.

Your task is to decrypt several encoded lines of text, assuming that each line uses a different set of replacements, and that all words in the decrypted text are from a dictionary of known words.

Input The input consists of a line containing an integer n, followed by n lowercase words, one per line, in alphabetical order. These n words compose the dictionary of words which may appear in the decrypted text. Following the dictionary are several lines of input. Each line is encrypted as described above.

There are no more than 1,000 words in the dictionary. No word exceeds 16 letters. The encrypted lines contain only lower case letters and spaces and do not exceed 80 characters in length.

Output Decrypt each line and print it to standard output. If there are multiple solutions, any one will do. If there is no solution, replace every letter of the alphabet by an asterisk.

Sample Input 6
and
dick
jane
puff
spot
yertle

bjvg xsb hxsn xsb qymm xsb rqat xsb pnetfn
xxxx yyy zzzz www yyyy aaa bbbb ccc dddddd

Sample Output
dick and jane and puff and spot and yertle
...

My question is what strategy should I to take in order to resolve this exercise. I was thinking to a classic and brutish backtracking solution, but I am trying avoid that until I find something more intelligent :).

PS: This is not homework related, I am just trying to improve my overall skills.

+2  A: 

A minor optimization could be done by enumerating possibilities before the backtracking run. In Python:

dictionary = ['and', 'dick', 'jane', 'puff', 'spot', 'yertle']
line = ['bjvg', 'xsb', 'hxsn', 'xsb', 'qymm', 'xsb', 'rqat', 'xsb', 'pnetfn']

# ------------------------------------

import collections

words_of_length = collections.defaultdict(list)

for word in dictionary:
  words_of_length[len(word)].append(word)

possibilities = collections.defaultdict(set)
certainities = {}

for word in line:
    length = len(word)
    for i, letter in enumerate(word):
        if len(words_of_length[length]) == 1:
            match = words_of_length[length][0]
            certainities[letter] = match[i]
        else:
            for match in words_of_length[length]:
              possibilities[letter].add(match[i])

for letter in certainities.itervalues():
    for k in possibilities:
        possibilities[k].discard(letter)

for i, j in certainities.iteritems():
    possibilities[i] = set([j])

# ------------------------------------

import pprint
pprint.pprint(dict(possibilities))

Output:

{'a': set(['c', 'f', 'o']),
 'b': set(['d']),
 'e': set(['r']),
 'f': set(['l']),
 'g': set(['f', 'k']),
 'h': set(['j', 'p', 's']),
 'j': set(['i', 'p', 'u']),
 'm': set(['c', 'f', 'k', 'o']),
 'n': set(['e']),
 'p': set(['y']),
 'q': set(['i', 'j', 'p', 's', 'u']),
 'r': set(['j', 'p', 's']),
 's': set(['n']),
 't': set(['t']),
 'v': set(['c', 'f', 'o']),
 'x': set(['a']),
 'y': set(['i', 'p', 'u'])}

If you have some single-element possibilities, you can eliminate them from the input and rerun the algorithm.

EDIT: Switched to set instead of list and added printing code.

Max Shawabkeh
Thank you, I'll take in consideration those optimizations!
Andrei Ciobanu
once the dictionary size grows this is going to be less useful no?
jk
jk, yes. For larger inputs you're probably better off doing a simple depth first search with nodes ordered by letter frequency as per Sylvestre's answer.
Max Shawabkeh
+1  A: 
KeyArray will hold the replacement table.

- Start with an empty KeyArray, this is version 0

- Match longest encrypted word to longest dictionary word and add to KeyArray 
  (if there are two longest, pick any), this is version 1.

- Decrypt some letters of the next longest crypted word. 
- Check if the decrypted letters match the letter in the same
  position in any dictionary word of the same length.
- If none matches, go back to version 0 and try another word.
- If some letters match, add the rest of the letters to KeyArray, this is version 2. 

- Decrypt some letters of the next longest crypted word. 
- Check if the decrypted letters match the letter in the same 
  position in any dictionary word. 
- If none matches, go back to version 1 and try another word
- If some letters match, add the rest of the letters to KeyArray, this is version 3. 

Repeat until all words are decrypted.

If at version 0 none of the longest words creates a partial decrypt in 
shorter words, very probably there is no solution.
Carlos Gutiérrez
Thank you, i will see this approach!
Andrei Ciobanu
+1  A: 

Another possible optimization, if you have "enough" text to deal with and you know the text's language, you can use letter frequencies (see : http://en.wikipedia.org/wiki/Letter_frequency). This is of course a very approximative approach when dealing with 6 / 7 words but will be the fastest way if you have a few pages to decode.

EDIT : about Max's solution, you could try to extract some characteristics of the word, too, such as repeating letters. Obviously, remarking that puff in the dictionary and qymm in the encrypted text are the only four letter words ending with a double letter gives a straight answer for 3 of the letters. In more complex scenarios, you should be able to narrow the possibilities for each letter couple.

Sylvestre Equy
And it's cool, because Sherlock Holmes used it in 'The Adventure of the Dancing Men' :) http://monpinillos.wordpress.com/2008/06/02/holmess-skills-cryptography/
Carlos Gutiérrez
Unfortunately I'll have to generate "enough" text. But that will be a cool problem in itself: "generating encrypted text". Thanks for your advice
Andrei Ciobanu