input: Ordinary English text (A-Z), encrypted with a randomly generated substitution cipher.

output: key

ideas: read the whole text, store the frequency of each character/bigram/trigram in arrays, and compare them to:
http://en.wikipedia.org/wiki/Letter_frequencies
http://en.wikipedia.org/wiki/Bigram
http://en.wikipedia.org/wiki/Trigram
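The counting step described above could look roughly like the following. This is only a minimal sketch: the function names `ngram_frequencies` and `initial_key_guess` are made up for illustration, and `ENGLISH_ORDER` is an approximate most-to-least-frequent ordering of English letters (published tables differ slightly), so treat the resulting mapping as a starting guess, not a key.

```python
from collections import Counter

# Approximate English letter order, most to least frequent (an assumption;
# different frequency tables disagree on some neighbouring letters).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def ngram_frequencies(text, n):
    """Return the relative frequency of every overlapping n-gram in text."""
    s = "".join(c for c in text.upper() if "A" <= c <= "Z")
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

def initial_key_guess(ciphertext):
    """Map each ciphertext letter to a plaintext guess by frequency rank."""
    freqs = ngram_frequencies(ciphertext, 1)
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return dict(zip(ranked, ENGLISH_ORDER))
```

Calling `ngram_frequencies(ciphertext, 2)` or `ngram_frequencies(ciphertext, 3)` gives the bigram and trigram tables to compare against the reference frequencies.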

cons: letters/bigrams/trigrams with similar frequencies (like "c" and "u") are hard to tell apart

My software should be able to guess as many characters as possible from the ciphertext (at least 2000 characters long).
I have to guess at least 18-20 letters.

questions:
Is there a way/known algorithm to guess all the characters, i.e. the full key?
Or can you give me some useful references or advice on how I could improve the whole guessing process?

Thank you in advance.

A:

I think you're on the right track. You can only recover the full key if all 26 letters (or all but one) appear in the plaintext; a single missing letter can be filled in by elimination.

I'd make some statistical guesses and then statistically check the bigrams/trigrams in the resulting plaintext. Or check whole words (if you know where the word boundaries are) against a word list.
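One way to turn "guess, then check the resulting n-grams" into a loop is simple hill climbing: swap two letters of the key and keep the swap only if the decrypted text scores better. To be clear, hill climbing is a technique swapped in here for illustration, not something named above. The sketch below assumes you supply your own English reference text to build the n-gram model; `build_model`, `score`, `decrypt`, and `hill_climb` are hypothetical names, and the unseen-n-gram penalty is an arbitrary choice.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def build_model(reference_text, n=3):
    """Log-probabilities of n-grams from a reference text, non-letters stripped."""
    s = "".join(c for c in reference_text.upper() if c in ALPHABET)
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("reference text is too short for this n")
    logp = {gram: math.log(c / total) for gram, c in counts.items()}
    floor = math.log(0.01 / total)  # arbitrary penalty for unseen n-grams
    return logp, floor, n

def score(text, model):
    """Sum of n-gram log-probabilities; higher means more English-like."""
    logp, floor, n = model
    return sum(logp.get(text[i:i + n], floor) for i in range(len(text) - n + 1))

def decrypt(ciphertext, key):
    """key[i] is the plaintext letter assumed for ciphertext letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, key))

def hill_climb(ciphertext, model, start_key=ALPHABET, iterations=5000, seed=0):
    """Swap two key letters at random; keep the swap only if the score improves."""
    rng = random.Random(seed)
    key = list(start_key)
    best = score(decrypt(ciphertext, "".join(key)), model)
    for _ in range(iterations):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        candidate = score(decrypt(ciphertext, "".join(key)), model)
        if candidate > best:
            best = candidate
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best
```

Because `build_model` strips spaces and punctuation before counting, the model includes n-grams that span word boundaries, which matches ciphertext that has no spaces. A single run can get stuck in a local optimum, so in practice you would probably restart from several start keys (for example a frequency-rank guess) and keep the best-scoring result.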

Andrew Cooper
That's a problem. I only have the letters A to Z, without spaces, so the last letter of one word plus the first letter of the next word would form a bigram in my statistics. I was wondering whether the frequencies found on Wikipedia would work for me.
Victor Z.
Andrew Cooper