views:

497

answers:

3

Well I have been working on an assigment and it states:

A program has to be developed, and coded in C language, to decipher a document written in Italian that is encoded using a secret key. The secret key is obtained as random permutation of all the uppercase letters, lowercase letters, numbers and blank space. As an example, let us consider the following two strings:

Plain: “ABCDEFGHIJKLMNOPQRSTUVXWYZabcdefghijklmnopqrstuvwxyz0123456789 ”
Code:  “BZJ9y0KePWopxYkQlRjhzsaNTFAtM7H6S24fC5mcIgXbnLOq8Uid 3EDv1ruVGw”

The secret key modifies only letters, numbers, and spaces of the original document, while the remaining characters are left unchanged. The document is stored in a text file whose length is unknown.

The program has to read the document, find the secret key (which by definition is unknown; the above table is just an example and it is not the key used for preparing the sample files available on the web course) using a suitable decoding algorithm, and write the decoded document to a new text file.

And I know that I have to upload an English dictionary into the program but I don't why it has been asked (may be not in that statement but I have to do THAT). My question is, while I can do that program using simple encryption/decryption algorithm then what's the use of uploading the English dictionary in our program? So is there any decryption algorithm that uses a dictionary to decrypt an encrypted algorithm? Or can somebody tell me what approach or algorithm should I use to solve that problem???

An early reply (and also authentic one) will be highly appreciated from you.

Thank you guys.

+7  A: 

This is a simple substitution cipher. It can be broken using frequency analysis. The Wikipedia articles explain both concepts thoroughly. What you need to do is:

  • Find the statistical frequency of characters in Italian texts. If you can't find this published anywhere, you can build it yourself by analyzing a large corpus of Italian texts.
  • Analyze the frequency of characters in the cipher text, and match it to the statistical data.

The first Wikipedia article links to a set of tools that implement all of the above. You just need to use and possibly adapt it to your use case.

Ayman Hourieh
There is a column in my local newspaper that is one such chiper. As few as twelve words is breakable by hand in half an hour.
Joshua
+1  A: 

Your cipher is a substitution cipher. That is it substitutes one letter for another.

consider the cipher text

"yjr,1drv2ry1od1q1..."

We can use a dictionary to find the plaintext.

Find punctuation, since a space always follows a comma, you can find the substitution rule for spaces.

which gives you. "yjr, drv2ry od q..."

Notice the word lengths. Since there only two 1 letter words in the english language the q is probably i or a. "yjr" is probably "why", "the", "how" etc.

We try why with the result

"why, dyv2yw od q..."

There are no english words with two y's, and end in w.

So we try "the" and get

"the, dev2et od q..."

We conclude that the is a likely answer.

Now we search our dictionary for words that start look like ?e??et.

rinse repeat.

That is, find some set of words which fit into the lengths available and do not break each others substitution rules.

Personally I just do the frequency analysis suggested above.

e5
A: 

Frequency analysis, as both other respondents said, is the way to go, and you can use digrams and trigrams to make it much stronger. Just grab tons of Italian text from the web and churn ahead! It's really pretty simple programming.

Alex Martelli