I am trying to make a tool that finds the frequencies of letters in some type of cipher text. Let's suppose it is all lowercase a-z, with no numbers. The encoded message is in a txt file.

I am trying to build a script to help crack substitution or possibly transposition ciphers.

Code so far:

cipher = open('cipher.txt','U').read()
cipherfilter = cipher.lower()
cipherletters = list(cipherfilter)

alpha = list('abcdefghijklmnopqrstuvwxyz')
occurrences = {} 
for letter in alpha:
    occurrences[letter] = cipherfilter.count(letter)
for letter in occurrences:
    print letter, occurrences[letter]

All it does so far is show how many times each letter appears. How would I print the frequency of all letters found in this file? Thanks.

+9  A: 
>>> import collections
>>> d = collections.defaultdict(int)
>>> for c in 'test':
...   d[c] += 1
...
>>> d
defaultdict(<type 'int'>, {'s': 1, 'e': 1, 't': 2})

From a file:

>>> myfile = open('test.txt','r')
>>> for line in myfile:
...   line = line.rstrip('\n')
...   for c in line:
...     d[c] += 1

For the genius that is the defaultdict container, we must give thanks and praise.
Otherwise we'd all be doing something silly like this:

s = "andnowforsomethingcompletelydifferent"
dict = {}
for letter in s:
    if letter not in dict:
        dict[letter] = 1
    else:
        dict[letter] += 1
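
For what it's worth, the standard library also offers collections.Counter (Python 2.7 and later), which does the same tally in one call. A minimal sketch reusing the string above; the names `counts` and `top_three` are only for illustration:

```python
from collections import Counter

s = "andnowforsomethingcompletelydifferent"

# Counter tallies each character of the iterable it is given.
counts = Counter(s)

# most_common(3) returns the three highest counts as (letter, count) pairs.
top_three = counts.most_common(3)
print(top_three)

# Letters that never occur read as 0 instead of raising KeyError.
print(counts['z'])
```

Like defaultdict(int), a Counter returns 0 for missing keys, so looking up a letter that never appears is safe.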
Adam Bernier
defaultdict... very nice!
Cipher
+2  A: 

If you want to know the relative frequency of a letter c, you have to divide the number of occurrences of c by the length of the input.

For instance, taking Adam's example:

s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

and storing the absolute frequency of each letter in

dict[letter]

we obtain the relative frequencies by:

from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    # .get(c, 0) avoids a KeyError for letters that never occur if dict is a plain dict
    print c, dict.get(c, 0)/float(n)

Putting it all together, we get something like this:

# get input
s = "andnowforsomethingcompletelydifferent"
n = len(s) # n = 37

# get absolute frequencies of letters
import collections
dict = collections.defaultdict(int)
for c in s:
    dict[c] += 1

# print relative frequencies
from string import ascii_lowercase # this is "a...z"
for c in ascii_lowercase:
    print c, dict[c]/float(n)
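
If the cipher text contains spaces, newlines, or punctuation, dividing by len(s) counts those characters too. One possible variant, assuming only a-z should count towards the total: the function name `letter_frequencies` is hypothetical, and the sketch uses print as a function so it runs on Python 3.

```python
from collections import Counter
from string import ascii_lowercase


def letter_frequencies(text):
    """Return {letter: relative frequency} for a-z, ignoring other characters."""
    # Fold to lowercase and keep only the letters a-z.
    letters = [c for c in text.lower() if c in ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    if total == 0:
        # No letters at all: report zero everywhere rather than divide by zero.
        return {c: 0.0 for c in ascii_lowercase}
    return {c: counts[c] / float(total) for c in ascii_lowercase}


freqs = letter_frequencies("and now for something completely different")

# Print letters from most to least frequent.
for letter, freq in sorted(freqs.items(), key=lambda item: item[1], reverse=True):
    print("%s %.3f" % (letter, freq))
```

To run it on the file from the question, something like letter_frequencies(open('cipher.txt').read()) should work.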
jacob