As an example, let's say I wanted to list the frequency of each letter of the alphabet in a string. What would be the easiest way to do it?

This is an example of what I'm thinking of. The question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz". In many other languages I could just do letter++ and increment my way through the alphabet, but thus far I haven't come across a way to do that in Python.

def alphCount(text):
  lowerText = text.lower()
  for letter in allTheLetters:  
    print letter + ":", lowerText.count(letter)
+1  A: 

Something like this?

for letter in range(ord('a'), ord('z') + 1):
  print chr(letter) + ":", lowerText.count(chr(letter))

(I don't speak Python; please forgive my syntax errors)
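
For reference, a minimal sketch of the same ord/chr idea, written so that it runs on Python 3 (an assumption; the snippets in this thread use Python 2 print statements). The name letters_between is made up for illustration.

```python
def letters_between(first="a", last="z"):
    """Return the characters from first to last inclusive, by code point."""
    # ord() maps a character to its integer code point; chr() maps it back.
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


alphabet = letters_between()
print("".join(alphabet))
```

The + 1 on the upper bound matters because range() excludes its end value.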

Jacob
I think your "letter" inside the count() should be "chr(letter)"
paxdiablo
Since you fixed it (and didn't have my off-by-one bug resulting in only checking up to 'y' :-), I've deleted my answer and upvoted yours.
paxdiablo
This looks fine to me. Why is it getting voted down?
Adam Pierce
@Adam: I temporarily voted it down to remove it from the top position and elevate Matthew's answer. It's also not very Pythonic code.
John Millikin
@John: oooh, market manipulation. Does the SEC monitor these forums? :-)
paxdiablo
+6  A: 

the question is how to make allTheLetters equal to said letters without something like allTheLetters = "abcdefg...xyz"

That's actually provided by the string module; you don't have to type it out yourself ;)

import string

allTheLetters = string.ascii_lowercase

def alphCount(text):
  lowerText = text.lower()
  for letter in allTheLetters:  
    print letter + ":", lowerText.count(letter)
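
A sketch of the same function for Python 3, returning the counts instead of printing them so the result is easy to check. alph_count is a hypothetical name, not part of the code above.

```python
import string


def alph_count(text):
    """Return (letter, count) pairs for every ASCII lowercase letter."""
    lower_text = text.lower()
    # str.count scans the whole string once per letter: 26 passes in total.
    return [(letter, lower_text.count(letter)) for letter in string.ascii_lowercase]


# Print only the letters that actually occur in the sample.
for letter, count in alph_count("Hello there"):
    if count:
        print(letter + ":", count)
```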
Matthew Trevor
This solution is slow, since it has nested iterations: each call to count() iterates over the whole string in order to find the count.
Ber
However, the specific question was answered. Other problems are the original poster's problem.
paxdiablo
+1  A: 

Do you mean using:

import string
string.ascii_lowercase

then,

counters = dict()
for letter in string.ascii_lowercase:
    counters[letter] = lowertext.count(letter)

All lowercase letters are accounted for; letters that do not occur in the text get a count of zero.

Using a generator expression:

counters = dict((letter, lowertext.count(letter)) for letter in string.ascii_lowercase)
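
On Python 2.7 and later, the same mapping can also be written as a dict comprehension. A small sketch, with lowertext given a sample value here purely as an assumption:

```python
import string

lowertext = "hello there"  # sample input, assumed for illustration

# One entry per letter; letters absent from the text map to 0.
counters = {letter: lowertext.count(letter) for letter in string.ascii_lowercase}

print(counters["e"], counters["z"])
```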
gimel
+4  A: 

If you just want to do a frequency count of a string, try this:

s = 'hi there'
f = {}

for c in s:
        f[c] = f.get(c, 0) + 1

print f
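
The same single-pass count can be sketched with collections.defaultdict, which supplies the zero for characters not seen yet. This version uses print() so that it runs on Python 3; count_chars is a hypothetical name.

```python
from collections import defaultdict


def count_chars(s):
    """Count every character of s in a single pass."""
    f = defaultdict(int)  # missing keys start at int() == 0
    for c in s:
        f[c] += 1
    return dict(f)


print(count_chars("hi there"))
```

Like the loop above, this counts spaces and punctuation too.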
Adam Pierce
This is a very good solution as it only iterates once over the given string, and thus is O(n), as opposed to using nested iterations. Even better if you use f = defaultdict(int) and then simply f[c] += 1.
Ber
Is the get member O(1)? If it's O(n), then the whole thing is O(n^2).
paxdiablo
@Pax Diablo: Mappings are hashed. Dictionary gets are O(1).
S.Lott
+1  A: 

The main question is how to "iterate through the alphabet":

import string
for c in string.lowercase:
    print c

How to get letter frequencies with some efficiency and without counting non-letter characters:

import string

sample = "Hello there, this is a test!"
letter_freq = dict((c,0) for c in string.lowercase)

for c in [c for c in sample.lower() if c.isalpha()]:
    letter_freq[c] += 1

print letter_freq
mhawke
+13  A: 

The question you've asked (how to iterate through the alphabet) is not the same question as the problem you're trying to solve (how to count the frequency of letters in a string).

You can use string.lowercase, as other posters have suggested:

import string
allTheLetters = string.lowercase

To do things the way you're "used to", treating letters as numbers, you can use the "ord" and "chr" functions. There's absolutely no reason to ever do exactly this, but maybe it comes closer to what you're actually trying to figure out:

def getAllTheLetters(begin='a', end='z'):
    beginNum = ord(begin)
    endNum = ord(end)
    for number in xrange(beginNum, endNum+1):
        yield chr(number)

You can tell it does the right thing because this code prints True:

import string
print ''.join(getAllTheLetters()) == string.lowercase

But, to solve the problem you're actually trying to solve, you want to use a dictionary and collect the letters as you go:

from collections import defaultdict    
def letterOccurrances(string):
    frequencies = defaultdict(lambda: 0)
    for character in string:
        frequencies[character.lower()] += 1
    return frequencies

Use like so:

occs = letterOccurrances("Hello, world!")
print occs['l']
print occs['h']

This will print '3' and '1' respectively.

Note that this works for unicode as well:

# -*- coding: utf-8 -*-
occs = letterOccurrances(u"héĺĺó, ẃóŕĺd!")
print occs[u'l']
print occs[u'ĺ']

If you were to try the other approach on unicode (incrementing through every character) you'd be waiting a long time; there are millions of unicode characters.
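
On Python 3, where str is already Unicode, a sketch of the same counting idea might look like the following. It also skips non-letters via str.isalpha, which is an addition of this example and not part of the function above; letter_occurrences is a hypothetical name.

```python
from collections import Counter


def letter_occurrences(text):
    """Count letters only, case-insensitively, in one pass."""
    # str.isalpha() is True for non-ASCII letters too.
    return Counter(ch for ch in text.lower() if ch.isalpha())


# "\u00e9" is e with acute accent; "\u00d3" is capital O with acute accent.
occs = letter_occurrences("H\u00e9llo, W\u00d3RLD!")
print(occs["l"], occs["\u00f3"], occs[","])
```

A Counter returns 0 for keys it has never seen, so looking up the comma does not raise an error.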

To implement your original function (print the counts of each letter in alphabetical order) in terms of this:

def alphCount(text):
    for character, count in sorted(letterOccurrances(text).iteritems()):
        print "%s: %s" % (character, count)

alphCount("hello, world!")
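
If letters that never occur should still be listed with a count of 0, as in the original function, one option is to count in a single pass and then walk the alphabet. A sketch for Python 3, with alph_count_lines as a hypothetical helper name:

```python
import string
from collections import Counter


def alph_count_lines(text):
    """Return one 'letter: count' line per ASCII lowercase letter."""
    counts = Counter(text.lower())  # single pass over the text
    # Counter returns 0 for keys it has never seen.
    return ["%s: %d" % (letter, counts[letter]) for letter in string.ascii_lowercase]


print("\n".join(alph_count_lines("hello, world!")))
```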
Glyph
Excellent tutorial!
Ber
you really should use string.ascii_lowercase instead of writing your own getAllTheLetters. also, that is a horribly unpythonic name for a function!
hop
Your letterOccurrances() function will also count whitespace and punctuation, perhaps not intentionally.
mhawke
Actually the number of Unicode characters is still under a million. Also a few of them are non-alphabetic, so you want to exclude those when printing out frequencies.
Windows programmer
"string.ascii_lowercase" -- I hope there's a unicode_lowercase to handle Cyrillic, Greek, etc. I hope it knows how to downcase Turkish I's correctly depending on the current locale.
Windows programmer
Rather than collections.defaultdict(lambda: 0), using collections.defaultdict(int) will do the same thing, and is clearer IMO.
Tony Meyer
Nice solution Glyph, but I don't think your final solution solves the problem exactly the same way. In the original solution it prints "a: 0" if there were no "a"s in the original string. Yours would skip over "a" in this case, correct?
technomalogical