I'm a bit rusty in Python and am just looking for help implementing an example function to count words (this is just a sample target for an SCons script that doesn't do anything "real"):

def countWords(target, source, env):
  if (len(target) == 1 and len(source) == 1):
    fin = open(str(source[0]), 'r')
    # do something with "fin.read()"
    fin.close()

    fout = open(str(target[0]), 'w')
    # fout.write(something)
    fout.close()
  return None

Could you help me fill in the details? The usual way to count words is to read each line, break it up into words, and for each word in the line increment a counter in a dictionary; then, for the output, sort the words by decreasing count.

edit: I'm using Python 2.6 (Python 2.6.5 to be exact)
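A minimal sketch of the approach described in the question (a dictionary of counts, then a sort by decreasing count), with the counting and sorting pulled out of the SCons action so they can be checked on their own. The helper names `count_words` and `by_decreasing_count` are hypothetical, and breaking ties alphabetically is an added assumption. It only uses features available in Python 2.6.

```python
def count_words(lines):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def by_decreasing_count(counts):
    """Return (word, count) pairs, highest count first; ties sorted alphabetically (assumption)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def countWords(target, source, env):
    # target and source are lists of nodes; str() gives the path, as in the question's code
    if len(target) == 1 and len(source) == 1:
        with open(str(source[0]), 'r') as fin:
            counts = count_words(fin)  # iterating a file yields its lines
        with open(str(target[0]), 'w') as fout:
            for word, n in by_decreasing_count(counts):
                fout.write('%d %s\n' % (n, word))
    return None
```

For example, `by_decreasing_count(count_words(["the cat", "the dog the"]))` returns `[('the', 3), ('cat', 1), ('dog', 1)]`.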

A:
from collections import defaultdict

def countWords(target, source, env):
    words = defaultdict(int)
    if (len(target) == 1 and len(source) == 1):
        with open(str(source[0]), 'r') as fin:
            for line in fin:
                for word in line.split():
                    words[word] += 1

        with open(str(target[0]), 'w') as fout:
            for word in sorted(words, key=words.__getitem__, reverse=True):
                fout.write('%s\n' % word)
    return None
eumiro
that seems to work except for the reversed argument
Jason S
btw, what's the `with` being used for here? why not just `fin = open(str(source[0]), 'r')` instead?
Jason S
(thanks!!!!!!!!)
Jason S
@Jason - fixed `reverse` (was a typo, thanks!). `with` is a nice way to open/close a file within one block.
eumiro
oh, it's supposed to be an auto-close block? (like the RAII pattern in C++?)
Jason S
@Jason: yeah that's the goal i think. check [this out](http://effbot.org/zone/python-with-statement.htm)
Claudiu
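On the `with` question: for a file object, a `with` block behaves roughly like the `try`/`finally` form below, so the file is closed when the block exits, even if an exception is raised inside it (the statement itself works through the object's `__enter__`/`__exit__` methods). In Python 2.6 `with` is always enabled; Python 2.5 needed `from __future__ import with_statement`. The function names are hypothetical and exist only to compare the two forms.

```python
import os
import tempfile

def read_with_statement(path):
    # the file is closed automatically when the with block exits
    with open(path) as fin:
        data = fin.read()
    return data, fin.closed

def read_with_try_finally(path):
    # roughly what the with statement does for a file object
    fin = open(path)
    try:
        data = fin.read()
    finally:
        fin.close()  # runs whether or not read() raised
    return data, fin.closed

if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, 'w') as f:
        f.write('hello world\n')
    print(read_with_statement(path))    # ('hello world\n', True)
    print(read_with_try_finally(path))  # ('hello world\n', True)
    os.remove(path)
```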
A: 

There is a helpful example here. It works roughly as you describe and also counts sentences.

littlegreen
thanks, but I already found that one, and it doesn't do what I'm looking for (I'd like to keep a count of each word)
Jason S
A:

Without knowing what `env` is for, I can only do the following:

def countWords(target, source, env):
    wordCount = {}
    if len(target) == 1 and len(source) == 1:
        with open(str(source[0]), 'r') as fin:
            for line in fin:
                for word in line.split():
                    if word in wordCount:
                        wordCount[word] += 1
                    else:
                        wordCount[word] = 1  # first occurrence counts as 1

        # invert the mapping: count -> list of words with that count
        rev = {}
        for v in wordCount.values():
            rev[v] = []
        for w in wordCount:
            rev[wordCount[w]].append(w)
        with open(str(target[0]), 'w') as f:
            # write the groups from the highest count to the lowest
            for v in sorted(rev, reverse=True):
                f.write("%d: %s\n" % (v, " ".join(rev[v])))
inspectorG4dget
A: 

Not too efficient, but it is concise!

with open(fname) as f:
    res = {}
    for word in f.read().split():
        res[word] = res.get(word, 0) + 1
with open(dest, 'w') as f:
    f.write("\n".join(sorted(res, key=lambda w: -res[w])))
Claudiu
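If upgrading is an option, Python 2.7 and later add `collections.Counter`, which does the counting and the sort by decreasing count in one step. It is not available in the asker's Python 2.6, so this is a swapped-in alternative rather than a fix to the answer above; `count_words_counter` is a hypothetical name.

```python
from collections import Counter  # Python 2.7+ / 3.x only; not in 2.6

def count_words_counter(text):
    """Return (word, count) pairs, most common first."""
    return Counter(text.split()).most_common()

print(count_words_counter("b a b c b a"))  # [('b', 3), ('a', 2), ('c', 1)]
```

In Python 3.7 and later, words with equal counts come out in first-seen order; older versions do not guarantee an order for ties.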
A: 

Here is my version:

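import string
import itertools as it

# characters stripped from both ends of every word
drop = string.punctuation + string.digits

def countWords(target, source, env=''):
    # target and source are plain filenames here, not SCons node lists
    with open(source) as fin:
        inputstring = fin.read()
    # lowercase, treat '--' as a separator, strip punctuation/digits from
    # word edges, and skip tokens that become empty (e.g. "1999" or "...")
    stripped = (word.strip(drop)
                for word in inputstring.lower().replace('--', ' ').split())
    words = sorted(word for word in stripped if word)
    # groupby merges runs of equal items, so the input must be sorted
    wordlist = sorted([(word, len(list(occurrences)))
                       for word, occurrences in it.groupby(words)],
                      key=lambda x: x[1],
                      reverse=True)
    with open(target, 'w') as results:
        results.write('\n'.join('%16s : %s' % pair for pair in wordlist))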
Tony Veijalainen
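The `sorted` call before `itertools.groupby` in the answer above matters: `groupby` only merges consecutive equal items, so unsorted input splits one word into several groups. A small illustration, with `group_counts` as a hypothetical helper:

```python
import itertools

def group_counts(words):
    # groupby only merges *consecutive* equal items
    return [(word, len(list(group))) for word, group in itertools.groupby(words)]

words = ['b', 'a', 'b']
print(group_counts(words))          # [('b', 1), ('a', 1), ('b', 1)]
print(group_counts(sorted(words)))  # [('a', 1), ('b', 2)]
```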