10
5
-1
-1
-1
1
1
0
2
...

If I want to count the number of occurrences of each number in a file, how do I do it in Python?

+2  A: 
  1. Use collections.defaultdict so that the default count for any value is zero.
  2. Loop through the lines of the file using file.readline and convert each line to an int.
  3. Increment the counter for each value in your countDict.
  4. Finally, go through the dict using `for intV, count in countDict.iteritems()` and print the values.
Anurag Uniyal
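The four steps above might look like the following minimal sketch. It is written for Python 3, so `items()` stands in for the `iteritems()` named in step 4, and it iterates over the lines directly rather than calling `readline`. The function name `count_numbers` and the in-memory sample are illustrative assumptions, not part of the answer.

```python
from collections import defaultdict
import io

def count_numbers(lines):
    """Count how often each integer occurs in an iterable of text lines."""
    counts = defaultdict(int)          # step 1: missing keys start at 0
    for line in lines:                 # step 2: walk through the lines
        line = line.strip()
        if line:                       # skip blank lines
            counts[int(line)] += 1     # step 3: increment the counter
    return counts

# Sample data from the question; a file opened with open() works the same way.
sample = io.StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
count_dict = count_numbers(sample)
for value, count in count_dict.items():   # step 4: print the results
    print(value, count)
```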
-1 Don't use readlines() when you don't need to; it reads the whole file into memory. If you have a need for SOME history, collections.deque may be what you want. Otherwise just do `for line in f:`
John Machin
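If some history is needed while streaming through a file, a bounded `collections.deque` is one option. A small sketch, assuming a hypothetical helper `last_n_lines` and in-memory sample data (the `maxlen` argument needs Python 2.6 or later):

```python
from collections import deque
import io

def last_n_lines(lines, n):
    """Return the last n lines of an iterable without storing the rest."""
    history = deque(maxlen=n)   # older entries are dropped automatically
    for line in lines:
        history.append(line.rstrip("\n"))
    return list(history)

sample = io.StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
tail = last_n_lines(sample, 3)
print(tail)
```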
Actually, I intended file.readline, but yes, `for line in f:` is better.
Anurag Uniyal
+1  A: 

Use a dictionary where every line is a key and its count is the value. Increment the count for every line; if there is no dictionary entry for the line yet, initialize it to 1 in the except clause. This should work with older versions of Python.

def count_same_lines(fname):
    line_counts = {}
    for l in open(fname):  # open() instead of file(), which Python 3 removed
        l = l.rstrip()
        if l:  # skip blank lines
            try:
                line_counts[l] += 1
            except KeyError:
                line_counts[l] = 1  # first occurrence of this line
    print('cnt\ttxt')
    for k in line_counts.keys():
        print('%d\t%s' % (line_counts[k], k))
Michał Niklas
+2  A: 

Read the lines of the file into a list l, e.g.:

l = [int(line) for line in open('filename','r')]

Starting with a list of values l, you can create a dictionary d that gives you for each value in the list the number of occurrences like this:

>>> l = [10,5,-1,-1,-1,1,1,0,2]
>>> d = dict((x,l.count(x)) for x in l)
>>> d[1]
2

EDIT: as Matthew rightly points out, this is hardly optimal. Here is a version using defaultdict:

from collections import defaultdict
d = defaultdict(int)
for line in open('filename','r'):
    d[int(line)] += 1
stephan
This will execute a linear search of the array /every time/ you call count, which means O(n^2) behavior.
Matthew Flaschen
That's going to be very slow for a large list. count scans the list, so this is an O(n**2) algorithm, and also not usable with iterators, so you'd need to read the file into memory first, or re-read it each pass.
Brian
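To illustrate the point made in the two comments above, here is a sketch comparing the two approaches. The names `count_quadratic` and `count_single_pass` are hypothetical. Both give the same result, but the second makes a single pass and accepts any iterator.

```python
def count_quadratic(values):
    """list.count rescans the whole list for every element: O(n**2)."""
    return dict((x, values.count(x)) for x in values)

def count_single_pass(iterable):
    """One dictionary update per element: O(n), works on any iterator."""
    counts = {}
    for x in iterable:
        counts[x] = counts.get(x, 0) + 1
    return counts

values = [10, 5, -1, -1, -1, 1, 1, 0, 2]
slow = count_quadratic(values)
fast = count_single_pass(iter(values))   # an iterator is enough here
print(slow == fast)
```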
+2  A: 

I think what you call a map is a dictionary in Python.
Here is a useful link on how to use one: http://docs.python.org/tutorial/datastructures.html#dictionaries

For a good solution, see the answer from Stephan or Matthew, but also take some time to understand what that code does :-)

Roberto Liffredo
No, he's referring to the built-in function 'map': http://docs.python.org/library/functions.html#map
ozan
@ozan: doubtful, based on the question, and also because jfq is from a C++ background. (+1 for tutorial link)
Miles
A: 
l = [10,5,-1,-1,-1,1,1,0,2]
d = {}
for x in l:
  d[x] = (d[x] + 1) if (x in d) else 1

There will be a key in d for every distinct value in the original list, and the values of d will be the number of occurrences.

Matthew Flaschen
Except his numbers are in a file... You forgot that bit. ;)
musicfreak
Try using `collections.defaultdict(int)` for this, it gets even simpler.
S.Lott
True, musicfreak. Others have spelled that (and defaultdict) out already, though.
Matthew Flaschen
+7  A: 

This is almost exactly the algorithm described in Anurag Uniyal's answer, except that it uses the file as an iterator instead of readline():

from collections import defaultdict
try:
  from io import StringIO # 2.6+, 3.x
except ImportError:
  from StringIO import StringIO # 2.5

data = defaultdict(int)

#with open("filename", "r") as f: # if a real file
with StringIO(u"10\n5\n-1\n-1\n-1\n1\n1\n0\n2") as f: # io.StringIO needs unicode text in 2.x
  for line in f:
    data[int(line)] += 1

for number, count in data.iteritems():
  print number, "was found", count, "times"
Roger Pate
Very clean and bullet proof.
e-satis
print(number, "was found", count, "times")
Selinap
@Selinap, that means something different in 2.x. This is 2.x code, the 3.x mentioned was just to point out that the io module is forward-compatible.
Roger Pate
+3  A: 

Counter is your best friend:)
http://docs.python.org/dev/library/collections.html#counter-objects

For Python 2.5 and 2.6: http://code.activestate.com/recipes/576611/

>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

For this case:

print Counter(int(line.strip()) for line in open("foo.txt", "rb"))
##output
Counter({-1: 3, 1: 2, 0: 1, 2: 1, 5: 1, 10: 1})
sunqiang
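As a small illustration of what `Counter` adds over a plain dictionary, the sketch below uses `most_common` and relies on missing keys counting as zero. It is written for Python 3, and the in-memory sample stands in for the file as an assumption.

```python
from collections import Counter
import io

# In-memory stand-in for the file; open("foo.txt") could be used the same way.
sample = io.StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
cnt = Counter(int(line) for line in sample)

# most_common(n) returns the n most frequent (value, count) pairs.
top_two = cnt.most_common(2)
print(top_two)

# A value that never occurred simply has a count of zero.
print(cnt[42])
```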
You made me compile Python 2.7a0 to try this out. :)
Matthew Flaschen
It's in Python 3.1 (from collections import Counter).
Tim Pietzcker
A: 

counter.py

#!/usr/bin/env python
import fileinput
from collections import defaultdict

frequencies = defaultdict(int)
for line in fileinput.input():
    frequencies[line.strip()] += 1

print frequencies

Example:

$ perl -E'say 1*(rand() < 0.5) for (1..100)' | python counter.py
defaultdict(<type 'int'>, {'1': 52, '0': 48})
J.F. Sebastian
+1  A: 

New in Python 3.1:

from collections import Counter
with open("filename","r") as lines:
    print(Counter(lines))
Tim Pietzcker
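Note that passing the file object straight to `Counter` counts the raw lines, so the keys are strings that still carry their trailing newline. If integer keys are wanted, one possible variant (a sketch with in-memory sample data as an assumption) converts each line first:

```python
from collections import Counter
import io

sample = io.StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
raw = Counter(sample)            # keys are strings such as '-1\n'

sample.seek(0)                   # rewind to read the data again
numbers = Counter(int(line) for line in sample if line.strip())
print(numbers)
```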