I am trying to count the occurrences of values in a list of, say, integers. I have a list of numbers in a csv file that I am able to read in, which looks something like 4,245,34,99,340,... What I am trying to return is a dictionary of key:value pairs where the key is an integer value from the csv file and the value is the number of times it appears in the list. I'm not sure what I am doing wrong here; any help would be appreciated.

allCounts = dict()

rows = csv.reader(open('...csv'), delimiter=',')

    for intValue in rows:
        intVal = intValue[0]

        for intVal, numAppearances in allCounts:
             if intVal in allCounts:
                allCounts[numAppearances] = allCounts[numAppearances]+1
             else:
                allCounts[numAppearances] = 1
+7  A: 

Sounds like what you want is a Counter object:
http://docs.python.org/library/collections.html#counter-objects

Also I think you may want to use the CSV module:
http://docs.python.org/library/csv.html

Using the built-in modules should make it a lot easier :)

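To get the rows, something like this should work:

import csv

csvfile = open("example.csv")
# Guess the delimiter and quoting style from the first 1024 characters
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)  # rewind so the reader starts from the first row
reader = csv.reader(csvfile, dialect)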

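Then you should be able to count the fields. Each row from the reader is a list, and lists cannot be used as dictionary keys, so flatten the rows first:

from collections import Counter

c = Counter(field for row in reader for field in row)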
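
For a self-contained check, here is a minimal sketch of the same idea. The sample data and the io.StringIO stand-in for the file are assumptions for illustration, not the asker's actual file:

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for the real csv file
sample = io.StringIO("4,245,34\n99,340,4\n")

reader = csv.reader(sample)
# Flatten the rows so that each field (a string) is counted individually
counts = Counter(field for row in reader for field in row)

print(counts["4"])    # 2
print(counts["999"])  # 0, a Counter returns 0 for keys it has never seen
```

Note that the keys here are strings; wrapping each field in int() would give integer keys instead.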
shookster
I wish I could +1 you twice.
xtofl
+5  A: 

What you're doing is iterating through the entire dict for every row, which is kind of weird and probably not what you want to do. What you really want to do is just look up the key in question in the dict and increment its count. So:

# first part stays mostly the same
import csv

rows = csv.reader(open("...csv"))

allCounts = {}

for row in rows:
    for field in row:
        # get() returns 0 when the field has not been seen yet
        allCounts[field] = allCounts.get(field, 0) + 1

That last line uses a nice little feature of dict: the get method returns a default value (here 0) if the key isn't found.
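
As a small illustration of that get behaviour on its own, here is a sketch with a made-up key (the value "4" is just an example field):

```python
counts = {}

# The key is missing, so get() hands back the supplied default
first_lookup = counts.get("4", 0)
print(first_lookup)  # 0

# Counting the same field twice
counts["4"] = counts.get("4", 0) + 1
counts["4"] = counts.get("4", 0) + 1
print(counts)  # {'4': 2}
```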

In your own code, there are some noteworthy defects. The most significant one is in the fourth and fifth lines. You extract the first field from the selected row and assign it to intVal, but then you completely mask intVal by reusing it as a loop variable when iterating over your dict. That means the assignment did no work at all.

The if clause is doomed. You are checking to see if a key is in a dict, but you came up with that key by iterating over the keys from the same dict. Of course that key is in the dict.

The next issue is that your else clause is modifying a collection over which you are iterating. Python makes no guarantees about how this will work for dicts (adding or removing entries mid-iteration may raise a RuntimeError or skip entries), so don't do it.
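
To see what that can look like in practice, here is a minimal sketch with made-up keys. The RuntimeError shown is what CPython does when a dict changes size during iteration; treat it as an observation about that implementation, not a guarantee:

```python
counts = {"4": 1, "245": 1}
error = None

try:
    for key in counts:
        # Adds a brand-new key while the loop is still running
        counts[key + "x"] = 1
except RuntimeError as exc:
    error = str(exc)

print(error)
```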

For that matter, there's no reason at all to be iterating over the dict. You can just grab whichever key-value pair you are interested in directly. What you should be iterating over is the list of integers from the file.

A CSV file is structured as rows of values (normally separated by commas), with the rows separated by newlines. The csv module preserves this view by yielding each row as a list of strings. To drill down to the actual values, you need to iterate over each row, and then over each field in that row. Your code iterates over each row, and then over each key in the dict for each row, ignoring the fields.
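
The row-then-field structure can be seen directly in a short sketch. The two-line sample and the io.StringIO stand-in for the file are assumptions for illustration:

```python
import csv
import io

# Hypothetical two-row sample standing in for the real file
sample = io.StringIO("4,245,34\n99,340,4\n")

rows = list(csv.reader(sample))
print(rows)  # [['4', '245', '34'], ['99', '340', '4']]

# Iterate over each row, then over each field in that row
fields = [field for row in rows for field in row]
print(fields)  # ['4', '245', '34', '99', '340', '4']
```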

TokenMacGuy
You might want to use a `defaultdict` for this, it's slightly simpler looking.
S.Lott
@S.Lott: true, but then i'd have to introduce a new collection, and i'd rather keep the number of modules being used small as long as that's convenient.
TokenMacGuy
@TokenMacGuy: Can't buy that as terribly sensible. This is Python. Almost everything involves extensive use of the libraries. I think it's misleading to provide a `dict.get(this,0)` when a slightly simpler thing is available. Especially since the additional information might help this particular person understand even more about how Python can solve problems like this.
S.Lott
For that matter, collections.Counter is perhaps a tiny bit simpler than collections.defaultdict for this particular use case.
Marius Gedminas
A: 

Replace intVal = intValue[0] with a conversion.

csv.reader yields each row as a list of strings, so intValue[0] is the first field of the row, still in its string representation. What you really want is intValue = int(intValue[0]).

Then your logic is off: allCounts starts out as an empty dictionary, so the body of a loop over it never runs. What you want to do is iterate over the rows returned by the csv.reader, which you are already doing. From there your logic is close, but unfortunately this is neither horseshoes nor hand grenades. What you want is this:

# Checks to see if intValue is a key in the dictionary
if intValue in allCounts:
    # If it is then we want to increment the current value
    # += 1 is the idiomatic way to do this
    allCounts[intValue] += 1
else:
    # If it is not a key, then make it a key with a value of 1
    allCounts[intValue] = 1
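
Put together, that check might sit inside the loop roughly like this. This is a sketch that assumes one number per row; the sample data and the io.StringIO stand-in for the file are made up for illustration:

```python
import csv
import io

# Hypothetical sample with one number per row
sample = io.StringIO("4\n245\n4\n99\n4\n")

allCounts = {}
for row in csv.reader(sample):
    # Each row is a list of strings; convert the first field to an int
    intValue = int(row[0])
    if intValue in allCounts:
        allCounts[intValue] += 1
    else:
        allCounts[intValue] = 1

print(allCounts)  # {4: 3, 245: 1, 99: 1}
```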
Wayne Werner
In his code, intValue is not an integer. It is actually a list of strings, as returned by `csv.reader.next`
TokenMacGuy
ah.. fixed. That's what happens when you spend too much time looking at an IPython prompt, those single quotes just disappear!
Wayne Werner