heya,
I have an Excel CSV file that I'm trying to read with csv.DictReader.
It mostly works, except that it seems to omit rows, specifically those with empty columns.
Our input looks like:
mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
[email protected],ian,bay,3424,8403,2535,+65(2)34523534545
[email protected],mike,gibson,3424,8403,2535,+65(2)34523534545
[email protected],ross,martin,,,,+65(2)34523534545
[email protected],david,connor,,,,+65(2)34523534545
[email protected],chris,call,3424,8403,2535,+65(2)34523534545
So some of the rows have empty lorem/ipsum/dolor values, which show up as a run of consecutive commas.
We're reading it in with:
import csv

def read_gd_dump(input_file="blah 20100423.csv"):
    with open(input_file) as f:
        gd_extract = csv.DictReader(f, restval='missing', dialect='excel')
        return dict((row['something'], row) for row in gd_extract)
I checked that "something" (the key for our dict) isn't one of the empty columns, which is what I had originally suspected. It's one of the columns after them.
However, DictReader seems to skip those rows completely. I tried setting restval, but it didn't make any difference. I can't find anything in Python's CSV docs (http://docs.python.org/library/csv.html) that would explain this behaviour, but I may have misread something.
Any ideas?
Thanks, Victor
EDIT:
It turns out I was pretty stupid: I was indexing the dict on a column ("something") that was empty for some rows in the input CSV file, a fact I didn't notice in the mass of data (basically there were two ID columns, and I was using the wrong one).
So Alex was right: there were duplicates in "something", and each subsequent row with an empty "something" was overwriting the previous one in the dict. DictReader itself wasn't skipping anything.
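For anyone hitting the same thing, here is a minimal sketch of how the collision could have been spotted. It assumes Python 3, and the sample data, the example.com addresses, and the helper names load_rows and duplicate_keys are made up for illustration; "something" stands in for the wrong ID column. The idea is simply to compare the number of rows DictReader yields with the number of entries left in the dict, and to list any key values that occur more than once.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: "something" is empty on two of the four rows.
SAMPLE = """mail,something,lorem
a@example.com,id1,3424
b@example.com,,
c@example.com,,
d@example.com,id2,2535
"""

def load_rows(text):
    """Return every row DictReader yields, as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), dialect="excel"))

def duplicate_keys(rows, key):
    """Return the values of `key` that occur on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

rows = load_rows(SAMPLE)
# Same keying approach as in the question: later rows replace earlier
# ones whenever they share the same "something" value.
by_key = {row["something"]: row for row in rows}

print(len(rows))                          # rows read by DictReader
print(len(by_key))                        # entries left after keying
print(duplicate_keys(rows, "something"))  # key values that collide
```

If the two counts differ, the rows were read fine and got lost when building the dict, which is what happened in my case.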
I've awarded the answer to Alex Martelli.