Apparently some CSV output implementation strips trailing field separators from the last row of the file, and only the last row, when the trailing fields are null.

Example input CSV; fields 'c' and 'd' are nullable:

a|b|c|d
1|2||
1|2|3|4
3|4||
2|3

In something like the script below, how can I tell whether I am on the last line so I know how to handle it appropriately?

import csv

reader = csv.reader(open('somefile.csv'), delimiter='|', quotechar=None)

header = next(reader)  # built-in next() works on Python 2.6+ and Python 3

for line_num, row in enumerate(reader):
    assert len(row) == len(header)
    ....
+3  A: 

Basically you only know you've run out after you've run out. So you could wrap the reader iterator, e.g. as follows:

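def isLast(itr):
  # Yield (is_last, item) pairs by reading one item ahead of the caller.
  itr = iter(itr)
  try:
    old = next(itr)
  except StopIteration:
    return  # empty input: nothing to yield
  for new in itr:
    yield False, old
    old = new
  yield True, old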

and change your code to:

for line_num, (is_last, row) in enumerate(isLast(reader)):
    if not is_last: assert len(row) == len(header)

etc.
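A minimal, self-contained sketch of the same look-ahead idea applied to the sample data from the question. It assumes Python 3 and uses an in-memory `io.StringIO` in place of `somefile.csv`; the name `mark_last`, the use of `quoting=csv.QUOTE_NONE`, and the choice to pad the short final row are illustrative assumptions, not part of the answer above.

```python
import csv
import io

# Sample data from the question; the last row has lost its trailing separators.
SAMPLE = "a|b|c|d\n1|2||\n1|2|3|4\n3|4||\n2|3\n"

def mark_last(iterable):
    """Yield (is_last, item) pairs, reading one item ahead."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input yields nothing
    for cur in it:
        yield False, prev
        prev = cur
    yield True, prev

reader = csv.reader(io.StringIO(SAMPLE), delimiter="|", quoting=csv.QUOTE_NONE)
header = next(reader)

rows = []
for line_num, (is_last, row) in enumerate(mark_last(reader)):
    if is_last and len(row) < len(header):
        # Only the final row is allowed to be short; pad it with empty fields.
        row = row + [""] * (len(header) - len(row))
    assert len(row) == len(header), "row %d has %d fields" % (line_num, len(row))
    rows.append(row)
```

With this policy a short row anywhere other than the end of the file still fails the assertion, which matches the stated expectation that only the last row is affected.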

Alex Martelli
A: 

Just extend the row to the length of the header:

for line_num, row in enumerate(reader):
    while len(row) < len(header):
        row.append('')
    ...
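The padding loop can also be written as a small generator. This is a sketch assuming Python 3; `pad_rows` and its `fill` parameter are hypothetical names introduced here for illustration.

```python
import csv
import io

def pad_rows(reader, width, fill=""):
    """Yield each row, right-padded with `fill` to at least `width` fields."""
    for row in reader:
        # A non-positive multiplier gives an empty list, so rows that are
        # already long enough pass through unchanged.
        yield row + [fill] * (width - len(row))

data = io.StringIO("a|b|c|d\n1|2||\n2|3\n")
reader = csv.reader(data, delimiter="|", quoting=csv.QUOTE_NONE)
header = next(reader)
padded = list(pad_rows(reader, len(header)))
```

Note that this pads every short row, not just the last one, and that a blank line (which `csv.reader` returns as an empty list) would become a row of empty fields.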
Serbaut
A: 

Could you not just catch the exception raised when the csv reader runs past the last line, in a

try: ... do your stuff here ... except StopIteration: ...

block?

See the following Python code on Stack Overflow for an example of how to use try/except: http://stackoverflow.com/questions/1202855/python-csv-dictreader-writer-issues

Alex Boschmans
That won't tell you when you're on the last line; it will only tell you after you've passed the last line.
ʞɔıu
I reread your question, and you're right, that's not what you are asking: you want a way to deal with the last line. Why can't you use the solution John Machin supplied below?
Alex Boschmans
A: 

If you have an expectation of a fixed number of columns in each row, then you should be defensive against:

(1) ANY row being shorter -- e.g. a writer (SQL Server / Query Analyzer IIRC) may omit trailing NULLs at random; users may fiddle with the file using a text editor, including leaving blank lines.

(2) ANY row being longer -- e.g. commas not quoted properly.

You don't need any fancy tricks. Just an old-fashioned if-test in your row-reading loop:

for row in csv.reader(...):
    ncols = len(row)
    if ncols != expected_cols:
        appropriate_action()
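As one possible reading of `appropriate_action()`, the sketch below skips blank lines, pads short rows, and reports long rows instead of guessing how to repair them. It assumes Python 3; `read_checked` and the policy it applies are hypothetical choices for illustration, not something the answer prescribes.

```python
import csv
import io

def read_checked(lines, expected_cols, delimiter="|"):
    """Return (good_rows, problems); problems holds (line_number, row) pairs."""
    good, problems = [], []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        ncols = len(row)
        if ncols == 0:
            continue  # blank line: csv.reader returns an empty list
        if ncols < expected_cols:
            # Short row: treat the missing trailing fields as empty.
            row = row + [""] * (expected_cols - ncols)
        elif ncols > expected_cols:
            # Long row: cannot be repaired safely, so report it.
            problems.append((reader.line_num, row))
            continue
        good.append(row)
    return good, problems

sample = io.StringIO("1|2||\n\n3|4|5|6|7\n2|3\n")
good, problems = read_checked(sample, expected_cols=4)
```

Collecting problem rows with their line numbers, rather than asserting, lets the caller decide whether a malformed file should be rejected outright or only logged.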
John Machin
I agree, but the source of this data refuses (or is too incompetent) to send me correctly formatted data. I have no choice but to handle its quirks myself.
ʞɔıu
Yes, you have to handle its quirks yourself. I'm just pointing out that more quirks than "missing trailing null fields in last row" should be checked for in general, AND they can be checked simply, without fancy code. I don't understand your "but".
John Machin