views:

240

answers:

4

I often need to process large text files containing headers in the first line. The headers are often treated differently to the body of the file, or my processing of the body is dependent on the headers. Either way I need to treat the first line as a special case. I could use simple line iteration and set a flag:

headerProcessed = false
for line in f:
    if headerProcessed:
        processBody(line)
    else:
        processHeader(line)
        headerProcessed = true

but I dislike a test in the loop that is redundant for all but one of the millions of times it executes. Is there a better way? Could I treat the first line differently then get the iteration to start on the second line? Should I be bothered?

+9  A: 

You could:

processHeader(f.readline())
for line in f:
    processBody(line)
Greg Hewgill
Thanks Greg. This is exactly what I'd expect to be able to do but had somehow assumed that the file position would be reset at the start of the for loop.Python lesson learnt: Expect the simple method to work!
shaftoe
Another lesson to learn: test your assumptions. You could have just tried it and found out, but then we all wouldn't have benefited from this interesting exchange!
Todd
The above works but in general you might get: `ValueError: Mixing iteration and read methods would lose data`.
J.F. Sebastian
+3  A: 

Use iter()

it_f = iter(f)
header = it_f.next()
processHeader(header)

for line in it_f:
    processBody(line)

It works with any iterable object.

Andrea Ambu
I think "iteratorable" is just adoratable.
Paul McGuire
Thanks for the correction :D
Andrea Ambu
+8  A: 
f = file("test")
processHeader(f.next()) #or next(f) for py3
for line in f:
    processBody(line)

This works.

Edit:

Changed .__next__ to next (they are equivalent, but I suppose next is more concise).

Regaring file vs open, file just seems more clear to me, therefore I will continue to prefer it over open.

David X
For Python 3, you say `next(f)`
kaizer.se
Note that `open()` is preferred to `file()`, here. `file` is best used as a type (`isinstance(…)`).
EOL
Py3 version should not be `f.__next__()` but `next(f)` - double-underscore methods should not be called directly.
Paul McGuire
`next(f)` works in Python 2.6+
J.F. Sebastian
I have python 2.5
David X
**open vs file**: see what Guido says on the subject in http://mail.python.org/pipermail/python-dev/2004-July/045921.html and in http://mail.python.org/pipermail/python-dev/2004-July/045967.html. It's probable that in the future you might have to change your code from `file` to `open`.
ΤΖΩΤΖΙΟΥ
Then i will change it then, and it will be less readable.
David X
+2  A: 

Large text files with headers in the first line? So it's tabular data.

Just to make sure: Have you looked at the csv module? It should handle all tabular data except such where the fields are not delimited but defined by position. And it does the header stuff too.

Lennart Regebro
Thanks Lennart, yes I frequently use the excellent csv module.
shaftoe