I am using Python's csv module to extract data from a CSV file that is constantly being updated by an external tool. I have run into a problem where, when I reach the end of the file, I get a StopIteration exception; however, I would like the script to keep looping, waiting for more lines to be added by the external tool.

What I came up with so far to do this is:

import csv

f = open('file.csv')
csvReader = csv.reader(f, delimiter=',')
while 1:
    try:
        doStuff(csvReader.next())
    except StopIteration:
        # Remember the position, then reopen the file and seek back to it
        depth = f.tell()
        f.close()
        f = open('file.csv')
        f.seek(depth)
        csvReader = csv.reader(f, delimiter=',')

This has the intended functionality, but it also seems terrible. Looping after catching the StopIteration is not an option, since once StopIteration is raised, every subsequent call to next() raises StopIteration again. Does anyone have a suggestion for implementing this in a way that avoids the silly tell-and-seek dance? Or is there a different Python module that easily supports this functionality?

A: 

You rarely need to catch StopIteration explicitly. Do this:

for row in csvReader:
    doStuff(row)

As for detecting when new lines are written to the file, you can either popen a tail -f process or write out the Python code for what tail -f does. (It isn't complicated; it basically just stats the file every second to see if it's changed. Here's the C source code of tail.)
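Something like this untested sketch, for instance (doStuff stands in for the asker's handler, and it ignores the partial-line case a real tail -f buffers for):

import csv
import time

def follow(f, interval=1.0):
    # Yield lines as they are appended to f, polling like tail -f:
    # readline() returns '' at EOF, so wait a moment and try again.
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(interval)

# csv.reader accepts any iterable of lines, so the generator plugs
# straight in and StopIteration never surfaces.
for row in csv.reader(follow(open('file.csv'))):
    doStuff(row)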

EDIT: Disappointingly, popening tail -f doesn't work as I expected in Python 2.x. It seems iterating over the lines of a file is implemented using fread and a largeish buffer, even if the file is supposed to be unbuffered (like when subprocess.py creates the file, passing bufsize=0). But popening tail would be a mildly ugly hack anyway.
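If you do try the subprocess route anyway, a possible workaround (again untested) is to bypass the file iterator and call readline() directly, which doesn't go through the read-ahead buffer:

import csv
import subprocess

tail = subprocess.Popen(['tail', '-f', 'file.csv'],
                        stdout=subprocess.PIPE, bufsize=0)

# iter(callable, sentinel) keeps calling readline() until it returns ''
# (EOF, which tail -f never produces), avoiding the buffered iterator.
for row in csv.reader(iter(tail.stdout.readline, '')):
    doStuff(row)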

Jason Orendorff
+2  A: 

Producer-consumer stuff can get a bit tricky. How about using seek and reading bytes instead? What about using a named pipe?

Heck, why not communicate over a local socket?
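For instance, a rough sketch of the named-pipe version (Unix only, and assuming the external tool can be pointed at the FIFO instead of a regular file):

import csv
import os

path = 'data.fifo'  # hypothetical path the external tool writes to
if not os.path.exists(path):
    os.mkfifo(path)

# Opening the read end blocks until a writer connects.  Reading with
# readline() rather than the file iterator avoids read-ahead buffering,
# so rows come through as they are written; readline() returns '' only
# when the writer closes its end.
f = open(path)
for row in csv.reader(iter(f.readline, '')):
    doStuff(row)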

Hamish Grubijan
+1: named pipe. Give up on files. Use something simpler and built for this purpose.
S.Lott
+2  A: 

Your problem is not with the CSV reader, but with the file object itself. You may still have to do the crazy gyrations you're doing in your snippet above, but it would be better to create a file object wrapper or subclass that does it for you, and use that with your CSV reader. That keeps the complexity isolated from your csv processing code.

For instance (warning: untested code):

import time

class ReopeningFile(object):
    def __init__(self, filename):
        self.filename = filename
        self.f = open(self.filename)

    def next(self):
        while True:
            try:
                return self.f.next()
            except StopIteration:
                # Remember where we stopped, then reopen and seek back
                depth = self.f.tell()
                self.f.close()
                self.f = open(self.filename)
                self.f.seek(depth)
                # Sleep to allow more data to come in
                time.sleep(0.1)
                # May also need a way to signal a real StopIteration

    def __iter__(self):
        return self

Then your main code becomes simpler, since it is freed from having to manage the file reopening. (Note that you also don't have to restart your csv_reader whenever the file is reopened.)

import csv
csv_reader = csv.reader(ReopeningFile('data.csv'))
for each in csv_reader:
    process_csv_line(each)
jcdyer