I have a file in CSV format where the delimiter is the ASCII unit separator ^_ (0x1F) and the line terminator is the ASCII record separator ^^ (0x1E). Since these are nonprinting characters, I've written them here in caret notation. I've written plenty of code that reads and writes CSV files, so my issue isn't with Python's csv module per se. The problem is that the csv module supports writing, but not reading, line terminators other than a carriage return or line feed, at least as of Python 2.6, where I just tested it. The documentation says the reader's line terminators are hard-coded, which I take to mean it's done in the C code that underlies the module, since I didn't see anything in the csv.py file that I could change.

Does anyone know a way around this limitation (patch, another CSV module, etc.)? I really need to read in a file where I can't use carriage returns or new lines as the line terminator because those characters will appear in some of the fields, and I'd like to avoid writing my own custom reader code if possible, even though that would be rather simple to meet my needs.

+2  A: 

Why not supply a custom iterable to the csv.reader function? Here is a naive implementation which reads the entire contents of the CSV file into memory at once (which may or may not be desirable, depending on the size of the file):

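import csv

def records(path):
    # Read the whole file and split it on the ASCII record separator (0x1E),
    # written as '\x1e' rather than the literal two characters '^^'.
    with open(path) as f:
        contents = f.read()
    return contents.split('\x1e')

# Fields are delimited by the ASCII unit separator (0x1F).
reader = csv.reader(records('input.csv'), delimiter='\x1f')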

I think that should work.
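If the file is too large to read at once, or if fields contain bare carriage returns or line feeds (which csv.reader still treats as line breaks inside each string it is given), one alternative is to skip the csv module and split on the two separators directly. The following is only a minimal sketch under the assumption that the file uses no quoting or escaping; `iter_records` and `chunk_size` are hypothetical names chosen for this example, not part of any library.

```python
import io

def iter_records(stream, record_sep='\x1e', field_sep='\x1f', chunk_size=4096):
    """Yield each record as a list of fields, reading the stream in chunks."""
    buffer = ''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(record_sep)
        # The last part may be an incomplete record; keep it for the next chunk.
        buffer = parts.pop()
        for part in parts:
            yield part.split(field_sep)
    if buffer:
        # Final record when the data does not end with a record separator.
        yield buffer.split(field_sep)

# Usage with in-memory data; the second field contains a line feed.
sample = io.StringIO('a\x1fline1\nline2\x1eb\x1fc\x1e')
rows = list(iter_records(sample, chunk_size=3))
```

With a real file on Python 3, opening it as `open(path, newline='')` keeps embedded carriage returns and line feeds from being translated before they reach the splitter.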

Will McCutchen
Yes, although that's the sort of thing I'm trying to avoid doing.
Gordon Worley
+3  A: 

This PEP has an example of how to define a custom CSV dialect, and the dialect has a line terminator property:

http://www.python.org/dev/peps/pep-0305/

So what you probably want to do is:

import csv

class MyDialect(csv.excel):
    lineterminator = "whatever"
csv.register_dialect("myDialect", MyDialect)

# now just use the reader with this dialect
Prody
This doesn't actually work, though. I tested it just to be sure, and the reader ignores whatever you put as the lineterminator and just uses \r or \n.
Gordon Worley
I just tried it myself as well, and it worked. Here's what I did: http://pastie.org/707761
Prody
Right, but if you try to set the `lineterminator` to anything else, you'll discover that it fails. For example, if you change your linked example to have the lineterminator be "a", it won't do what you'd expect.
Gordon Worley
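The disagreement in these comments can be checked with a short script. This is a rough sketch, assuming Python 3 and in-memory data rather than the asker's actual file: the writer emits the custom line terminator, while the reader accepts the `lineterminator` parameter but does not split records on it.

```python
import csv
import io

US, RS = '\x1f', '\x1e'  # ASCII unit separator and record separator

# The writer honors the custom line terminator.
out = io.StringIO()
writer = csv.writer(out, delimiter=US, lineterminator=RS)
writer.writerow(['a', 'b'])
writer.writerow(['c', 'd'])
written = out.getvalue()

# The reader accepts the parameter but ignores it when splitting records,
# so the record separator ends up inside the field values.
rows = list(csv.reader([written], delimiter=US, lineterminator=RS))
```

Here `rows` comes back as a single record whose fields still contain the record separator, which matches the behaviour described in the question.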