Sometimes I need to parse a string that is CSV, but I am having trouble with quoted commas, as this code demonstrates. I am using Python 2.4.

import csv
for row in csv.reader(['one",f",two,three']):
    print row

I get four elements, ['one"', 'f"', 'two', 'three'], but I would like to get three: ['one", f"', 'two', 'three']. Even if I use the quotechar='"' option (which, according to the documentation, is the default), the result is the same. How can I ignore commas inside quotes?

Edit: Thank you all for the answers. I obviously mistook my input for CSV; in the end I parsed the string for key-value pairs (NAME, DESCR...).

This is the input:

NAME: "2801 chassis", DESCR: "2801 chassis, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx

+1  A: 

You can get the csv module to tell you what the input should look like: just feed your desired output into the writer, then read the result back.

In [1]: import sys,csv

In [2]: csv.writer(sys.stdout).writerow(['one", f"', 'two', 'three'])  
"one"", f""",two,three

In [3]: csv.reader(['"one"", f""",two,three']).next()  
Out[3]: ['one", f"', 'two', 'three']
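The same round trip on Python 3 might look like the sketch below, where `reader.next()` became the built-in `next()` and an `io.StringIO` buffer stands in for `sys.stdout`. The helper names `to_csv_line` and `from_csv_line` are illustrative only:

```python
import csv
import io

def to_csv_line(row):
    """Return the single CSV line the csv module writes for `row`."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    # The default 'excel' dialect ends each row with '\r\n'; drop it.
    return buf.getvalue().rstrip('\r\n')

def from_csv_line(line):
    """Parse one CSV line back into a list of fields."""
    return next(csv.reader([line]))

# The whole first field gets quoted and its embedded quotes are doubled.
line = to_csv_line(['one", f"', 'two', 'three'])  # '"one"", f""",two,three'
fields = from_csv_line(line)                       # ['one", f"', 'two', 'three']
```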
gnibbler
When I try this with my real input I don't get the desired output. This is the string: NAME: "2801 chassis", DESCR: "2801 chassis, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx
Ib33X
So actually your data is not CSV, but in some kind of dictionary format? A comma-separated list of key-value pairs?
Ferdinand Beyer
+6  A: 

Actually the result you get is correct—your CSV syntax is wrong.

If you want to quote commas or other characters in a CSV value, you have to use quotes surrounding the whole value, not parts of it. If a value does not start with the quote character, Python's CSV implementation does not assume the value is quoted.
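Feeding both spellings to the reader makes the difference visible. A small sketch with the default dialect; `parse` is just an illustrative helper, and the first result is the same one shown in the question:

```python
import csv

def parse(line):
    """Parse one CSV line with the default 'excel' dialect."""
    return list(csv.reader([line]))[0]

# The quote opens in the middle of a field, so it is kept as a literal
# character and the comma still ends the field.
broken = parse('one",f",two,three')  # ['one"', 'f"', 'two', 'three']

# The quote surrounds the whole field, so the comma belongs to the value.
fixed = parse('"one,f",two,three')   # ['one,f', 'two', 'three']
```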

So, instead of using

one",f",two,three

you should be using

"one,f",two,three
Ferdinand Beyer
Unfortunately, I don't have control over the input string.
Ib33X
Then I'm afraid you cannot use the `csv` module out of the box but have to write your own data reader.
Ferdinand Beyer
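Such a reader can be quite small. Below is a minimal sketch, assuming `"` is the only quote character and quotes are never escaped; the function name `split_outside_quotes` is hypothetical. It splits on commas only while it is outside a pair of quotes and keeps the quote characters, which yields the three fields asked for in the question:

```python
def split_outside_quotes(text, sep=',', quote='"'):
    """Split `text` on `sep`, ignoring separators inside quotes.

    Assumes quotes come in pairs and are never escaped; the quote
    characters themselves are kept in the returned fields.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes  # entering or leaving a quoted part
            current.append(ch)
        elif ch == sep and not in_quotes:
            fields.append(''.join(current))  # separator ends the field
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))  # the last field has no trailing separator
    return fields
```

For example, `split_outside_quotes('one",f",two,three')` gives `['one",f"', 'two', 'three']`.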
+1  A: 

Your input string is not really CSV. Instead your input contains the column name in each row. If your input looks like this:

NAME: "2801 chassis", DESCR: "2801 chassis, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx
NAME: "2802 wroomer", DESCR: "2802 wroomer, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx
NAME: "2803 foobars", DESCR: "2803 foobars, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx

The simplest thing you can do is probably to filter out the column names first, across the whole file. That gives you a CSV file you can parse, but it assumes every line has the same columns in the same order.
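One way to do that filtering is sketched below. It assumes (my assumption, not something the data guarantees) that a column label is an all-uppercase word followed by ':' and that it appears only at the start of the line or right after a comma; `LABEL`, `strip_labels` and `parse_fixed_columns` are hypothetical names. Labels like "Hw Serial#:" inside DESCR are left alone because they are not all-uppercase words directly before the colon:

```python
import csv
import re

# Assumed label shape: "NAME: " at the line start or after a comma.
LABEL = re.compile(r'(^|,)\s*[A-Z]+:\s*')

def strip_labels(line):
    """Remove the 'NAME: '-style labels, keeping the separating commas."""
    return LABEL.sub(r'\1', line)

def parse_fixed_columns(lines):
    """Yield one list of values per line; only valid if every line has
    the same columns in the same order."""
    cleaned = [strip_labels(line) for line in lines]
    for row in csv.reader(cleaned):
        # Some unquoted values are padded with spaces in the source.
        yield [value.strip() for value in row]
```

After `strip_labels`, the first sample line becomes ordinary CSV whose second field is quoted, so the csv module keeps the commas inside DESCR.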

However, if the data is not that consistent, you might want to parse it based on the names. Perhaps it looks like this:

NAME: "2801 chassis", PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx, DESCR: "2801 chassis, Hw Serial#: xxxxxxx, Hw Revision: 6.0"
NAME: "2802 wroomer", DESCR: "2802 wroomer, Hw Serial#: xxxxxxx, Hw Revision: 6.0",PID: CISCO2801 , VID: V03 , SN: xxxxxxxxx
NAME: "2803 foobars",  VID: V03 ,PID: CISCO2801 ,SN: xxxxxxxxx

Or something. In that case I'd parse each line by looking for the first ':', split out the column head from that, then parse the value (including looking for quotes), and then continue with the rest of the line. Something like this (completely untested code):

def parseline(line):
    """Parse 'KEY: value, KEY: "quoted, value", ...' into a dict."""
    result = {}
    while ':' in line:
        column, rest = line.split(':', 1)
        column = column.strip()
        rest = rest.strip()
        if rest[:1] in ('"', "'"): # It's quoted (double or single quotes).
            quotechar = rest[0]
            end = rest.find(quotechar, 1) # Find the end of the quote
            value = rest[1:end]
            end = rest.find(',', end) # Find the next comma
        else: # Not quoted, the value runs up to the next comma:
            end = rest.find(',')
            value = rest.split(',', 1)[0].strip()
        if end == -1: # No comma left: this was the last value on the line
            end = len(rest)
        result[column] = value
        line = rest[end+1:].strip()
    return result
Lennart Regebro
Your function will fail since ':' can be part of the (quoted) value (see DESCR). It might be easier to use a regular expression here!
Ferdinand Beyer
It will not fail because of that, as it treats quoted values separately. It never looks for a ':' inside a quoted value.
Lennart Regebro
But it would have failed because I forgot the ",1" in the split, had [0, end] instead of [0:end] in one place, and returned value instead of result. With those three changes it works. Pretty good for code I didn't even try to run. :)
Lennart Regebro
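Ferdinand's regular-expression suggestion could look roughly like the sketch below. It assumes keys are single words (`\w+`), quoted values use double quotes without escapes, and unquoted values contain no commas; the names `PAIR` and `parse_pairs` are hypothetical. The pattern matches one `KEY: value` pair at a time, starting where the previous match ended, so a colon inside a quoted value is never mistaken for a key separator:

```python
import re

# One pair: KEY, then either a double-quoted value or a bare value,
# followed by a comma or the end of the line.
PAIR = re.compile(r'\s*(\w+):\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')

def parse_pairs(line):
    """Parse 'KEY: value, KEY: "quoted, value", ...' into a dict."""
    result = {}
    line = line.strip()
    pos = 0
    while pos < len(line):
        m = PAIR.match(line, pos)  # must match exactly at `pos`
        if m is None:
            raise ValueError('cannot parse %r' % line[pos:])
        key, quoted, bare = m.groups()
        if quoted is not None:
            result[key] = quoted
        else:
            result[key] = bare.strip()
        pos = m.end()
    return result
```

Because each match has to start where the previous one ended, malformed input raises `ValueError` instead of silently skipping text.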