ansaurus

Question

Ruby: How can I process a CSV file with "bad commas"?

Answer 1

+1 A:

Well, here's an idea: replace each comma-followed-by-a-space with a unique character that appears nowhere else in the data, parse the CSV as usual, then go through the resulting rows and reverse the replacement.
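A minimal sketch of that idea, assuming the "bad" commas are always followed by a space and the real delimiters never are. The `PLACEHOLDER` constant and the `parse_with_bad_commas` helper are hypothetical names chosen for illustration; the placeholder character is assumed never to occur in the real data.

```ruby
require "csv"

# Hypothetical placeholder; assumed never to appear in the real data.
PLACEHOLDER = "\u0001"

def parse_with_bad_commas(line)
  # Hide every comma-plus-space so the CSV parser does not split on it.
  masked = line.gsub(", ", PLACEHOLDER)
  # Parse as usual, then restore the original text in each field.
  # Empty fields come back as nil, hence the safe-navigation operator.
  CSV.parse_line(masked).map { |field| field&.gsub(PLACEHOLDER, ", ") }
end

p parse_with_bad_commas("foo,bar,baz,pop, blah,foobar")
# => ["foo", "bar", "baz", "pop, blah", "foobar"]
```

Going through the CSV library rather than a plain `split` means quoted fields elsewhere in the line are still handled by the parser.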

Jordan 2010-10-18 20:31:37

Answer 2

A:

Perhaps something along these lines: use gsub to change the ', ' to something else.

ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
    [0] "foo",
    [1] "bar",
    [2] "baz",
    [3] "pop| blah",
    [4] "foobar"
]

and then turn the | back into a comma afterwards.

Doon 2010-10-18 20:34:45

Answer 3

A:

If you are lucky enough to have only one field like that, you can parse the leading fields off the start and the trailing fields off the end, and assume whatever is left is the offending field. In Python (I don't speak Ruby) this would look something like:

fields = line.split(',') # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]

Whatever you do, you should at a minimum be able to determine the number of offending commas, and that should give you something (a sanity check if nothing else).
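In Ruby, the same approach might look like the sketch below. It keeps the 5 leading and 3 trailing fields from the Python snippet and also returns the number of extra commas for use as a sanity check. The `split_around_messy_field` helper and the sample line are hypothetical, and like the Python version it does not handle quoted fields.

```ruby
# Layout taken from the Python snippet: 5 clean leading fields,
# one messy field, 3 clean trailing fields.
LEADING = 5
TRAILING = 3

def split_around_messy_field(line)
  # A limit of -1 keeps trailing empty fields; quoted fields are not handled.
  parts = line.split(",", -1)
  # Number of commas that ended up inside the messy field.
  extra = parts.length - (LEADING + 1 + TRAILING)
  raise ArgumentError, "too few fields: #{line.inspect}" if extra < 0

  head = parts.first(LEADING)
  tail = parts.last(TRAILING)
  # Everything between head and tail is glued back into one field.
  middle = parts[LEADING...(parts.length - TRAILING)].join(",")
  [head + [middle] + tail, extra]
end

fields, extra = split_around_messy_field("a,b,c,d,e,x, y, z,f,g,h")
p fields # => ["a", "b", "c", "d", "e", "x, y, z", "f", "g", "h"]
p extra  # => 2
```

A line whose `extra` count is unexpectedly large is a reasonable candidate for manual inspection.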

BCS 2010-10-18 23:43:20

Answer 4

+4 A:

You can use a negative lookahead, so that the split only happens on commas that are not followed by a space or tab:

>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]

ghostdog74 2010-10-18 23:52:11

+1 for using grouping in the split regex.

Greg 2010-10-19 07:43:35
