
I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.

Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.

Here's a simple example:

```ruby
require 'rubygems'
require 'fastercsv'

# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
# => ["one", "two", "three"]

# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
# => ["one", "two", "three"]

# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
# => FasterCSV::MalformedCSVError: Illegal quoting on line 1.
```

Am I going mad, or is this a bug in FasterCSV?

+1  A: 

Maybe you could set the :col_sep option to ', ' to make it parse files like that, as long as every separator in the file is a comma followed by exactly one space.
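A minimal sketch of this suggestion, assuming Ruby's standard csv library rather than the FasterCSV 1.5.0 gem from the question (FasterCSV was merged into the Ruby 1.9 standard library as CSV, and the :col_sep option carries over). It also shows the catch: a multi-character :col_sep only matches where the space is actually present, so the question's own input, whose second separator has no space, is still rejected.

```ruby
require 'csv'

# Every separator is comma + space: parses cleanly with col_sep set to ", "
consistent = CSV.parse_line('"one", "two", "three"', col_sep: ", ")
p consistent  # prints ["one", "two", "three"]

# The question's input mixes ", " and "," so the second separator is not
# recognised and the parser still raises
mixed_error = begin
  CSV.parse_line('"one", "two","three"', col_sep: ", ")
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts mixed_error.class  # prints CSV::MalformedCSVError
```

This only helps when the input is produced consistently; a file that sometimes has the space and sometimes doesn't needs a different approach.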

Robert Massa
+3  A: 

The MalformedCSVError is correct here.

Leading and trailing spaces in CSV are not ignored; they are considered part of a field. So your second field starts with a space and then contains unescaped double quotes, which is what causes the illegal quoting error.

Maybe this library is just more strict than others you have used.
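A short sketch of the behaviour described above, assuming Ruby's standard csv library (the successor to FasterCSV). Without quotes the space simply becomes part of the field; with quotes, the space means the field is unquoted, so its quote characters are treated as stray quotes.

```ruby
require 'csv'

# Unquoted fields: the space after the comma is kept as part of the field
unquoted = CSV.parse_line("one, two,three")
p unquoted  # prints ["one", " two", "three"]

# Quoted fields: the second field now starts with a space, so its quotes
# are not enclosing quotes but illegal quotes inside an unquoted field
quoted_error = begin
  CSV.parse_line('"one", "two","three"')
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts quoted_error.class  # prints CSV::MalformedCSVError
```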

Ben James
Isn't the space saying that the field is actually not surrounded by quotes (since the first char is not a quote) and that quotes should be taken as part of the field content?
Vincent Robert
Looks like I'm wrong. "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." -- http://tools.ietf.org/html/rfc4180#section-2
Vincent Robert
You're right. I didn't realise there was a 'spec' for CSV, but it seems that there is. FasterCSV is indeed just very strict.
Olly
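Under the RFC 4180 rule quoted above, a field that genuinely needs literal double quotes must itself be enclosed in quotes, with each inner quote doubled. A small sketch, again assuming Ruby's standard csv library, that lets CSV.generate_line produce such a field and then reads it back:

```ruby
require 'csv'

# A field whose content really is: space, quote, two, quote
fields = ["one", ' "two"', "three"]

# generate_line encloses that field in quotes and doubles the inner quotes
line = CSV.generate_line(fields)
puts line  # prints one," ""two""",three

# Parsing the RFC 4180-style line returns the original field values
round_trip = CSV.parse_line(line)
p round_trip == fields  # prints true
```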
A: 

I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope, and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)

If you're calling #parse_line explicitly, then you could always call

gsub(/,\s*/, ',')

on your input line before parsing it. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and letting the RegEx mavens loose on it, should that be the case.)
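A sketch of the gsub workaround and its caveat, assuming Ruby's standard csv library. The second half swaps the regular expression for a small character scanner; strip_space_after_separators is a hypothetical helper, not part of FasterCSV or CSV. It removes spaces and tabs after a comma only when that comma is outside a quoted field.

```ruby
require 'csv'

line = '"one", "two","three"'

# The suggested workaround: squeeze any whitespace after a comma first
naive = CSV.parse_line(line.gsub(/,\s*/, ','))
p naive  # prints ["one", "two", "three"]

# The caveat: a comma-space inside a quoted field is squeezed too
p CSV.parse_line('"a, b", "c"'.gsub(/,\s*/, ','))  # prints ["a,b", "c"]

# Hypothetical quote-aware alternative. A doubled quote ("") toggles
# in_quotes twice, so escaped quotes leave the state unchanged.
def strip_space_after_separators(line)
  out = +""
  in_quotes = false
  skipping = false
  line.each_char do |ch|
    # Drop spaces/tabs that directly follow an unquoted comma
    next if skipping && (ch == " " || ch == "\t")
    skipping = false
    if ch == '"'
      in_quotes = !in_quotes
    elsif ch == "," && !in_quotes
      skipping = true
    end
    out << ch
  end
  out
end

p CSV.parse_line(strip_space_after_separators(line))           # prints ["one", "two", "three"]
p CSV.parse_line(strip_space_after_separators('"a, b", "c"'))  # prints ["a, b", "c"]
```

Like the gsub, the helper also trims leading spaces from unquoted fields, which changes their content as far as a strict parser is concerned; that is usually what you want for this kind of input, but it is a deliberate choice.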

Mike Woodhouse