How do I parse CSV files with escaped newlines in Ruby? I don't see anything obvious in CSV or FasterCSV.

Here is some example input:

"foo", "bar"
"rah", "baz \
and stuff"
"green", "red"

In Python, I would do this:

import csv

csvFile = "foo.csv"
csv.register_dialect('blah', escapechar='\\')
csvReader = csv.reader(open(csvFile), "blah")
+1  A: 

I'm no Ruby expert, so forgive any errors along those lines.

You can't do this in a single step with a regex (that I know of). The Python code above is not a regular expression, so don't expect to use a regex that does the same thing.

You could do it in two steps using a Perl-compatible regex, but when I tried it on the version of Ruby I have installed, Ruby complained. You would generally read the whole file in, split() it on newlines using a negative lookbehind (so that newlines preceded by a backslash are skipped), then split() each resulting element on commas.

For example:

all_lines = whole_file.split(/(?<!\\)\n/)

But Ruby complains that the (?<! sequence isn't recognized, so you're probably going to have to resort to some other method. I would recommend a library specifically designed for parsing CSV, e.g. http://snippets.aktagon.com/snippets/246-How-to-parse-CSV-data-with-Ruby
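
For what it's worth, the lookbehind is rejected by Ruby 1.8's regex engine but accepted by Ruby 1.9 and later. Here is a minimal sketch of the two-step split, assuming such a version. The sample data is copied from the question, the variable names are made up for illustration, and the comma split is naive (it breaks if a field itself contains a comma):

```ruby
# Assumes Ruby 1.9 or later, whose regex engine supports lookbehind.
# Sample data from the question; the second record ends its first line
# with a backslash, which escapes the newline.
whole_file = [
  '"foo", "bar"',
  '"rah", "baz \\',
  'and stuff"',
  '"green", "red"'
].join("\n")

# Step 1: split on newlines that are NOT preceded by a backslash.
records = whole_file.split(/(?<!\\)\n/)

# Step 2: split each record on commas, then trim surrounding whitespace
# and the enclosing double quotes. Naive: a comma inside a field breaks it.
rows = records.map do |record|
  record.split(',').map { |field| field.strip.gsub(/\A"|"\z/, '') }
end

rows.each { |row| p row }
```

Note that the backslash itself is still present in the field after this; it would need a separate substitution to remove it.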

Greg Miller
+4  A: 

If the fields that include a newline are properly quoted (as in your example data), then Ruby's csv parser can handle them just fine. However, if what you want is for Ruby to remove the escape character (as Python seems able to do by setting escapechar), then I also don't see a method for that in the Ruby docs. (By the way, as of Ruby 1.9 FasterCSV is Ruby's default csv implementation.)

#!/usr/bin/env ruby -w
require 'csv'

CSV.foreach('test.csv') do |rec|
  puts "Record: #{rec}"
end

Output:

telemachus ~ $ ruby read.rb 
Record: ["foo", "bar"]
Record: ["rah", "baz \\\nand stuff"]
Record: ["green", "red"]
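
If you do want the backslash gone, one option is to post-process the fields after parsing. This is only a sketch of that idea, not a built-in CSV feature: unescape_newlines is a hypothetical helper, and the sample data omits the space after each comma, which Ruby's CSV parser can reject as illegal quoting:

```ruby
require 'csv'

# Hypothetical sample data: the question's records, minus the space after
# each comma. The second record contains a backslash-escaped newline.
data = [
  '"foo","bar"',
  '"rah","baz \\',
  'and stuff"',
  '"green","red"'
].join("\n")

# Hypothetical helper: replace every backslash-newline pair with a plain
# newline in each field, leaving nil (empty) fields untouched.
def unescape_newlines(rows)
  rows.map do |row|
    row.map { |field| field.nil? ? nil : field.gsub("\\\n", "\n") }
  end
end

rows = unescape_newlines(CSV.parse(data))
rows.each { |rec| puts "Record: #{rec.inspect}" }
```

This only handles backslash-newline; other escape sequences that Python's escapechar covers would need their own substitutions.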
Telemachus
+1. Same goes for FasterCSV.each as well.
Swanand
Since I'm using Ruby 1.9.1, that _is_ FasterCSV.each. FasterCSV is now the default csv implementation in Ruby.
Telemachus