Hi all. I have this line as an example from a CSV file:

2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"

I want to split it into an array. The immediate thought is to just split on commas, but some of the strings have commas in them, e.g. "Life and Living Processes, Life Processes", and these should stay as single elements in the array. Note also that there are empty fields (consecutive commas with nothing in between), and I want to get these as empty strings.

In other words, the array I want to get is:

[2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes","","",1,0,"endofline"]
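
To make the problem concrete, here is a minimal sketch of the naive comma split on the example line (the variable names are only illustrative). It shows the quoted field being cut in two:

```ruby
line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Naive approach: split on every comma, ignoring the quotes.
# The -1 limit tells split to keep trailing empty fields as well.
naive = line.split(",", -1)

puts naive.size   # one piece more than the ten fields the line really has
p naive[4, 2]     # the two halves of the quoted "Life and Living Processes, ..." field
```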

I can think of hacky ways involving eval, but I'm hoping someone can come up with a clean regex to do it...

cheers, max

+8  A: 

This is not a suitable task for regular expressions. You need a CSV parser, and Ruby has one built in:

http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html

There is also an arguably superior third-party library:

http://fastercsv.rubyforge.org/

meagar
I thought CSV could not cope with qualifiers?
willcodejavaforfood
FasterCSV is the default CSV library in Ruby 1.9.x; it lets you specify a quote_char, which might help in this case.
willcodejavaforfood
What are "qualifiers"? This is a stock CSV line. No need to mess with quote_chars.
glenn mcdonald
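As a rough illustration of the points above, the following sketch uses the standard library's CSV.parse_line on the line from the question with default settings, and then shows quote_char on a made-up line that quotes with single quotes (that second line is an assumption for illustration, not data from the question):

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# Default settings: fields are separated by commas and quoted with double quotes,
# so the stock line needs no extra options. Empty fields come back as nil.
fields = CSV.parse_line(line)
p fields

# quote_char only matters when the file uses a different quote character.
# This single-quoted line is hypothetical.
alt = CSV.parse_line("2412,'Life and Living Processes, Life Processes',,1", quote_char: "'")
p alt
```

All values come back as strings unless converters are requested.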
+2  A: 

EDIT: I failed to read the Ruby tag. The good news is, the guide will explain the theory behind building this, even if the language specifics aren't right. Sorry.

Here is a fantastic guide to doing this:

http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html

The CSV writer is here:

http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html

These examples cover the case of a quoted literal in a CSV file (which may or may not contain a comma).

Dave
+1  A: 

This morning I stumbled across a CSV Table Importer project for Ruby on Rails. Perhaps you will find the code helpful:

GitHub TableImporter

poseid
+2  A: 
text=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
x=[]
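# Split on double quotes ("\042" is the octal escape for "): even-indexed
# chunks lie outside quotes and are split on commas; odd-indexed chunks are
# quoted fields and are kept whole.
text.chomp.split("\042").each_with_index do |y,i|
  i%2==0 ? x << y.split(",") : x << y
end
# Caveat: the comma right after a closing quote yields an extra empty string,
# so the two empty fields come out as three elements in the output below.
print x.flatten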

output

$ ruby test.rb
["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "", "1", "0", "endofline"]
ghostdog74
+1  A: 
str=<<EOF
2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
EOF
require 'csv' # built in

p CSV.parse(str)
# That's it! However, empty fields appear as nil.
# Makes sense to me, but if you insist on empty strings then do something like:
parser = CSV.new(str)
parser.convert{|field| field.nil? ? "" : field}
p parser.readlines
steenslag
Thanks Steenslag, this is perfect. As it happens I don't mind the empty fields coming through as nil. Cheers, max
Max Williams
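
For anyone who does want the exact array shape from the question (integers for the numeric fields, empty strings for the empty ones), here is a possible sketch. It assumes the standard library's built-in :numeric converter is acceptable for the numeric fields and replaces nil after parsing:

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# converters: :numeric turns fields that look like numbers into Integer/Float;
# everything else stays a String, and empty fields stay nil.
row = CSV.parse_line(line, converters: :numeric)

# Replace nil (empty field) with an empty string after parsing.
row = row.map { |field| field.nil? ? "" : field }

p row
```

Doing the nil replacement with a plain map after parsing keeps it independent of how the CSV library treats nil inside converters.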