I have the following line in a CSV file that's giving me issues when parsing:

312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';

I'm parsing with the following parameters:

FasterCSV.foreach(file, {:headers => true, :quote_char => '"', :col_sep => "','"} ) do |row|

However, it blows up on rows like the one above because of the "J.D." inside one of the columns. How do I properly parse that line with FasterCSV?

Thanks!

+1  A: 

It looks to me like your :quote_char should be a single quote (') and your :col_sep should be a comma (,). In that case:

FasterCSV.foreach(file, {:headers => true, :quote_char => "'", :col_sep => ','} ) ...
Jordan
That results in a FasterCSV::MalformedCSVError exception being thrown.
mwilliams
How is your CSV being generated? Are you certain it's well-formed? :quote_char specifies the character that wraps around fields, which appears to be a single-quote in your example, and :col_sep specifies the character between fields, which appears to be a comma in your example. That is the information I based my answer on.
Jordan
The problem is that it's not well-formed and I was trying to bend FasterCSV to get it to parse it anyway. The dump was from a customer and I have since sent the proper SQL query that will output proper CSV. In the meantime I'm still trying to hack on it.
mwilliams
"The problem is that it's not well-formed and I was trying to bend FasterCSV to get it to parse it anyway." You may want to cancel your downvotes on both answers, then.
Thilo
They're answers that were untested against the line of data I provided, and they don't work, hence the downvote. If an answer doesn't work, I downvote; if it works, I upvote.
mwilliams
A: 

You can't do that. FasterCSV only allows one choice of quote character, and your application needs two. There isn't a way to do cute stuff like pass in a regex instead of a character because FasterCSV precompiles matchers with the quote character escaped as follows:

# prebuild Regexps for faster parsing
esc_col_sep = Regexp.escape(@col_sep)
esc_row_sep = Regexp.escape(@row_sep)
esc_quote   = Regexp.escape(@quote_char)
@parsers = {
  :any_field      => Regexp.new( "[^#{esc_col_sep}]+",
                                 Regexp::MULTILINE,
                                 @encoding ),
  :quoted_field   => Regexp.new( "^#{esc_quote}(.*)#{esc_quote}$",
                                 Regexp::MULTILINE,
                                 @encoding ),
  ...
}
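
To illustrate the effect of that escaping, here is a minimal sketch of the same step in isolation. The value ['"] is a hypothetical attempt to match either quote style; it is not a documented FasterCSV option, and the library may well reject it before it ever reaches this point. The encoding argument from the excerpt is left out for simplicity.

```ruby
# Hypothetical quote "character": a character class meant to match ' or ".
attempted_quote = %q{['"]}

# Same escaping step as in the excerpt above: regex metacharacters become literals.
esc_quote = Regexp.escape(attempted_quote)

# Rebuild the quoted-field matcher the way the excerpt does (encoding omitted).
quoted_field = Regexp.new("^#{esc_quote}(.*)#{esc_quote}$", Regexp::MULTILINE)

puts esc_quote                                # \['"\]
puts quoted_field.match?(%q{'John'})          # false
puts quoted_field.match?(%q{"John"})          # false
puts quoted_field.match?(%q{['"]John['"]})    # true
```

Once escaped, the brackets are matched literally, so the pattern only accepts a field wrapped in the four-character string ['"] and no longer treats either quote as a delimiter.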
Grandpa
A: 

I haven't been able to bend FasterCSV to work the way I need it to with this data, so the end result was simply requesting a new dump of the data with proper CSV output. Thanks for the attempts!

mwilliams
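
For a one-off dump shaped like the sample line in the question, one possible stopgap is to skip the CSV parser and split the row by hand. The sketch below is only that: split_single_quoted is a hypothetical helper, not part of FasterCSV, and it assumes that fields are either bare or wrapped in single quotes, that no field contains a single quote of its own, that unquoted fields are never empty, and that the trailing ; is a row terminator to be discarded.

```ruby
# The sample row from the question, copied verbatim.
SAMPLE_LINE = %q{312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';}

# Hypothetical helper: split a row whose fields are bare or single-quoted.
# Assumes no embedded single quotes and no empty unquoted fields.
def split_single_quoted(line)
  # Drop the line ending and the trailing ";" terminator seen in the sample.
  body = line.chomp.sub(/;\z/, '')
  # Each match is either a '...' field (commas and double quotes allowed inside)
  # or a run of characters containing neither a comma nor a single quote.
  body.scan(/'([^']*)'|([^,']+)/).map { |quoted, bare| (quoted || bare).strip }
end

fields = split_single_quoted(SAMPLE_LINE)
puts fields.size   # 14
puts fields[3]     # John, Doe. "J.D."
```

Because the double quotes are never treated as delimiters, the "J.D." stays inside its field. Anything less regular than the sample, such as a name like O'Brien, would break this approach, which is why regenerating the dump as proper CSV remains the more robust fix.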