Hi all:

I have a large-ish CSV file in which one column contains HTML, and I need to convert that column to plain text (i.e. readable by people, with no markup or script tags).

I don't have much experience with Ruby, but it seems like the perfect language for this. The file should still be valid CSV after the conversion (i.e. the other columns should not be disturbed). Help?

Fair enough, I thought there might be a library that does this as long as the HTML was valid. The file looks something like this:

  "xxxx-15454ss",   "xome name", "<div class=""myClass""><strong>The Vintage Junior </strong>offers the same specs as the Vintage Series but only in 3/4 Size ideal for Kids. the 57 Model is great value for a good quality guitar.  For more info go to <a href=""www.somehting.com"">something</a>
</div> "

I'm trying to include the common HTML tags we would be using.

Thanks

A: 

Have you tried iterating through the CSV file and running the following on each cell that contains HTML?

.gsub(/<\/?[^>]*>/, "")
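To see what that substitution does on its own, here is a minimal sketch applied to a shortened version of the sample cell from the question. The helper name strip_tags is made up for illustration and is not part of any library.

```ruby
# strip_tags is a hypothetical helper name used only for this example.
# It deletes anything that looks like an opening or closing tag.
def strip_tags(html)
  html.gsub(/<\/?[^>]*>/, "")
end

cell = '<div class="myClass"><strong>The Vintage Junior </strong>offers the same specs</div>'
puts strip_tags(cell)
# prints: The Vintage Junior offers the same specs
```

Note that this only removes the tags themselves. Any text sitting between <script> and </script> would be left behind, and HTML entities such as &amp; are not decoded, so messy input may need a real HTML parser instead.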

The script below is adapted from: http://stackoverflow.com/questions/940774/how-do-you-change-headers-in-a-csv-file-with-fastercsv-then-save-the-new-headers

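require 'fastercsv'

input = File.open 'original.csv', 'r'
output = File.open 'modified.csv', 'w'
# :headers => true treats the first line as a header row and yields each
# following line as a FasterCSV::Row; the row is written to output after the block.
FasterCSV.filter input, output, :headers => true, :write_headers => true do |row|
  # A row has no gsub of its own, so strip the tags from each field in turn.
  row.each do |header, value|
    row[header] = value.gsub(/<\/?[^>]*>/, "") if value
  end
end
input.close
output.close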
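
On Ruby 1.9 and later, FasterCSV was folded into the standard library as CSV, so the same idea can be sketched without the gem. The version below is only an illustration: it assumes the file has no header row (the pasted sample does not seem to have one), works on a string so it is easy to try out, and uses a made-up helper name, strip_html_from_csv.

```ruby
require 'csv'

TAG_PATTERN = /<\/?[^>]*>/

# Hypothetical helper: takes CSV text and returns CSV text with the tags
# removed from every field. Empty fields are parsed as nil and left alone.
def strip_html_from_csv(csv_text)
  CSV.generate do |out|
    CSV.parse(csv_text) do |row|
      out << row.map { |field| field && field.gsub(TAG_PATTERN, "") }
    end
  end
end

sample = %Q{"xxxx-15454ss","xome name","<div class=""myClass""><strong>Junior</strong> guitar</div>"\n}
puts strip_html_from_csv(sample)
```

For a real file you would read the text with File.read and write the result back out. If the file really has spaces between a comma and the opening quote, as in the pasted sample, a strict CSV parser may reject those lines, so that would need cleaning up first.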
MatthewFord