Hey folks,

I am stuck on something quite simple but really annoying: I have an XML file with one node whose content includes line breaks and whitespace. Sadly, I can't change the XML.

<?xml version="1.0" encoding="utf-8" ?>
<ProductFeed>

ACME Ltd. Fooproduct Foo Root :: Bar Category

I get to the node and can read from it without trouble:

require "open-uri"
require "zlib"
require "nokogiri"

url = "http://feeds.somefeed/feed.xml.gz"
@source = URI.open(url, :http_basic_authentication => ["USER", "PW"])
@gz = Zlib::GzipReader.new(@source)
@result = @gz.read
@doc = Nokogiri::XML(@result)
@doc.xpath("/ProductFeed/Vendors/Vendor").each do |manuf|
  vendor = manuf.css("Name").first.text
  manuf.xpath("//child::Product").each do |product|
    product_name = product.css("Name").text
    foocat = product.css("Category").text

    puts "#{vendor} ---- #{product_name} ---- #{foocat} "
  end
end

This results in:

ACME Ltd. ---- Fooproduct ----
                                      Foo Root :: Bar Category

Obviously there are line breaks and tab stops or spaces in the string returned by product.css("Category").text.

Does anyone know how to strip the line breaks, tabs, and spaces from the result right here?

Alternatively, I could do that in the next step, where I do a find on 'foocat' like this:

barcat = Category.find_by_foocat(foocat)

Thanks for helping!

Val

A: 

You could use XSLT to remove all the unnecessary characters.

santiiiii
Hi Santiiii, thanks for the idea! I hadn't used any XSLT before; it looks great and worked in testing. For completeness: I actually ended up calling .strip on the text of the desired node: cat = (product.css("Category").text).strip. The reason is that my specific setup with Nokogiri made it easier that way.
val_to_many
I'm glad it worked. Regards
santiiiii
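The .strip approach mentioned in the comments can be sketched with plain Ruby strings. The sample string below stands in for what product.css("Category").text returned in the question, and clean_text is a hypothetical helper (not part of Nokogiri) for the case where runs of whitespace inside the string should be collapsed as well.

```ruby
# Stand-in for the text returned by product.css("Category").text (assumed value).
raw_category = "\n\t\t\t   Foo Root :: Bar Category\n\t\t"

# String#strip removes leading and trailing whitespace, including newlines and tabs.
stripped = raw_category.strip
puts stripped  # Foo Root :: Bar Category

# Hypothetical helper: also collapses whitespace runs inside the string to one space.
def clean_text(raw)
  raw.strip.gsub(/\s+/, " ")
end

collapsed = clean_text("  Foo   Root ::\n  Bar Category ")
puts collapsed  # Foo Root :: Bar Category
```

Stripping before the lookup means Category.find_by_foocat receives the same value regardless of how the feed indents the node.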