ansaurus

Question

How do I get the sum of all content when parsing an XML tag in Ruby?

Answer 1

+3 A:

With Nokogiri you can just ask a node for its text. The catch is that all of the whitespace and newlines inside that node come back too, so you will probably want to collapse them (there is likely a better way to do that than what I did in this example).

Here is a sample:

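require 'minitest/autorun'
require 'nokogiri'

class NokogiriTextTest < Minitest::Test
  def test_nokogiri_text
    # The heredoc keeps its newlines and indentation, so #text will contain them
    value = Nokogiri::HTML.parse(<<-HTML_END)
      <h1>
        Hello<span class='punctuation'>,</span>
        <span class='noun'>World<span class='punctuation'>!</span></span>
      </h1>
    HTML_END

    h1_node = value.search("h1").first
    # Collapse every run of whitespace to one space, then trim the ends
    assert_equal("Hello, World!", h1_node.text.split(/\s+/).join(' ').strip)
  end
end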

Aaron Hinni 2009-06-04 15:54:48

If I'm going to be turning all the newlines into spaces anyway (which is totally fine, since XML treats them as equivalent), then h1_node.text.gsub(/\s+/, ' ').strip gives the same result and is a little faster, since it doesn't need to create as many intermediate objects.

James A. Rosen 2009-06-04 17:19:52
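The two whitespace-collapsing approaches can be compared in plain Ruby, no parser required. A small sketch, assuming a string shaped like what h1_node.text returns in the test above; the third variant uses String#split with no argument, which splits on whitespace and already ignores leading and trailing whitespace:

```ruby
# Assumed stand-in for the string returned by h1_node.text
raw = "\n      Hello,\n      World!\n     "

via_split = raw.split(/\s+/).join(' ').strip  # approach from the answer
via_gsub  = raw.gsub(/\s+/, ' ').strip        # approach from the comment
via_awk   = raw.split.join(' ')               # no-argument split skips the edges itself

p [via_split, via_gsub, via_awk]  # → ["Hello, World!", "Hello, World!", "Hello, World!"]
```

All three give the same string; the gsub version avoids building an intermediate array, which is the point the comment makes.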

Answer 2

+1 A:

Nokogiri's Nokogiri::XML::Node#content will do it:

irb(main):020:0> node
=> <h1>
  Hello<span class="punctuation">,</span>
  <span class="noun">World<span class="punctuation">!</span>
</span>
</h1>
irb(main):021:0> node.content
=> "\n  Hello,\n  World!\n\n"

Pesto 2009-06-04 15:58:17

#text and #content are the same, so Aaron got the "Answer" b/c he also took care of the whitespace. +1, though :)

James A. Rosen 2009-06-04 17:16:44

Plus, he posted his answer first.

Pesto 2009-06-04 17:33:27
