Hi all,

This is a continuation of a previous question. I'm having problems with this Nokogiri snippet:

>> require 'nokogiri'
>> html = 'bad<p>markup</p>with<img src="foo.jpg">'
>> Nokogiri::HTML(html).at_css('body').children.map {|x| '<p>' + x.text + '</p>'}.join('') 
=> "<p>bad</p><p>markup</p><p>with</p><p></p>"

What happened to my image tag? It seems that Nokogiri is stripping ALL the HTML tags present (including my original <p> around the word "markup") and replacing them with new <p> tags. How do I prevent this from happening? All I want to do is ensure that entirely untagged text is wrapped in a <p> tag.

+2  A: 

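Wrap a node in a <p> tag only if it is a text node; otherwise call to_html on it. Calling text on an element returns just its text content with the markup dropped, which is why the <img> and your original <p> disappeared: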

require 'nokogiri'

html = 'bad<p>markup</p>with<img src="foo.jpg">'

# Text nodes get wrapped in <p>; element nodes are serialized unchanged.
Nokogiri::HTML(html).at_css('body').children.map do |x|
  if x.text?
    '<p>' + x.text + '</p>'
  else
    x.to_html
  end
end.join
#=> "<p>bad</p>\n<p>markup</p><p>with</p><img src=\"foo.jpg\">"
Adrian
Ah, okay that makes sense. Thank you! :)
Aaron B. Russell