How can I make Nokogiri leave HTML entities and non-ASCII characters (such as German umlauts) untouched?

For example:

# this is fine
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>&ouml;</p>'

# this is not
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>&ouml;</p>'

# this is what I need
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'

I've experimented with both PARSE_OPTIONS and the :save_with options, but could not find a way to make Nokogiri transparently behave as shown above.

Any pointers?

A: 

Already answered in http://stackoverflow.com/questions/2524305/contents-of-a-node-in-nokogiri

shingara
Thanks for the answer, but sorry, it isn't. Obviously I could escape or unescape entities in the client code, but I don't want to do that: I don't want my library to alter the original HTML source. Instead, I want it to return encoded characters where the original HTML source contained them and unencoded characters where it did not. Maybe it helps to show the context of the question: http://gist.github.com/353048
svenfuchs
A: 

OK, my question has been answered by Aaron via Twitter/gist: http://twitter.com/tenderlove/status/11489447561

svenfuchs