Hi everybody,

it seems that all entities are dropped when parsing with

tags = "<p>test umlauts &ouml;</p>"
Nokogiri::XML.fragment(tags)

Result:

<p>test umlauts </p>

The above method calls Nokogiri::XML::DocumentFragment.parse(tags), and that method calls Nokogiri::XML::DocumentFragment.new(XML::Document.new, tags).

According to the Nokogiri documentation, this code is then executed:

def initialize document, tags=nil
  if tags
    parser = if self.kind_of?(Nokogiri::HTML::DocumentFragment)
               HTML::SAX::Parser.new(FragmentHandler.new(self, tags))
             else
               XML::SAX::Parser.new(FragmentHandler.new(self, tags))
             end
    parser.parse(tags)
  end
end

I think we are dealing with the XML::SAX::Parser and the corresponding FragmentHandler. Digging through the code gave me no hint. Which parameters do I have to set to get the correct result?

+2  A: 

oouml is not a predefined entity in XML. If you want to allow HTML entity references in XHTML, you'd need to use a parser that reads the external DTD named in the doctype. This is a lot of effort; you may prefer to just use the HTML parser if you have HTML-compatible XHTML with entity references.
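If switching to the HTML parser is not an option, one possible workaround is to rewrite HTML named entities as numeric character references before parsing, since an XML parser accepts those without a DTD. This is only a minimal sketch and not something Nokogiri provides: the helper `html_entities_to_numeric` and the table `HTML_NAMED_ENTITIES` are hypothetical names made up for illustration, and the table covers just a handful of German characters.

```ruby
# Hypothetical lookup table (an assumption for this sketch, far from complete):
# HTML entity name => Unicode code point.
HTML_NAMED_ENTITIES = {
  "auml"  => 0xE4, # ä
  "ouml"  => 0xF6, # ö
  "uuml"  => 0xFC, # ü
  "szlig" => 0xDF  # ß
}.freeze

# The five entities that XML itself predefines; these must stay as they are.
XML_PREDEFINED = %w[amp lt gt quot apos].freeze

# Hypothetical helper: replaces known HTML named entities with numeric
# character references and leaves everything else untouched.
def html_entities_to_numeric(markup)
  markup.gsub(/&([A-Za-z][A-Za-z0-9]*);/) do |match|
    name = Regexp.last_match(1)
    next match if XML_PREDEFINED.include?(name)

    code = HTML_NAMED_ENTITIES[name]
    code ? "&#x#{code.to_s(16).upcase};" : match
  end
end

puts html_entities_to_numeric("<p>test umlauts &ouml;</p>")
# prints <p>test umlauts &#xF6;</p>
```

The converted string could then be passed to Nokogiri::XML.fragment as in the question. A name that is not in the table, such as a mistyped &oouml;, is left untouched by the helper.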

bobince
Nokogiri::HTML.fragment("<p>test umlauts &oouml;</p>") => <p>test umlauts </p>. This is not what I want. You are right - reading the external DTD is a lot of effort...
crazyrails
hang on... ‘oouml’?? That's not a defined entity even in HTML! Shouldn't that be ‘ouml’?
bobince
argh - you are definitely right!
crazyrails
Nokogiri::HTML.fragment("<p>test umlauts &ouml;</p>") works! Thank you!
crazyrails