I have some code that uses libxml2's SAX2 interface. I want entities such as &lt; to be reported as entity references rather than as characters, but no matter what I do, libxml2 turns &lt; into a < and hands it to my characters callback instead of calling my reference callback.

Any ideas as to how I can force libxml2 to call my reference callback for the basic predefined entities?

+1  A: 

You can't do this. libxml2 returns the string contents of the node, which is a literal <. If you want your string to contain &lt;, then your original XML needs to contain &amp;lt;.

If you want to escape this again (which you should only do when writing output to another XML document), use an escaping module such as Perl's HTML::Entities to do the work.
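Since the question is about C code, here is a minimal sketch of that re-escaping step in plain C, in case no escaping module is at hand. The name escape_xml is hypothetical (it is not part of libxml2), and the sketch only handles the five predefined XML entities. It takes an explicit length because the SAX characters callback receives a buffer and a length rather than a NUL-terminated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function: returns a newly allocated,
   NUL-terminated copy of the first `len` bytes of `in`, with the five
   predefined XML entities escaped again. The caller must free() the result.
   Returns NULL if allocation fails. */
char *escape_xml(const char *in, size_t len)
{
    assert(in != NULL);

    /* Worst case: every byte expands to a 6-byte entity such as "&quot;". */
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        const char *rep = NULL;
        switch (in[i]) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);   /* copy the entity reference */
            p += n;
        } else {
            *p++ = in[i];        /* copy ordinary bytes unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Inside a characters callback you would call it with the callback's buffer and length, write the result to your output document, and then free it.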

Dominic Mitchell
But the libxml2 SAX docs state that you can choose to have entity references passed through unchanged via your reference callback. It seems strange that it would do this for some entities and not others.
Benno
XML treats some entities differently from others. The spec mentions that numeric character references must be expanded immediately (http://www.xml.com/axml/target.html#sec-predefined-ent), but doesn't explicitly mention the predefined entities; probably a case of a bad semicolon. See also http://www.xml.com/axml/target.html#sec-entexpand
kdgregory