I have program that needs to parse XML that contains character entities. The program itself doesn't need to have them resolved, and the list of them is large and will change, so I want to avoid explicit support for these entities if I can.
Here's a simple example:
<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>
Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would translate them into a special event or object that could be handled specially, but I'd settle for an option that would silently suppress them.
Answer & Example:
Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES
set to false.
Here's the code I whipped up to try it out:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
new FileInputStream("your file here"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
For the above XML, it will print "Entity Reference: something
".