How can I force a SAX parser (specifically, Xerces in Java) to use a DTD when parsing a document that has no doctype declaration? Is this even possible?

Here are some more details of my scenario:

We have a bunch of XML documents, all conforming to the same DTD, that are generated by multiple different systems (none of which I can change). Some of these systems add a doctype to their output documents; others do not. Some use named character entities, some do not. Some even use named character entities without declaring a doctype. I know that's not kosher, but it's what I have to work with.

I'm working on a system that needs to parse these files in Java. Currently it handles the cases above by first reading the XML document as a stream, trying to detect whether it declares a doctype, and inserting a doctype declaration if one isn't already present. The problem is that this code is buggy, and I'd like to replace it with something cleaner.

The files are large, so I can't use DOM.

Answer:

Years ago I did something similar to this: "Can I force a parser to use fixed DTD for validation". I don't know whether those APIs still work that way, though.

stacker
Can you post the solution that worked for you? I see one on that question that looks like it would work, but it uses DOM, and I can't use DOM because these documents are large.
cosmic.osmo
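A streaming approach that avoids rewriting the input is SAX2's `org.xml.sax.ext.EntityResolver2` interface. According to its documentation, `getExternalSubset(name, baseURI)` is invoked when the root element is reached and no DOCTYPE has been seen, and the returned source is treated as if a DOCTYPE pointing at it had been present. `DefaultHandler2` implements `EntityResolver2`, and Xerces (including the copy bundled with the JDK) uses the SAX2 extension methods by default. The sketch below is a minimal illustration under assumptions: the class name `FixedDtdSax` and the tiny inline DTD are hypothetical stand-ins for the real shared DTD. It also overrides the four-argument `resolveEntity` so that documents which *do* declare a doctype get the same local copy instead of fetching their system ID.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class FixedDtdSax {

    // Hypothetical stand-in for the shared DTD; in practice, load the real
    // DTD file (e.g. from the classpath) instead of an inline string.
    static final String DTD =
            "<!ELEMENT doc (#PCDATA)>\n"
          + "<!ENTITY eacute \"&#233;\">\n";

    static InputSource fixedDtd() {
        return new InputSource(new StringReader(DTD));
    }

    static class Handler extends DefaultHandler2 {
        final StringBuilder text = new StringBuilder();

        // SAX2 extension: called when the root element is reached and the
        // document had no DOCTYPE (or a DOCTYPE without an external subset).
        @Override
        public InputSource getExternalSubset(String name, String baseURI) {
            return fixedDtd();
        }

        // SAX2 extension: "[dtd]" identifies the external DTD subset named in
        // a DOCTYPE; substitute the local copy, leave other entities alone.
        @Override
        public InputSource resolveEntity(String name, String publicId,
                                         String baseURI, String systemId) {
            return "[dtd]".equals(name) ? fixedDtd() : null;
        }

        // Collect character data so the expanded entities can be inspected.
        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    public static String parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Handler handler = new Handler();
        // SAXParser.parse registers the handler as the EntityResolver too;
        // because it implements EntityResolver2, the extension methods apply.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // No DOCTYPE, but a named entity that only the DTD defines.
        System.out.println(parse("<doc>caf&eacute;</doc>"));
        // DOCTYPE whose system ID is never fetched.
        System.out.println(parse("<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\">"
                + "<doc>caf&eacute;</doc>"));
    }
}
```

Because the DTD is spliced in by the parser itself, the whole file is still streamed through SAX and nothing needs to sniff or rewrite the prolog. Validation stays off here; calling `setValidating(true)` on the factory would additionally validate every document against the same DTD.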