Here's my problem:

My program gets XML files as its input. These files may or may not have an XML declaration, a DOCTYPE declaration, or an entity declaration, but they all conform to the same schema. When my program gets a new file, it needs to inspect it and make sure it has these declarations:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE my.doctype [
<!ENTITY % entity_file SYSTEM "my.entities.ent">
%entity_file;
]>

If it has them, great, and I can leave them as is. But if the declarations are missing or wrong, I need to remove whatever is already there and add the correct declarations.

How can I do this (preferably easily, using standard Java 6 and/or Apache libraries)?
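A minimal sketch of one way to do this with only the JDK's built-in JAXP APIs, assuming the expected declarations are exactly the block shown above and that a whole file fits in a String. The names PrologFixer and fixProlog are hypothetical. A file that already begins with the expected text is returned unchanged. Otherwise the file is parsed with an EntityResolver that returns an empty DTD for whatever the file references, only the root element is re-serialized (which drops the old XML declaration and DOCTYPE), and the correct declarations are prepended.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class PrologFixer {
    // The declarations every file must start with (taken from the question).
    static final String PROLOG =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE my.doctype [\n"
        + "<!ENTITY % entity_file SYSTEM \"my.entities.ent\">\n"
        + "%entity_file;\n"
        + "]>\n";

    /** Returns xml unchanged if it already starts with PROLOG, else with its prolog replaced. */
    static String fixProlog(String xml) throws Exception {
        // Treat CRLF and LF line endings alike when checking for the expected text.
        if (xml.replace("\r\n", "\n").startsWith(PROLOG)) {
            return xml;
        }
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Whatever DTD or entity file the input points at (right, wrong or
        // unreachable), give the parser an empty one so parsing cannot fail on it.
        db.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        // Serialize only the root element: the old XML declaration, DOCTYPE and
        // anything else outside the root element are dropped.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter body = new StringWriter();
        t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(body));
        return PROLOG + body.toString();
    }
}
```

Write the result back out as UTF-8, since the new prolog declares that encoding. Caveats: the rewritten body may differ cosmetically from the input (attribute quoting, empty-element syntax), comments or processing instructions outside the root element are lost, and because the resolver supplies no entity declarations, files whose content uses entities from my.entities.ent would need the resolver to return the real entity file instead.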

A: 

This code should get you started. You may have to create a new document to change the contents of the DOCTYPE if it is wrong; I don't know of a way to modify an existing one.

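import org.w3c.dom.*;

// Assumes documentBuilder is an existing DocumentBuilder field.
private Document copyDocument(Document document) {
    DOMImplementation impl = documentBuilder.getDOMImplementation();
    DocumentType origDoctype = document.getDoctype();
    Element oldDocElement = document.getDocumentElement();
    // Name the DOCTYPE after the real root element; the input's DOCTYPE may be
    // wrong or missing entirely (getDoctype() then returns null).
    String rootName = oldDocElement.getTagName();
    DocumentType doctype = impl.createDocumentType(rootName,
            origDoctype != null ? origDoctype.getPublicId() : null,
            origDoctype != null ? origDoctype.getSystemId() : null);
    Document copiedDoc = impl.createDocument(oldDocElement.getNamespaceURI(), rootName, doctype);
    // createDocument already made the top element; copy its attributes and kids.
    Element newDocElement = copiedDoc.getDocumentElement();
    NamedNodeMap attrs = oldDocElement.getAttributes();
    for (int i = 0; i < attrs.getLength(); i++) {
        newDocElement.setAttributeNode((Attr) copiedDoc.importNode(attrs.item(i), true));
    }
    for (Node n = oldDocElement.getFirstChild(); n != null; n = n.getNextSibling()) {
        // importNode copies n into copiedDoc; the original tree is left untouched.
        newDocElement.appendChild(copiedDoc.importNode(n, true));
    }
    return copiedDoc;
}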
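Writing such a copy back out is a further step. A sketch, assuming the JDK's built-in identity Transformer: that serializer typically does not emit the DocumentType node on its own, so the IDs are passed as output properties instead. DoctypeWriter and write are hypothetical names. Note this can only produce an external reference such as <!DOCTYPE root SYSTEM "...">; it cannot write an internal subset like the ENTITY declaration in the question.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

class DoctypeWriter {
    /** Serializes doc, emitting a DOCTYPE built from its DocumentType node's IDs. */
    static String write(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        DocumentType dt = doc.getDoctype();
        if (dt != null) {
            // Pass the IDs explicitly; the DocumentType node itself is not serialized.
            if (dt.getPublicId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
            if (dt.getSystemId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```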
bmargulies
A: 

If you have control over how those documents are produced, try to avoid DTDs: they introduce unneeded complexity and are underpowered for expressing a schema.
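To illustrate the alternative, here is a sketch of validating against a W3C XML Schema with the standard javax.xml.validation API. The schema is supplied by the program, so the input files need no DOCTYPE or entity declarations at all. The inline schema and the SchemaCheck/isValid names are hypothetical stand-ins for a real schema.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema: a <root> element holding any number of <item> strings.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"root\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" type=\"xs:string\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator v = schema.newValidator();
        try {
            // With no ErrorHandler set, the Validator throws on the first error.
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

In real code you would compile the Schema once and reuse it: a Schema object is thread-safe, while a Validator is not.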

vtd-xml-author
A: 

Why do you need to "remove whatever is already there and add the correct declarations"?

If you're using the XML file for input, and not writing it back out in some form, then the appropriate solution is to create an EntityResolver.

A complete description of the process is here, but the following code shows how to give the parser your own DTD, regardless of what the document says it wants:

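// Needs java.io, javax.xml.parsers and org.xml.sax imports; dtd is assumed to be
// a final String holding the DTD text the program wants the parser to use.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver()
{
    // Called for each external entity the parser needs (the DTD named in the
    // DOCTYPE, external parameter entities, ...): ignore the requested IDs and
    // always return our own DTD.
    public InputSource resolveEntity(String publicId, String systemId)
        throws SAXException, IOException
    {
        return new InputSource(new StringReader(dtd));
    }
});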
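A self-contained sketch of the same technique, filling in two pieces the snippet leaves out: the DTD text (a hypothetical two-element DTD here) and an ErrorHandler. With the JDK's default handler, validation errors are reported on standard error rather than thrown, so an invalid document would otherwise still parse. ResolverDemo and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ResolverDemo {
    // Hypothetical DTD standing in for the program's own copy of the real one.
    static final String DTD = "<!ELEMENT root (item*)>\n<!ELEMENT item (#PCDATA)>\n";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Ignore whatever the document asks for and always supply our DTD.
        db.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(DTD));
            }
        });
        // Turn validation errors into exceptions instead of console messages.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return db.parse(new InputSource(new StringReader(xml)));
    }
}
```

The resolver is only consulted when the document references an external entity. A file with no DOCTYPE at all never reaches it, and with validation enabled such a file is rejected ("no grammar found") rather than checked against your DTD.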
kdgregory