I'm looking for a library to help me parse and transform DTDs using Python. The only thing I have found so far is xmlproc, but it seems ancient and does not appear to support serialization of DTDs. There is a library for Java, but I'd prefer a Python solution.

Edit: by "serialization" of DTDs I mean that, ideally, I'd like to parse the DTD into some kind of Python structure, operate on that structure, and then write the result back out as a DTD.

A: 

I don't know of an end-to-end processor for DTDs, but I rarely use DTDs at all, so that's not surprising.

Amara can parse DTDs, but I don't know what level of access it gives you to them or whether the results can be serialized. I assume they can, but that is a guess rather than something I have verified. libxml2, which is available in Python as lxml, is something else to investigate, though I have even less experience with it. Judging from the libxml documentation, you would have access to the full DTD.
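As a rough illustration of the lxml route, here is a minimal sketch that loads a DTD and walks its element and attribute declarations. It assumes lxml is installed; the `note` DTD and the `describe` helper are made up for this example. It only covers the reading side and says nothing about writing a modified DTD back out.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD used only for illustration.
DTD_TEXT = """
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note lang CDATA #IMPLIED>
"""

# etree.DTD accepts a file-like object containing the DTD source.
dtd = etree.DTD(StringIO(DTD_TEXT))


def describe(dtd):
    """Return {element name: (content type, [attribute names])}."""
    summary = {}
    for decl in dtd.iterelements():
        attrs = [attr.name for attr in decl.iterattributes()]
        summary[decl.name] = (decl.type, attrs)
    return summary


summary = describe(dtd)
for name, (content_type, attrs) in summary.items():
    print(name, content_type, attrs)
```

Each element declaration also exposes a `content` object describing its content model, which is where you would look if you need the child-element structure.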

Another possibility is to convert the DTD to XSD with one of the many available programs, use a regular XML processor to manipulate the tree, and then convert the result back to a DTD. I worry about how lossy that round trip might be.

At a higher level of difficulty, if you're going to write a parser for the DTD grammar yourself, consider PyParsing or PLY.
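To give a feel for the parse, modify, and write-back cycle, here is a deliberately tiny sketch that uses the standard library's `re` module instead of PyParsing or PLY. It only understands `<!ELEMENT ...>` declarations and ignores attributes, entities, comments, and parameter entities, so treat it as a starting point and not as a DTD parser. The function names and the sample DTD are invented for the example.

```python
import re

# Matches <!ELEMENT name content-model>; content models never contain '>'.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.DOTALL)


def parse_elements(dtd_text):
    """Return {element name: content model} in declaration order."""
    return {m.group(1): m.group(2) for m in ELEMENT_RE.finditer(dtd_text)}


def serialize_elements(elements):
    """Write the structure back out as <!ELEMENT> declarations."""
    return "\n".join(
        f"<!ELEMENT {name} {model}>" for name, model in elements.items()
    )


dtd_text = """<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>"""

elements = parse_elements(dtd_text)

# Transform the structure: add an optional <subject> child to <note>.
elements["note"] = "(to, subject?, body)"
elements["subject"] = "(#PCDATA)"

result = serialize_elements(elements)
print(result)
```

A real grammar-based parser would replace `ELEMENT_RE` with proper rules for every declaration type, but the surrounding shape (parse to a plain Python structure, edit it, serialize it) stays the same.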

Andrew Dalke
A: 

You might want to consider converting your DTD to one of the XML-based formats. At that point, you can process it with ElementTree, or whatever XML toolkit you prefer.

I've had good experience with RelaxNG, which is fairly concise and straightforward. There's a list of conversion tools on its site: http://relaxng.org/#conversion
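Once the DTD has been converted to RelaxNG's XML syntax, manipulating it is ordinary XML processing. The sketch below assumes a small, hand-written RelaxNG schema (not the output of any particular converter) and renames one element definition with ElementTree.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Keep RelaxNG as the default namespace when serializing.
ET.register_namespace("", RNG_NS)

# Hypothetical schema used only for illustration.
RNG_TEXT = """<element name="note" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="to"><text/></element>
  <element name="body"><text/></element>
</element>"""

root = ET.fromstring(RNG_TEXT)

# Rename the "to" element definition to "recipient".
for el in root.iter(f"{{{RNG_NS}}}element"):
    if el.get("name") == "to":
        el.set("name", "recipient")

output = ET.tostring(root, encoding="unicode")
print(output)
```

Turning the modified schema back into a DTD would still require one of the conversion tools, and that step may lose information.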

If you prefer XML Schema, here's what is available: http://www.w3.org/XML/Schema

If you're dealing with third-party documents or DTDs, this may not work for you. If it's in-house, give it a shot. XML-based schemas are much more pleasant to work with.

ieure