I'm adapting the following code (created via advice in this question), which takes an XML file and its DTD and converts them to a different format. For this problem only the loading section is important:

from lxml import etree

xmldoc = open(filename)

# Validate against the DTD referenced in the XML file while parsing
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(xmldoc, parser)

This worked fine when reading from the file system, but I'm converting it to run in a web framework, where the two files are uploaded via a form.

Loading the XML file works fine:

tree = etree.parse(StringIO(data['xml_file']))

But because the DTD is referenced at the top of the XML file, the following fails:

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file']), parser)

Via this question, I tried:

etree.DTD(StringIO(data['dtd_file']))
tree = etree.parse(StringIO(data['xml_file']))

Whilst the first line doesn't cause an error, the second falls over on the Unicode entities that the DTD is meant to define (and does define in the file system version):

XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

How do I go about correctly loading this DTD?

A: 

You could probably use a custom resolver. The lxml docs give an example of using one to supply a DTD.
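A minimal sketch of that idea follows. It subclasses etree.Resolver and uses resolve_string(), both of which lxml provides. The names UploadedDTDResolver and parse_with_uploaded_dtd are made up for illustration. It assumes the uploaded XML arrives as bytes and that the document references only one external DTD, so every lookup is answered with the uploaded text.

```python
from io import BytesIO

from lxml import etree


class UploadedDTDResolver(etree.Resolver):
    """Hypothetical resolver that serves an uploaded DTD from memory."""

    def __init__(self, dtd_text):
        super().__init__()
        self.dtd_text = dtd_text

    def resolve(self, system_url, public_id, context):
        # Assumption: the only external resource the document asks for is
        # its DTD, so the URL is ignored and the uploaded text is returned.
        return self.resolve_string(self.dtd_text, context)


def parse_with_uploaded_dtd(xml_bytes, dtd_text):
    """Parse and validate xml_bytes against a DTD held in memory."""
    parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
    # Register the resolver so the parser asks it for the DTD instead of
    # looking for the file named in the DOCTYPE.
    parser.resolvers.add(UploadedDTDResolver(dtd_text))
    return etree.parse(BytesIO(xml_bytes), parser)
```

With the form data from the question, the call would be something like parse_with_uploaded_dtd(data['xml_file'], data['dtd_file']). If the DTD itself pulls in further external files, the resolver would need to check system_url and return the matching content for each one.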

Steven