views:

116

answers:

0

Hello,

I would like to examine a XSD schema in python. Currently I'm using lxml which is doing it's job very very well when it only has to validate a document against the schema. But, I want to know whats inside of the schema and access the elements in the lxml behavior.

The schema:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"&gt;
    <xsd:include schemaLocation="worker_remote_base.xsd"/>
    <xsd:include schemaLocation="transactions_worker_responses.xsd"/>
    <xsd:include schemaLocation="transactions_worker_requests.xsd"/>
</xsd:schema>

The lxml code to load the schema is (simplyfied):

xsd_file_handle = open( self._xsd_file, 'rb')
xsd_text        = xsd_file_handle.read()
schema_document   = etree.fromstring(xsd_text, base_url=xmlpath)
xmlschema         = etree.XMLSchema(schema_document)

I'm then able to use schema_document (which is etree._Element) to go through the schema as an XML document. But since etree.fromstring (at least it seams like that) expects a XML document the xsd:include elements are not processed.

The problem is currently solved by parsing the first schema document, then load the include elements and then insert them one by one into the main document by hand:

BASE_URL            = "/xml/"
schema_document     = etree.fromstring(xsd_text, base_url=BASE_URL)
tree                = schema_document.getroottree()

schemas             = []
for schemaChild in schema_document.iterchildren():
    if schemaChild.tag.endswith("include"):
        try:
            h = open (os.path.join(BASE_URL, schemaChild.get("schemaLocation")), "r")
            s = etree.fromstring(h.read(), base_url=BASE_URL)
            schemas.append(s)
        except Exception as ex:
            print "failed to load schema: %s" % ex
        finally:
            h.close()
        # remove the <xsd:include ...> element
        self._schema_document.remove(schemaChild)

for s in schemas:
# inside <schema>
    for sChild in s:
        schema_document.append(sChild)

What I'm asking for is an idea how to solve the problem by using a more common way. I've already searched for other schema parsers in python but for now there was nothing that would fit in that case.

Greetings, Michael