I have a few different XML documents that I'm trying to combine into one using lxml. The problem is that I need the result to preserve the namespace declarations on each of the sub-documents' root nodes. lxml seems to push any namespace declaration used more than once up to the root of the new document, which breaks in my application (it is an acknowledged bug).

So for example, I have document A:

<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
   <title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</title>
</dc>

and document B:

<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
    <titleInfo>
        <nonSort>La</nonSort>
        <title>difesa della razza</title>
        <subTitle>scienza, documentazione, polemica</subTitle>
        <partNumber>anno 1:n. 1</partNumber>
    </titleInfo>
</mods>

I want to wrap them in a <wrap> element that also uses an xsi:schemaLocation, but I need the namespace declaration (xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance") to appear on all three nodes, like this:

<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">

    <dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
       <dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
    </dc:dc>

    <mods:mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
        <mods:titleInfo>
            <mods:nonSort>La</mods:nonSort>
            <mods:title>difesa della razza</mods:title>
            <mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
            <mods:partNumber>anno 1:n. 1</mods:partNumber>
        </mods:titleInfo>
    </mods:mods>
</wrap>

However, when I append these two documents using Python/lxml:

wrap.append(dc)
wrap.append(mods)

the declaration gets pushed up to the highest-level node that uses it, which unfortunately is a problem for my application. The result looks like this:

<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">

    <dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
       <dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
    </dc:dc>

    <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
        <mods:titleInfo>
            <mods:nonSort>La</mods:nonSort>
            <mods:title>difesa della razza</mods:title>
            <mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
            <mods:partNumber>anno 1:n. 1</mods:partNumber>
        </mods:titleInfo>
    </mods:mods>
</wrap>

Any ideas how I can force the behavior I want?

Thanks

A: 

You could try inserting XInclude elements first, and then resolving them with the .xinclude() method (see docs). That seems to preserve the namespace declarations: lxml keeps them when they originate from the parser, but not when you create elements yourself or move elements from one document to another.
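A rough sketch of the XInclude idea, in case it helps. The file names (a.xml, b.xml, wrap.xml), the temporary directory and the build_wrapped helper are made up for illustration, and the sub-documents are shortened versions of the ones in the question. I have not checked the output against every lxml/libxml2 version, so treat it as a starting point rather than a guaranteed fix:

```python
import os
import tempfile

from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened versions of the question's documents (illustrative only).
DOC_A = (
    '<dc xmlns:xsi="%s" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc>" % XSI
)
DOC_B = (
    '<mods xmlns:xsi="%s" '
    'xsi:schemaLocation="http://www.loc.gov/mods/v3 '
    'http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">'
    "<titleInfo><title>difesa della razza</title></titleInfo></mods>" % XSI
)
# The wrapper references the sub-documents instead of containing them.
WRAP = (
    '<wrap xmlns:xsi="%s" xmlns:xi="http://www.w3.org/2001/XInclude" '
    'xsi:schemaLocation="http://www.example.org">'
    '<xi:include href="a.xml"/><xi:include href="b.xml"/></wrap>' % XSI
)


def build_wrapped(workdir):
    """Write the three files to workdir and resolve the XIncludes."""
    for name, text in (("a.xml", DOC_A), ("b.xml", DOC_B), ("wrap.xml", WRAP)):
        with open(os.path.join(workdir, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    tree = etree.parse(os.path.join(workdir, "wrap.xml"))
    tree.xinclude()  # replaces each xi:include with the parsed sub-document
    return tree


with tempfile.TemporaryDirectory() as workdir:
    tree = build_wrapped(workdir)
    result = etree.tostring(tree, pretty_print=True).decode()

print(result)
```

Inspect the printed result to see whether xmlns:xsi is repeated on the included roots in your setup. The xmlns:xi declaration stays on the wrapper, and libxml2 may add xml:base attributes to the included elements, so you might need to clean those up afterwards.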

Note that in your case, you would still need to change the tag name of the elements: they will be included as they are in the original documents, without any namespace, while you seem to have changed them to namespaced element names in your output.
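For the renaming step, something along these lines might work. The add_namespace helper is hypothetical (it is not part of lxml), and the namespace URI is the placeholder from your example output:

```python
from lxml import etree

DC_NS = "http://www.foo.org"  # placeholder URI taken from the question


def add_namespace(element, namespace):
    """Move element and all of its descendants into the given namespace."""
    for node in element.iter():
        # Comments and processing instructions have a non-string tag.
        if isinstance(node.tag, str):
            local = etree.QName(node).localname
            node.tag = "{%s}%s" % (namespace, local)


wrap = etree.fromstring(
    '<wrap xmlns:dc="http://www.foo.org">'
    "<dc><title>La difesa della razza</title></dc>"
    "</wrap>"
)
add_namespace(wrap[0], DC_NS)
print(etree.tostring(wrap, pretty_print=True).decode())
```

This only rewrites the element names. It is a separate step from the XInclude processing, so you would run it on each included subtree afterwards.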

You might have to use a custom resolver. The docs might seem to say that .xinclude() does not support this, but it does use the resolvers of the parser that parsed the containing document; it just doesn't support passing a specific resolver or parser to the XInclude processing.

The other option would probably be an XSLT-based solution.

Steven