How can I read an XML file using Python ElementTree, if the XML has multiple top-level items?
I have an XML file that I would like to read using Python ElementTree.
Unfortunately, it has multiple top-level elements. I would wrap <DOC>...</DOC> around the XML, except that the opening <DOC> has to go after the <?xml ...?> declaration and the <!DOCTYPE ...> declaration, and figuring out where the <!DOCTYPE> ends is non-trivial.
What I have:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FOO BAR "foo.dtd" [
<!ENTITY ...>
<!ENTITY ...>
<!ENTITY ...>
]>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
What I want:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE FOO BAR "foo.dtd" [
<!ENTITY ...>
<!ENTITY ...>
<!ENTITY ...>
]>
<DOC>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
<ARTICLE> ... </ARTICLE>
</DOC>
NB: the ARTICLE tag name might change, so I cannot grep for it.
Can anyone suggest how I can add the enclosing <DOC>...</DOC> after the XML header, or suggest another workaround?
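For reference, here is a rough sketch of the kind of wrapping I have in mind. It is not a definitive solution. The helper names (prolog_end, _skip_doctype, parse_fragments) and the DOC wrapper tag are made up for illustration. It assumes the file has already been read into a str, has no leading byte-order mark, and that the internal DTD subset contains only declarations and comments (no processing instructions with stray quotes). It scans past the XML declaration, comments, and the DOCTYPE, tracking quotes and [...] nesting, and does not look for the ARTICLE tag at all.

```python
import xml.etree.ElementTree as ET


def _skip_doctype(text, i):
    """Return the index just past the DOCTYPE declaration starting at i."""
    depth = 0     # > 0 while inside the [...] internal subset
    quote = None  # the active quote character while inside a literal
    while i < len(text):
        c = text[i]
        if quote:
            if c == quote:
                quote = None
        elif text.startswith("<!--", i):
            # Comments may contain quotes or brackets; skip them whole.
            i = text.index("-->", i) + 3
            continue
        elif c in "\"'":
            quote = c
        elif c == "[":
            depth += 1
        elif c == "]":
            depth -= 1
        elif c == ">" and depth == 0:
            return i + 1
        i += 1
    raise ValueError("unterminated DOCTYPE")


def prolog_end(text):
    """Return the index of the first character after the XML prolog."""
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
        elif text.startswith("<?", i):       # XML declaration or PI
            i = text.index("?>", i) + 2
        elif text.startswith("<!--", i):     # comment
            i = text.index("-->", i) + 3
        elif text.startswith("<!DOCTYPE", i):
            i = _skip_doctype(text, i)
        else:
            break                            # first real element
    return i


def parse_fragments(text, root_tag="DOC"):
    """Wrap everything after the prolog in <root_tag> and parse it."""
    cut = prolog_end(text)
    wrapped = f"{text[:cut]}<{root_tag}>{text[cut:]}</{root_tag}>"
    return ET.fromstring(wrapped)
```

Calling parse_fragments(text) on the file contents would then return a DOC element whose children are the original top-level elements. Note that ElementTree does not load the external foo.dtd, so this only helps with entities declared in the internal subset.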