minidom, again.

I tried:

document.doctype = xml.dom.minidom.DocumentType('html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"')

There is no doctype in the output. How can I fix this without inserting it by hand?

A:

You shouldn't instantiate minidom's node classes directly. That isn't a supported part of the API: the ownerDocument references won't be set up, and you can get some strange misbehaviour. Instead, use the proper DOM Level 2 Core methods:

>>> imp= minidom.getDOMImplementation('')
>>> dt= imp.createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')

(‘DTD/xhtml1-strict.dtd’ is a commonly used but wrong SystemId. That relative URL only resolves to the real DTD from a document inside the xhtml1 folder at w3.org.)
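The difference can be checked with urllib.parse.urljoin, which resolves a relative reference against a base URL in roughly the way a parser resolves a relative SystemId against the location of the document that contains it. A minimal Python 3 sketch; the example.com page URL is a hypothetical base chosen only for illustration:

```python
from urllib.parse import urljoin

# The commonly used (relative) SystemId
RELATIVE_SYSTEM_ID = 'DTD/xhtml1-strict.dtd'

# Resolved from a document inside the xhtml1 folder at w3.org,
# the relative URL reaches the real DTD
w3c_url = urljoin('http://www.w3.org/TR/xhtml1/', RELATIVE_SYSTEM_ID)

# Resolved from any other location (hypothetical page URL), it points
# at a path where no DTD is likely to exist
other_url = urljoin('http://example.com/pages/index.html', RELATIVE_SYSTEM_ID)

print(w3c_url)    # http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
print(other_url)  # http://example.com/pages/DTD/xhtml1-strict.dtd
```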

Now that you've got a DocumentType node, you can add it to a document. According to the standard, the only guaranteed way of doing this is at document creation time:

>>> doc= imp.createDocument('http://www.w3.org/1999/xhtml', 'html', dt)
>>> print doc.toxml()
<?xml version="1.0" ?><!DOCTYPE html  PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html/>
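For comparison, here is a Python 3 sketch (the session above is Python 2) of why assigning to doctype prints nothing and what createDocument does differently. Assigning to doc.doctype only sets an attribute, while toxml() serializes the document's childNodes; createDocument adds the DocumentType as a child and also sets doc.doctype and dt.ownerDocument. The constant names are this sketch's own:

```python
from xml.dom import minidom

XHTML_NS = 'http://www.w3.org/1999/xhtml'
PUBLIC_ID = '-//W3C//DTD XHTML 1.0 Strict//EN'
SYSTEM_ID = 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'

imp = minidom.getDOMImplementation()

# 1. Assigning to .doctype only sets an attribute. toxml() writes out the
#    document's childNodes, and the DocumentType was never added to them,
#    so no <!DOCTYPE> is produced.
late = minidom.parseString('<html/>')
late.doctype = imp.createDocumentType('html', PUBLIC_ID, SYSTEM_ID)
late_xml = late.toxml()

# 2. Passing the DocumentType to createDocument() makes it a child of the
#    new document and links doc.doctype and dt.ownerDocument both ways.
dt = imp.createDocumentType('html', PUBLIC_ID, SYSTEM_ID)
doc = imp.createDocument(XHTML_NS, 'html', dt)
doc_xml = doc.toxml()

print(late_xml)   # no DOCTYPE
print(doc_xml)    # DOCTYPE before the root element
print(doc.doctype is dt, dt.ownerDocument is doc)  # True True
```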

If you want to change the doctype of an existing document, that's more trouble. The DOM standard doesn't require that DocumentType nodes with no ownerDocument be insertable into a document. However, some DOMs allow it, e.g. pxdom. minidom partly allows it:

>>> doc= minidom.parseString('<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
>>> dt= minidom.getDOMImplementation('').createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
>>> doc.insertBefore(dt, doc.documentElement)
<xml.dom.minidom.DocumentType instance>
>>> print doc.toxml()
<?xml version="1.0" ?><!DOCTYPE html  PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>

but with bugs:

>>> doc.doctype
# None
>>> dt.ownerDocument
# None

which may or may not matter to you.
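The same insertBefore experiment as a Python 3 script (the session above is Python 2), with the inconsistencies checked explicitly. The node does end up in childNodes, which is why it gets serialized, but neither doc.doctype nor dt.ownerDocument is updated:

```python
from xml.dom import minidom

PUBLIC_ID = '-//W3C//DTD XHTML 1.0 Strict//EN'
SYSTEM_ID = 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'

doc = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
dt = minidom.getDOMImplementation().createDocumentType('html', PUBLIC_ID, SYSTEM_ID)

# insertBefore() puts the DocumentType into childNodes, so it is serialized...
doc.insertBefore(dt, doc.documentElement)
serialized = doc.toxml()

# ...but the bookkeeping that createDocument() does is skipped
print(dt in doc.childNodes)      # True
print(dt.parentNode is doc)      # True
print(doc.doctype is None)       # True: the document doesn't know its doctype
print(dt.ownerDocument is None)  # True: the node doesn't know its document
```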

Technically, the only way the standard guarantees for setting a doctype on an existing document is to create a new document and import the whole of the old document into it!

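def setDoctype(document, doctype):
    imp= document.implementation
    # The root element created here is only a placeholder; it is replaced
    # below by an imported copy of the old document's root element
    newdocument= imp.createDocument(doctype.namespaceURI, doctype.name, doctype)
    # xmlVersion is a DOM Level 3 property; copy it only if the DOM has it
    if hasattr(document, 'xmlVersion'):
        newdocument.xmlVersion= document.xmlVersion
    refel= newdocument.documentElement
    for child in document.childNodes:
        if child.nodeType==child.ELEMENT_NODE:
            newdocument.replaceChild(
                newdocument.importNode(child, True), newdocument.documentElement
            )
            # Anything after the root element is appended at the end
            refel= None
        elif child.nodeType!=child.DOCUMENT_TYPE_NODE:
            # Comments and PIs before the root stay in front of it; the old
            # doctype is skipped because the new one is already in place
            newdocument.insertBefore(newdocument.importNode(child, True), refel)
    return newdocument
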
bobince
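A Python 3 adaptation of setDoctype above, applied to a parsed document that has a comment before its root element so the ordering logic is exercised. The snake_case name set_doctype and the sample document are this sketch's own; it passes None as the placeholder root's namespace (a DocumentType has no namespaceURI of its own) and, like the version above, copies xmlVersion only when the DOM provides it:

```python
from xml.dom import minidom

XHTML_NS = 'http://www.w3.org/1999/xhtml'


def set_doctype(document, doctype):
    """Return a new document holding `doctype` plus imported copies of the
    other children of `document`; `document` itself is left unchanged."""
    imp = document.implementation
    # createDocument() links doctype.ownerDocument and newdoc.doctype; the
    # root element it creates is a placeholder that gets replaced below.
    newdoc = imp.createDocument(None, doctype.name, doctype)
    # xmlVersion is a DOM Level 3 property; copy it only if the DOM has it.
    if hasattr(document, 'xmlVersion'):
        newdoc.xmlVersion = document.xmlVersion
    refel = newdoc.documentElement
    for child in document.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            newdoc.replaceChild(newdoc.importNode(child, True),
                                newdoc.documentElement)
            refel = None  # later siblings are appended after the root
        elif child.nodeType != child.DOCUMENT_TYPE_NODE:
            # Comments/PIs before the root stay in front of it; any old
            # doctype is skipped.
            newdoc.insertBefore(newdoc.importNode(child, True), refel)
    return newdoc


old = minidom.parseString(
    '<!--generated--><html xmlns="%s"><head/><body/></html>' % XHTML_NS)
dt = minidom.getDOMImplementation().createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
result = set_doctype(old, dt)
print(result.toxml())
```

Unlike the insertBefore route, the result knows its doctype (result.doctype is dt), at the cost of copying every node.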
Thanks! I don't need to set the doctype on an existing document, only on a new one.
myfreeweb
Phew, that's lucky! :-)
bobince