minidom, again.
I tried:
document.doctype = xml.dom.minidom.DocumentType('html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"')
There is no doctype in the output. How can I fix this without inserting it by hand?
You shouldn't instantiate classes from minidom directly. It's not a supported part of the API: the ownerDocuments won't tie up and you can get some strange misbehaviours. Instead use the proper DOM Level 2 Core methods:
>>> from xml.dom import minidom
>>> imp= minidom.getDOMImplementation('')
>>> dt= imp.createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
(‘DTD/xhtml1-strict.dtd’ is a commonly-used but wrong SystemId. That relative URL would only be valid inside the xhtml1 folder at w3.org.)
Now you've got a DocumentType node, you can add it to a document. According to the standard, the only guaranteed way of doing this is at document creation time:
>>> doc= imp.createDocument('http://www.w3.org/1999/xhtml', 'html', dt)
>>> print(doc.toxml())
<?xml version="1.0" ?><!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html/>
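For anyone who prefers a script to an interactive session, the creation-time approach can be sketched as below. It uses only the calls shown in the session; the two identity checks are an addition here, meant to show that a doctype passed to createDocument ends up linked to its document.

```python
from xml.dom import minidom

# Build both the doctype and the document through the DOMImplementation
# factory methods rather than instantiating minidom classes directly.
imp = minidom.getDOMImplementation()
dt = imp.createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
)
doc = imp.createDocument('http://www.w3.org/1999/xhtml', 'html', dt)

# A doctype supplied at creation time is tied to the document in both directions.
print(doc.doctype is dt)
print(dt.ownerDocument is doc)

# The serialised form starts with the XML declaration, then the DOCTYPE.
print(doc.toxml())
```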
If you want to change the doctype of an existing document, that's more trouble. The DOM standard doesn't require that DocumentType nodes with no ownerDocument be insertable into a document. However some DOMs allow it, e.g. pxdom. minidom kind of allows it:
>>> doc= minidom.parseString('<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
>>> dt= minidom.getDOMImplementation('').createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
>>> doc.insertBefore(dt, doc.documentElement)
<xml.dom.minidom.DocumentType instance>
>>> print(doc.toxml())
<?xml version="1.0" ?><!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>
but with bugs:
>>> doc.doctype
# None
>>> dt.ownerDocument
# None
which may or may not matter to you.
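If those two unset links do matter, one possible workaround is to set them by hand after the insert. This is only a sketch: it relies on minidom storing `doctype` and `ownerDocument` as plain assignable attributes, which is an implementation detail and not part of the DOM API, so it may not carry over to other DOM implementations.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>'
)
dt = minidom.getDOMImplementation().createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
)
doc.insertBefore(dt, doc.documentElement)

# At this point the node is in the tree, but the two links are still unset.
links_before = (doc.doctype, dt.ownerDocument)

# Assumed workaround (minidom-specific, not a DOM method): set the links manually.
doc.doctype = dt
dt.ownerDocument = doc

print(doc.doctype is dt, dt.ownerDocument is doc)
```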
Technically, the only reliable way per the standard to set a doctype on an existing document is to create a new document and import the whole of the old document into it!
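def setDoctype(document, doctype):
    imp= document.implementation
    newdocument= imp.createDocument(doctype.namespaceURI, doctype.name, doctype)
    # xmlVersion is a DOM Level 3 attribute; copy it only where the DOM provides it
    # (minidom does not appear to define it)
    if hasattr(document, 'xmlVersion'):
        newdocument.xmlVersion= document.xmlVersion
    # Nodes found before the root element go before the placeholder root
    refel= newdocument.documentElement
    for child in document.childNodes:
        if child.nodeType==child.ELEMENT_NODE:
            # Swap the placeholder root for a deep copy of the real root
            newdocument.replaceChild(
                newdocument.importNode(child, True), newdocument.documentElement
            )
            # Anything after the root is appended at the end
            refel= None
        elif child.nodeType!=child.DOCUMENT_TYPE_NODE:
            newdocument.insertBefore(newdocument.importNode(child, True), refel)
    return newdocument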
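As a rough check of the copy-into-a-new-document approach against minidom, here is a self-contained sketch. The function body follows the one above, except that the `hasattr` guard around `xmlVersion` is an assumption added so the sketch runs on minidom, which does not appear to define that attribute. The sample document and its leading comment are made up for illustration.

```python
from xml.dom import minidom


def setDoctype(document, doctype):
    imp = document.implementation
    newdocument = imp.createDocument(doctype.namespaceURI, doctype.name, doctype)
    # Assumption: copy xmlVersion only when the source DOM exposes it.
    if hasattr(document, 'xmlVersion'):
        newdocument.xmlVersion = document.xmlVersion
    refel = newdocument.documentElement
    for child in document.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            # Replace the placeholder root with a deep copy of the real root.
            newdocument.replaceChild(
                newdocument.importNode(child, True), newdocument.documentElement
            )
            refel = None
        elif child.nodeType != child.DOCUMENT_TYPE_NODE:
            # Comments and processing instructions keep their side of the root.
            newdocument.insertBefore(newdocument.importNode(child, True), refel)
    return newdocument


# Illustrative input: a comment followed by the root element, no doctype.
old = minidom.parseString(
    '<!-- note --><html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>'
)
dt = minidom.getDOMImplementation().createDocumentType(
    'html',
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd',
)
new = setDoctype(old, dt)

# Unlike the insertBefore route, both links are set on the new document.
print(new.doctype is dt, dt.ownerDocument is new)
print(new.toxml())
```

Note that the result is a new document: any references held to nodes of the old one still point into the old tree.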