Is there a way to create a text node with xml.dom.minidom from Unicode strings?
Yes, createTextNode always takes Unicode strings. The text model of the XML information set is Unicode, as you can see:
>>> from xml.dom import minidom
>>> doc= minidom.parseString('<a>b</a>')
>>> doc.documentElement.firstChild.data
u'b'
So:
>>> doc.createTextNode(u'Hell\xF6') # OK
<DOM Text node "u'Hell\xf6'">
Minidom does allow you to put non-Unicode strings in the DOM, but if you do and they contain non-ASCII characters you'll come a cropper later on:
>>> doc.documentElement.appendChild(doc.createTextNode('Hell\xF6')) # Wrong, not Unicode string
<DOM Text node "'Hell\xf6'">
>>> doc.toxml()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 45, in toxml
    return self.toprettyxml("", "", encoding)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 60, in toprettyxml
    return writer.getvalue()
  File "/usr/lib/python2.6/StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 4: ordinal not in range(128)
This is assuming that by “encoded in unicode” you mean you are using Unicode strings. If you mean something else, such as byte strings holding UTF-8 encoded text, you need to decode those byte strings to Unicode strings before you put them in the DOM:
>>> b= 'Hell\xc3\xb6' # Hellö encoded in UTF-8 bytes
>>> u= b.decode('utf-8') # Proper Unicode string Hellö
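>>> doc= minidom.parseString('<a>b</a>') # fresh document; the one above still holds the byte-string node
>>> doc.documentElement.appendChild(doc.createTextNode(u))
<DOM Text node "u'Hell\xf6'">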
>>> doc.toxml()
u'<?xml version="1.0" ?><a>bHell\xf6</a>' # correct!
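If you find yourself doing that decode step in several places, it can be wrapped up. The following is only a sketch, written for Python 3 (where `str` is already the Unicode type and `bytes` is the byte-string type); `append_text` is a hypothetical helper name, not part of minidom, and UTF-8 as the default encoding is an assumption about your input:

```python
from xml.dom import minidom

def append_text(doc, parent, value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, then append a text node."""
    if isinstance(value, bytes):
        # Assumption: byte input is encoded as `encoding` (UTF-8 by default).
        value = value.decode(encoding)
    node = doc.createTextNode(value)
    parent.appendChild(node)
    return node

doc = minidom.parseString('<a>b</a>')
append_text(doc, doc.documentElement, b'Hell\xc3\xb6')  # UTF-8 bytes for Hellö

xml_text = doc.toxml()                   # text (Unicode) serialization
xml_bytes = doc.toxml(encoding='utf-8')  # byte serialization, encoded on output
```

The point is the same as above: keep only Unicode strings inside the DOM, and let `toxml(encoding=...)` deal with bytes at the very end when you serialize.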