ansaurus

Question

Answer 1


I think you need to use a Unicode string with \u03c6 in it. As far as I understand, the .data field of a text node is supposed to hold "parsed" data, not XML entities. That is why the & comes back as &amp; when the node is serialized to XML again. If you want non-ASCII characters to be written out as entities on output, you could do:

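import codecs

def ent_replace(exc):
    # Error handler: replace each character that cannot be encoded
    # with a hexadecimal XML character reference such as &#x03c6;
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in exc.object[exc.start:exc.end]:
            s.append(u'&#x%4.4x;' % ord(c))
        # Return the replacement text and the position to resume from
        return (u''.join(s), exc.end)
    else:
        # Exception instances have no __name__; use the class name
        raise TypeError("can't handle %s" % type(exc).__name__)

# Register the handler under a name usable in .encode()
codecs.register_error('ent_replace', ent_replace)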

and use x.toxml().encode('ascii', 'ent_replace').
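The sketch below shows the difference. It assumes Python 3 and xml.dom.minidom, and the names root, xml_text and ascii_bytes are illustrative only. When you pass entity text to createTextNode, minidom stores it literally and escapes its & on output. When you pass the real character, minidom stores it as-is. As an alternative to the custom handler, the sketch also uses the standard library's built-in 'xmlcharrefreplace' error handler, which emits decimal rather than hexadecimal references.

```python
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("root")
doc.appendChild(root)

# Entity text is stored literally in .data, so its '&' gets escaped on output
root.appendChild(doc.createTextNode("&#x3c6;"))
# The actual character (GREEK SMALL LETTER PHI) is stored as parsed text
root.appendChild(doc.createTextNode("\u03c6"))

xml_text = root.toxml()
# xml_text == '<root>&amp;#x3c6;\u03c6</root>'

# Built-in error handler: non-ASCII characters become decimal references
ascii_bytes = xml_text.encode("ascii", "xmlcharrefreplace")
# ascii_bytes == b'<root>&amp;#x3c6;&#966;</root>'
```

For the same character, the ent_replace handler above would produce &#x03c6; instead. Both forms are valid XML character references.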

Alex Martelli 2009-05-31 23:17:08


XML characters in python xml.dom
