I'm currently doing this to replace extended-ascii characters with their HTML-entity-number equivalents:
s.encode('ascii', 'xmlcharrefreplace')
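For reference, a minimal illustration of what that call produces. The sample string is my own, and it is written with escape sequences so the result does not depend on the source-file encoding. Under Python 3 the return value is a bytes object.

```python
# U+00A9 is the copyright sign, U+00AE the registered sign.
s = u'\u00a9 \u00ae'

# Characters that ASCII cannot represent become decimal character references.
numeric = s.encode('ascii', 'xmlcharrefreplace')

print(numeric.decode('ascii'))  # &#169; &#174;
```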
What I would like to do is convert to the HTML-entity-name equivalent (i.e. &copy; instead of &#169;). The small program below shows what I'm trying to do and how it fails. Is there a way to do this, aside from doing a find/replace?
#coding=latin-1
def convertEntities(s):
    return s.encode('ascii', 'xmlcharrefreplace')
ok = 'ascii: !@#$%^&*()<>'
not_ok = u'extended-ascii: ©®°±¼'
ok_expected = ok
not_ok_expected = u'extended-ascii: &copy;&reg;&deg;&plusmn;&frac14;'
ok_2 = convertEntities(ok)
not_ok_2 = convertEntities(not_ok)
if ok_2 == ok_expected:
    print 'ascii worked'
else:
    print 'ascii failed: "%s"' % ok_2
if not_ok_2 == not_ok_expected:
    print 'extended-ascii worked'
else:
    print 'extended-ascii failed: "%s"' % not_ok_2
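One possible direction, sketched under assumptions rather than offered as a definitive answer: register a custom codec error handler that looks each unencodable character up in the standard library's code-point-to-name table (`html.entities.codepoint2name` in Python 3, `htmlentitydefs.codepoint2name` in Python 2) and falls back to a numeric reference when no name exists. The handler name `named_entity_replace` and the error-handler string `'namedentityreplace'` are made up for illustration.

```python
import codecs

try:
    from html.entities import codepoint2name   # Python 3
except ImportError:
    from htmlentitydefs import codepoint2name  # Python 2


def named_entity_replace(exc):
    """Codec error handler (hypothetical name): emit named entities."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    # exc.object[exc.start:exc.end] is the run of characters ASCII rejected.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append(u'&%s;' % name)      # e.g. &copy;
        else:
            pieces.append(u'&#%d;' % ord(ch))  # no known name: numeric fallback
    # Return the replacement text and the position where encoding resumes.
    return u''.join(pieces), exc.end


# 'namedentityreplace' is an arbitrary label chosen for this sketch.
codecs.register_error('namedentityreplace', named_entity_replace)


def convertEntities(s):
    return s.encode('ascii', 'namedentityreplace')


result = convertEntities(u'extended-ascii: \u00a9\u00ae\u00b0\u00b1\u00bc')
```

With this handler, plain ASCII input passes through unchanged, including `&`, `<` and `>`, since the handler is only invoked for characters that cannot be encoded. Escaping those markup characters would be a separate step.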