I'm looking to standardize some Unicode text in Python. Is there an easy way to get the composed (rather than decomposed) form of a base character followed by a combining character? For example, if I have the sequence u'o\u0304' (LATIN SMALL LETTER O followed by COMBINING MACRON), I'd like to get ō (LATIN SMALL LETTER O WITH MACRON). It's easy to go the other way:
import unicodedata
o = unicodedata.lookup("LATIN SMALL LETTER O WITH MACRON")  # u'\u014d', one code point
o = unicodedata.normalize('NFD', o)  # u'o\u0304': base letter + combining macron
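A minimal sketch of the reverse direction, using only the standard-library unicodedata module. NFD (canonical decomposition) splits a precomposed character apart, and NFC (canonical decomposition followed by canonical composition) puts it back together. The sketch also shows why the sequence must use U+0304 COMBINING MACRON: U+00AF MACRON is a spacing character with only a compatibility decomposition, so NFC leaves it alone.

```python
import unicodedata

# LATIN SMALL LETTER O (U+006F) followed by COMBINING MACRON (U+0304)
decomposed = "o\u0304"

# NFC composes the pair into the single precomposed code point U+014D
composed = unicodedata.normalize("NFC", decomposed)
print(composed, len(composed), unicodedata.name(composed))

# Round trip: NFD splits it back into base letter + combining mark
assert unicodedata.normalize("NFD", composed) == decomposed

# U+00AF is the spacing MACRON, not a combining mark,
# so canonical composition (NFC) does not change this string
spacing = "o\xaf"
print(len(unicodedata.normalize("NFC", spacing)))
```

Note that NFC only composes pairs for which Unicode defines a precomposed character; a base letter plus a combining mark with no precomposed equivalent stays as two code points.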