Hi there. Say I have the string "blöt träbåt", which contains a few a's and o's with an umlaut or a ring above. I want it to become "blot trabat" as simply as possible. I've done some digging and found the following method:

import unicodedata
# Python 2 code; in Python 3, str is already Unicode, so pass `string` directly
unicode_string = unicodedata.normalize('NFKD', unicode(string))

This gives me the string as a unicode object in which each accented character is split into its base letter and a combining character (`\u0308` for the umlauts, `\u030a` for the ring above). To get back to an ASCII string, I can do `ascii_string = unicode_string.encode('ASCII', 'ignore')`, which drops the combining characters (in fact, every non-ASCII character) and leaves the string "blot trabat".
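To see the decomposition described above, here is a small Python 3 check (Python 3 `str` is already Unicode, so no `unicode()` call is needed). It lists the combining marks that NFKD splits off and then applies the same `encode('ascii', 'ignore')` step.

```python
import unicodedata

decomposed = unicodedata.normalize("NFKD", "blöt träbåt")

# List every combining mark that NFKD split off from its base letter.
marks = [
    f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    for ch in decomposed
    if unicodedata.combining(ch)
]
print(marks)
# ['U+0308 COMBINING DIAERESIS', 'U+0308 COMBINING DIAERESIS', 'U+030A COMBINING RING ABOVE']

# Encoding to ASCII with 'ignore' silently drops every non-ASCII code point.
ascii_bytes = decomposed.encode("ascii", "ignore")
print(ascii_bytes)  # b'blot trabat'
```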

The question is: is there a better way to do this? It feels roundabout, and I suspect there might be something I don't know about. I could of course wrap it in a helper function, but I'd rather check whether this already exists in Python.
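Wrapped in a helper, the approach from the question might look like this minimal Python 3 sketch; the name `strip_accents` is hypothetical. The second call shows its main weakness: characters that have no decomposition, such as "ß", are simply dropped rather than transliterated.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: NFKD-decompose, then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' drops the combining marks, but also any letter with no ASCII decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("blöt träbåt"))  # blot trabat
print(strip_accents("groß"))         # gro  ("ß" has no decomposition, so it is lost)
```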

A:

It would be better to create an explicit translation table and then use the `unicode.translate` method (`str.translate` in Python 3). The advantage is more precise transliteration: for example, "ö" can become "oe" and "ß" can become "ss", as is correct in German.
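A minimal sketch of the explicit-table approach in Python 3, where `str.translate` plays the role of `unicode.translate`. `str.maketrans` accepts a dict whose values may be multi-character strings, which is what makes "ß" to "ss" possible. The table and the function name `transliterate_german` are hypothetical and deliberately incomplete.

```python
# Hypothetical German table; extend it with whatever characters you need.
GERMAN_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate_german(text: str) -> str:
    # Characters missing from the table pass through unchanged.
    return text.translate(GERMAN_TABLE)


print(transliterate_german("Größe"))   # Groesse
print(transliterate_german("träbåt"))  # traebåt  ("å" is not in the table)
```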

There are several transliteration packages on PyPI: translitcodec, Unidecode, and trans.
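As one illustration, Unidecode's main entry point is `unidecode.unidecode(text)`, which returns an ASCII approximation. The sketch below uses it when it is installed (`pip install Unidecode`) and otherwise falls back to the NFKD approach from the question; the wrapper name `to_ascii` is hypothetical. Note that Unidecode maps characters one at a time without language context, so it turns "ö" into "o" rather than the German "oe".

```python
import unicodedata

try:
    # Third-party package from PyPI (pip install Unidecode).
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text: str) -> str:
    """Hypothetical wrapper: use Unidecode if available, else NFKD + ASCII 'ignore'."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback loses characters without a decomposition, e.g. "ß".
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


print(to_ascii("blöt träbåt"))  # blot trabat (with either code path)
```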

Martin v. Löwis
You're right, transliteration is better than just removing the decorations from characters. If `"große"` became `"grose"`, that would be confusing (not that I think I'll be dealing with German words, but still)...
Blixt
Is there a way to get an overview of how these three packages compare, for example in popularity and support? (By the way, my German is rusty; I meant `"groß"` and `"gros"`. =)
Blixt
I use *translitcodec* since it appears to be the most professionally structured module. Unlike *trans*, it also replaces `å` with `aa` and `ä` with `ae`, as is proper when transliterating.
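This is not translitcodec's API; it is a hand-rolled, standard-library-only sketch of the behaviour described above: an explicit table for the letters you care about, with the NFKD approach as a fallback for everything else. The table entries beyond `å` and `ä` (for example "ö" to "oe") and the name `translit` are assumptions for illustration.

```python
import unicodedata

# Hypothetical table: explicit replacements take priority over accent stripping.
TABLE = str.maketrans({
    "å": "aa", "ä": "ae", "ö": "oe",
    "Å": "Aa", "Ä": "Ae", "Ö": "Oe",
    "ß": "ss",
})


def translit(text: str) -> str:
    # 1) Apply the explicit table first, so "ä" becomes "ae" rather than "a".
    text = text.translate(TABLE)
    # 2) Strip remaining accents (e.g. "é" -> "e"); other non-ASCII characters are dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(translit("blöt träbåt"))  # bloet traebaat
print(translit("café"))         # cafe
```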
Blixt
@Blixt: sorry, I can't help with a recommendation of a specific package; that might be worth a separate SO question. However, it appears that you have already done an evaluation.
Martin v. Löwis