ansaurus

Question

What's a good way to replace international characters with their base Latin counterparts using Python?

Answer 1

+5 A:

It would be better to create an explicit table and then use the unicode.translate method. The advantage is that the transliteration is more precise, e.g. "ö" becomes "oe" and "ß" becomes "ss", as is conventional in German.
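
As a rough illustration of the explicit-table idea, here is a minimal sketch. It assumes Python 3, where the method lives on `str` rather than `unicode`. The table `GERMAN_TABLE` and the helper `transliterate` are hypothetical names made up for this example, and the table is deliberately incomplete: it covers only a few German characters.

```python
# Hypothetical, deliberately incomplete table covering a few German characters.
# str.maketrans accepts a dict whose keys are single characters and whose
# values may be replacement strings of any length.
GERMAN_TABLE = str.maketrans({
    "ä": "ae",
    "ö": "oe",
    "ü": "ue",
    "Ä": "Ae",
    "Ö": "Oe",
    "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text, table=GERMAN_TABLE):
    """Replace each character found in the table; leave everything else as-is."""
    return text.translate(table)


print(transliterate("große"))  # grosse
print(transliterate("Löwis"))  # Loewis
```

Characters that are missing from the table pass through unchanged, so a real table would need an entry for every character you expect to encounter.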

There are several transliteration packages on PyPI: translitcodec, Unidecode, and trans.

Martin v. Löwis 2009-07-28 07:45:57

You're right, transliteration is better than just removing the decorations from characters. If `"große"` became `"grose"`, that would be confusing (not that I think I'll be dealing with German words, but still)...

Blixt 2009-07-28 08:03:05

Is there any way I can get an overview of how these three packages compare, for example in popularity and support? (By the way, my German is rusty; I meant `"groß"` and `"gros"` =)

Blixt 2009-07-28 08:11:36

I use *translitcodec* since it appears to be the most professionally structured module. Unlike *trans*, it also replaces `å` with `aa` and `ä` with `ae`, as is proper when transliterating.

Blixt 2009-07-28 11:08:30

@Blixt: sorry, I can't recommend a specific package; that might be a separate SO question. However, it appears that you have already done an evaluation.

Martin v. Löwis 2009-07-28 11:27:02
