If you're looping through the chars of a unicode string in Python (2.x), say:

ak.sɛp.tɑ̃

How can you tell whether the current char is a combining diacritic mark?

For instance, the last char in the above string is actually a combining mark:

ak.sɛp.tɑ̃ --> ̃

A: 

Use the unicodedata module

vartec
+5  A: 

Use the unicodedata module:

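import unicodedata

# combining() returns the canonical combining class of a character:
# 0 for ordinary characters, nonzero for combining marks.
if unicodedata.combining(u'a'):
    print("is combining character")
else:
    print("is not combining")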
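Applied to the string from the question, a loop could look roughly like the sketch below. The helper name combining_positions is made up for illustration, and the string is written with escapes on the assumption that the final character is U+0303 COMBINING TILDE following U+0251, as the question describes.

```python
import unicodedata

def combining_positions(text):
    """Return indices of characters with a nonzero canonical combining class."""
    # combining_positions is a hypothetical helper, not part of unicodedata.
    return [i for i, ch in enumerate(text) if unicodedata.combining(ch) != 0]

# The question's string, assuming U+025B, U+0251 and a trailing U+0303.
word = u'ak.s\u025bp.t\u0251\u0303'

for i, ch in enumerate(word):
    kind = "combining" if unicodedata.combining(ch) else "base"
    print("%d %r %s" % (i, ch, kind))
```

Note that combining() only reports a nonzero combining class. Some marks (certain spacing and enclosing marks, for example) have class 0, so if you need to catch those too, checking unicodedata.category(ch).startswith('M') may be the broader test.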

These posts are also relevant:

http://stackoverflow.com/questions/446222/how-do-i-reverse-unicode-decomposition-using-python

http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string

Joe Koberg