In my VB.NET application I compare words transcribed in IPA, many of which carry several diacritic marks. One of the comparisons works character by character. But when I iterate over the characters of a string, the diacritic marks come out as separate characters (as I would expect, since this is Unicode):
o`ku`ku`
However, for the purposes of this program a plain u is different from a u plus an accent, and the two need to be distinguished.
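To make the behavior concrete, here is a minimal sketch that reproduces it. The module name and the sample string are made up for illustration, and it assumes the accent is stored as a separate combining code point (U+0300) rather than as part of a precomposed letter:

```vb
Imports System

Module DiacriticDemo
    Sub Main()
        ' Hypothetical sample: "u" followed by U+0300 COMBINING GRAVE ACCENT.
        Dim word As String = "u" & ChrW(&H300)

        ' For Each over a String yields one Char (UTF-16 code unit) at a time,
        ' so the base letter and its combining mark arrive separately.
        For Each c As Char In word
            Console.WriteLine("U+{0:X4}", AscW(c))
        Next
        ' Prints U+0075, then U+0300.
    End Sub
End Module
```

What I want instead is for the loop to hand me the u and its accent as a single unit.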
Is there a good way to iterate over Unicode strings so that a character and its accents are treated as one character? I'm trying to avoid hardcoding every combination that should count as a single character.
Edit:
The Normalize() method does work for characters with simple diacritic marks that have a precomposed, single-character Unicode representation, such as most accented vowels. However, it does not work for more obscure symbols, like uˤ and uˠ, which have no precomposed form.
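A small sketch of what I am seeing with Normalize(). The module and variable names are illustrative, and it assumes the superscript in uˤ is the code point U+02E4:

```vb
Imports System
Imports System.Text

Module NormalizeDemo
    Sub Main()
        ' u + U+0300 COMBINING GRAVE ACCENT: a precomposed form (U+00F9) exists.
        Dim grave As String = "u" & ChrW(&H300)
        ' u + U+02E4 MODIFIER LETTER SMALL REVERSED GLOTTAL STOP: no precomposed form.
        Dim pharyngeal As String = "u" & ChrW(&H2E4)

        ' Form C composes the first pair into a single Char ...
        Console.WriteLine(grave.Normalize(NormalizationForm.FormC).Length)      ' 1
        ' ... but leaves the second pair as two Chars.
        Console.WriteLine(pharyngeal.Normalize(NormalizationForm.FormC).Length) ' 2
    End Sub
End Module
```

So normalization alone cannot be the whole solution for the IPA data I have.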