In written Arabic, characters look different depending on where they stand in a word. For example, the letter tha might look like this: ـثـ inside a word but like this: ﺙ when it stands by itself. I have some Arabic text, for example:

string word = "والتفويض";

When I render word as a whole word it renders correctly. Now, I want to parse the string and print out each letter in the word one at a time. However, if I do this:

foreach (char c in word)
{
    // Debug.Print lives in System.Diagnostics
    Debug.Print(c.ToString());
}

The char c doesn't print the letter as it was rendered in the context of the word; instead, it prints the same Arabic letter as if it were rendered by itself. How can I parse my string of Arabic text so that the letters returned look the same as when they were displayed as part of the whole word?

I'm trying to do this in C#.

+2  A: 

There are characters in the UCS that represent particular forms of Arabic characters (the Arabic Presentation Forms blocks). However, these do not work well when moving from one context to another.
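
To illustrate why the presentation forms travel badly, here is a minimal sketch (the class and method names are hypothetical, not from the question). It assumes the forms in question are the ones in the Arabic Presentation Forms-A and -B blocks, and it uses String.Normalize with compatibility normalization to fold them back to the ordinary letters. A presentation-form code point is a different character from the base letter, so an ordinal comparison or search against normally typed text will not match until the text is normalized.

```csharp
using System;
using System.Text;

static class PresentationForms
{
    // True if c lies in the Arabic Presentation Forms-A (U+FB50..U+FDFF)
    // or Arabic Presentation Forms-B (U+FE70..U+FEFF) block.
    public static bool IsPresentationForm(char c) =>
        (c >= '\uFB50' && c <= '\uFDFF') || (c >= '\uFE70' && c <= '\uFEFF');

    // Folds presentation forms back to the ordinary letters using
    // Unicode compatibility normalization (NFKC).
    public static string ToBaseLetters(string text) =>
        text.Normalize(NormalizationForm.FormKC);
}
```

For example, "\uFE9C" (the medial form of the letter from the question) is not equal to "\u062B" as a string, but PresentationForms.ToBaseLetters("\uFE9C") should be. Normalization behaviour depends on the platform's Unicode support being available.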

In general, if you want to indicate that a letter is joined to another when there is no such letter to join it to, you should use U+200D ZERO WIDTH JOINER at the appropriate place: before the character to place the joiner to the right, after the character to place it to the left, or one on either side.

Conversely, placing U+200C ZERO WIDTH NON-JOINER between characters will break their joining.

Just how well that works in practice will depend on the rendering engine processing the characters.
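
Applied to the question, the joiner approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name and helpers are hypothetical, it handles only the basic Arabic alphabet (U+0622..U+063A and U+0641..U+064A), it ignores diacritics, and the list of letters that connect only to the preceding letter is a hand-written subset, not the full Unicode joining data. Each letter is returned with a ZERO WIDTH JOINER on whichever side it was connected in the original word.

```csharp
using System;
using System.Collections.Generic;

static class ArabicLetterSplitter
{
    const string Zwj = "\u200D"; // ZERO WIDTH JOINER

    // Assumption: incomplete, hand-picked set of letters that join only to
    // the preceding letter (alef variants, waw variants, teh marbuta,
    // dal, thal, reh, zain).
    static readonly HashSet<char> JoinsOnlyToPrevious = new HashSet<char>(
        "\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648");

    // Assumption: only the basic alphabet is treated as joining letters.
    static bool IsJoiningLetter(char c) =>
        (c >= '\u0622' && c <= '\u063A') || (c >= '\u0641' && c <= '\u064A');

    static bool JoinsToNext(char c) =>
        IsJoiningLetter(c) && !JoinsOnlyToPrevious.Contains(c);

    // Returns one string per letter, padded with ZWJ where the letter was
    // connected to a neighbour in the original word.
    public static List<string> Split(string word)
    {
        var result = new List<string>();
        for (int i = 0; i < word.Length; i++)
        {
            bool joinedToPrevious = i > 0
                && JoinsToNext(word[i - 1]) && IsJoiningLetter(word[i]);
            bool joinedToNext = i < word.Length - 1
                && JoinsToNext(word[i]) && IsJoiningLetter(word[i + 1]);

            result.Add((joinedToPrevious ? Zwj : "")
                       + word[i]
                       + (joinedToNext ? Zwj : ""));
        }
        return result;
    }
}
```

Passing each element of ArabicLetterSplitter.Split(word) to Debug.Print instead of the bare char gives the rendering engine the information it needs to choose the initial, medial, or final shape. Whether it actually does so still depends on the engine and font.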

Jon Hanna