In some right-to-left languages (such as Arabic, Persian, and Urdu) each letter can have different shapes: an isolated form, an initial form, a medial form, and a final form (you can find them in the Windows Character Map for any Unicode font).

Imagine you need the exact characters that the user entered in a text box. By default, when you convert the String to a char array, each character comes out in its isolated form.

(This is just a guess: the keyboard enters each character in its isolated form, and the text is converted to the proper shapes only when it is displayed on screen. If you build the string from the exact character codes instead, you get the expected array.)
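To make the observation concrete, here is a minimal sketch that dumps the code points of a string and checks whether any of them fall in the Arabic Presentation Forms blocks. The class and method names (PresentationFormCheck, IsPresentationForm, ToHex) are made up for illustration, and the sample word is an assumption chosen because its letters have well-known presentation forms. Typed text normally holds the base letters from the U+0600 block, while a string built from exact character codes can hold the shaped forms.

```csharp
using System;
using System.Linq;

public static class PresentationFormCheck
{
    // True when c lies in Arabic Presentation Forms-A (U+FB50..U+FDFF)
    // or in the letter range of Arabic Presentation Forms-B (U+FE70..U+FEFC).
    public static bool IsPresentationForm(char c) =>
        (c >= '\uFB50' && c <= '\uFDFF') || (c >= '\uFE70' && c <= '\uFEFC');

    // Formats every UTF-16 code unit of s as four hex digits.
    public static string ToHex(string s) =>
        string.Join(" ", s.ToCharArray().Select(c => ((int)c).ToString("X4")));

    public static void Main()
    {
        // Base letters BEH, ALEF, BEH, as a keyboard would enter them.
        string typed = "\u0628\u0627\u0628";
        // The same word spelled with presentation forms: initial BEH, final ALEF, isolated BEH.
        string preShaped = "\uFE91\uFE8E\uFE8F";

        Console.WriteLine(ToHex(typed));
        Console.WriteLine(ToHex(preShaped));
        Console.WriteLine(typed.Any(IsPresentationForm));
        Console.WriteLine(preShaped.All(IsPresentationForm));
    }
}
```

Both strings render the same way in a text box, because the renderer shapes the base letters at display time; only the stored code points differ.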

My question is: how can we get that form of the string, the form that is displayed in the text box?

If there is no way in .NET, then this means I need to write my own class to do this conversion T_T

A: 

So how are you creating the "wrong" string? If you're just putting it in a string literal, then it's quite possible it's just the input method that's wrong. If you copy the "right" string after displaying it, and then paste that into a string literal, what happens? You might also want to check which encoding Visual Studio is using for your source files. If you're not putting the string into your source code as a literal, how are you creating it?

Given the potential for confusion, I think I'd want to either keep these strings in a resource, or hard code them using Unicode escaping:

string text = "\ufb64\ufea0\ufe91\ufeea";

(Then possibly put a comment afterwards showing the non-escaped value; at least then if it looks about right, it won't be too misleading. Admittedly it's then easy for the two to get out of sync...)

Jon Skeet
The input string comes from user input and is not static. It is, for example, the title of a page or a menu, so it cannot be hard coded. You can even try it with a TextBox control and you will get the same result.
Mostafa
Right, in that case it's a limitation of the input method. You *may* find that changing the font of the TextBox helps... I'm not sure. I'll see whether I've got enough fonts etc installed to check it.
Jon Skeet
I think this happens because when you enter the text with the keyboard, it enters the default character, which is the isolated form, and Windows converts it to the proper form only when displaying it in the text box.
Mostafa
I don't know... I was copying and pasting your string directly into a textbox, and it still gave the "wrong" string. Hmm... tricky.
Jon Skeet
But if you go to the Character Map, pick the exact characters, then copy and paste them into the text box, it will return the exact values.
Mostafa
A: 

This is a bit of a wild guess, but does String.Normalize() help here? It is unclear to me whether that just covers character composition or if it includes positional forms as well.
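For what it's worth, here is a small sketch of what normalization does with positional forms. As far as the Unicode data goes, the presentation forms carry compatibility decompositions back to their base letters, so the compatibility forms (NFKC/NFKD) fold shaped characters to base letters, which is the opposite direction from what the question needs. The class name NormalizeDemo and the helper Fold are made up for illustration.

```csharp
using System;
using System.Text;

public static class NormalizeDemo
{
    // Applies compatibility composition (NFKC) to the input.
    public static string Fold(string s) => s.Normalize(NormalizationForm.FormKC);

    public static void Main()
    {
        // Initial BEH, final ALEF, isolated BEH (presentation forms).
        string preShaped = "\uFE91\uFE8E\uFE8F";
        // The same word as base letters.
        string baseLetters = "\u0628\u0627\u0628";

        // NFKC folds the presentation forms back to the base letters.
        Console.WriteLine(Fold(preShaped) == baseLetters);
        // Base letters are already normalized, so nothing gets shaped.
        Console.WriteLine(Fold(baseLetters) == baseLetters);
    }
}
```

So Normalize() can strip positional forms, but no normalization form adds them.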

DocMax
Actually, I have tried that one too, but with no result T_T
Mostafa
A: 

Windows uses Uniscribe to perform contextual shaping for complex scripts (which can apply to left-to-right as well as right-to-left languages). The text displayed in a text box is based on the glyph information produced after the characters have been fed into Uniscribe. Although the Unicode standard defines code points for the isolated, initial, medial, and final forms of a character, not all fonts support them; a font may instead have pre-shaped glyphs or use a combination of glyphs. Uniscribe uses a shaping engine from the Windows language pack to determine which glyph(s) to use, based on the font's cmap.

The TextRenderer.DrawText() method uses Uniscribe via the Win32 DrawTextExW() function, using the following P/Invoke:

[DllImport("user32.dll", CharSet=CharSet.Unicode, SetLastError=true)]
public static extern int DrawTextExW( HandleRef hDC
                                     ,string lpszString
                                     ,int nCount
                                     ,ref RECT lpRect
                                     ,int nFormat
                                     ,[In, Out] DRAWTEXTPARAMS lpDTParams);

[StructLayout(LayoutKind.Sequential)]
public struct RECT
 {
   public int left;
   public int top;
   public int right;
   public int bottom;
 }

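[StructLayout(LayoutKind.Sequential)]
public class DRAWTEXTPARAMS
{
  // Win32 expects cbSize as the first member, set to the size of the structure.
  private int cbSize = Marshal.SizeOf(typeof(DRAWTEXTPARAMS));
  public int iTabLength;
  public int iLeftMargin;
  public int iRightMargin;
  public int uiLengthDrawn;
}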
Mark Cidade
Thanks for your answer. But my question is how I can convert the entered text to the shaped text and get the result as a char array or string.
Mostafa
I added more information about Uniscribe and why it's not trivial to get the characters (code points) that are shown in the text box. It seems that your only options are to use Uniscribe by looking up indexes in font cmaps, or to roll your own shaping engine.
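As a rough idea of what "roll your own" could look like, here is a deliberately naive sketch that maps base letters to Arabic Presentation Forms-B code points from each letter's neighbours. Everything in it is an assumption for illustration: the class name NaiveArabicShaper, the four-letter table (a real one would need every letter of the script), and the simplified joining rule. It ignores diacritics, lam-alef ligatures, and ZWJ/ZWNJ, and it is not what Uniscribe does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NaiveArabicShaper
{
    // Per base letter: isolated, final, initial, medial form.
    // '\0' in the initial/medial slots marks a right-joining letter (joins only to the previous letter).
    private static readonly Dictionary<char, char[]> Forms = new Dictionary<char, char[]>
    {
        { '\u0627', new[] { '\uFE8D', '\uFE8E', '\0', '\0' } },         // ALEF
        { '\u0628', new[] { '\uFE8F', '\uFE90', '\uFE91', '\uFE92' } }, // BEH
        { '\u062C', new[] { '\uFE9D', '\uFE9E', '\uFE9F', '\uFEA0' } }, // JEEM
        { '\u0647', new[] { '\uFEE9', '\uFEEA', '\uFEEB', '\uFEEC' } }, // HEH
    };

    // A letter connects forward only if it has an initial form.
    private static bool JoinsToNext(char c) =>
        Forms.TryGetValue(c, out var f) && f[2] != '\0';

    // Every letter in the table can connect to a preceding letter.
    private static bool JoinsToPrevious(char c) => Forms.ContainsKey(c);

    // Replaces each known base letter with the form implied by its neighbours
    // (string order is logical order, so index 0 is the start of the word).
    public static string Shape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (!Forms.TryGetValue(c, out var forms))
            {
                sb.Append(c); // unknown characters pass through and break joining
                continue;
            }
            bool joinPrev = i > 0 && JoinsToNext(text[i - 1]);
            bool joinNext = forms[2] != '\0'
                            && i + 1 < text.Length
                            && JoinsToPrevious(text[i + 1]);
            int index = joinPrev ? (joinNext ? 3 : 1) : (joinNext ? 2 : 0);
            sb.Append(forms[index]);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // BEH, ALEF, BEH: the ALEF does not join forward, so the last BEH stays isolated.
        foreach (char c in Shape("\u0628\u0627\u0628"))
            Console.Write(((int)c).ToString("X4") + " ");
        Console.WriteLine();
    }
}
```

Even with a complete table, this only yields code points a font may or may not contain, which is why going through Uniscribe is likely the more reliable route.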
Mark Cidade