How can we find out whether a character belongs to a particular code page? Or, how can we determine whether a character fits the currently active IME for an application?

A: 

Using the Windows functions WideCharToMultiByte and MultiByteToWideChar you can convert between UTF-8 and UTF-16 (16-bit Unicode) characters. Both functions take an argument that specifies the code page, and arguments that control what happens when an invalid character is encountered.

Patrick
Thanks. Yes, you are right: I was using the LPBOOL lpUsedDefaultChar parameter of WideCharToMultiByte() to determine this. However, for the Hindi IME, which has code page 0, lpUsedDefaultChar always comes back TRUE. [Not sure how my previous comment got removed :( but I had described it in detail there.]
Prakash
+1  A: 

Use the WC_ERR_INVALID_CHARS flag and WideCharToMultiByte will fail outright if any invalid characters are encountered (note that this flag only applies when the target code page is CP_UTF8 or 54936). If you want to know whether characters could not be represented in the target code page, use the lpDefaultChar and lpUsedDefaultChar parameters.

LPCWSTR pszUtf16; // null-terminated UTF-16 text, converted from the UTF-8 source
UINT nTargetCP = CP_ACP;
BOOL fBadCharacter = FALSE;
// cchWideChar = -1: the input is null-terminated; NULL/0 output buffer: only query the required size
if (WideCharToMultiByte(nTargetCP, WC_NO_BEST_FIT_CHARS, pszUtf16, -1, NULL, 0, NULL, &fBadCharacter))
{
  if (fBadCharacter)
  {
    // at least one character in the string was not represented in nTargetCP
  }
}
Chris Becke
Thanks, Chris. I posted my comment as an answer so that I could add some code.
Prakash
A: 

Thanks, Chris. I am running the following code:

#include <windows.h>
#include <stdio.h>

#define CP_HINDI 0      /* note: 0 is also the value of CP_ACP */
#define CP_JAPANESE 932
#define CP_ENGLISH 1252

/* Wide-character literals need the L prefix. */
wchar_t wcsStringJapanese = L'あ';  /* U+3042 */
wchar_t wcsStringHindi = L'र';      /* U+0930 */
wchar_t wcsStringEnglish = L'A';

int main()
{
    BOOL usedDefaultCharacter = FALSE;

    /* Test for ENGLISH */
    /* cchWideChar is 1 because each input is a single wchar_t,
       not a null-terminated string. */
    WideCharToMultiByte( CP_ENGLISH,
                        0,
                        &wcsStringEnglish,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for English? %d \n",usedDefaultCharacter);

    /* Test for JAPANESE */
    usedDefaultCharacter = FALSE;

    WideCharToMultiByte( CP_JAPANESE,
                        0,
                        &wcsStringJapanese,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for Japanese? %d \n",usedDefaultCharacter);

    /* Test for HINDI */
    usedDefaultCharacter = FALSE;

    WideCharToMultiByte( CP_HINDI,
                        0,
                        &wcsStringHindi,
                        1,
                        NULL,
                        0,
                        NULL,
                        &usedDefaultCharacter);
    printf("usedDefaultCharacters for Hindi? %d \n",usedDefaultCharacter);

    return 0;
}

The above code returns:

usedDefaultCharacters for English? 0

usedDefaultCharacters for Japanese? 0

usedDefaultCharacters for Hindi? 1

The third line looks incorrect to me: the code page for Hindi is 0 and the string passed consists of a Hindi character, yet usedDefaultCharacter is still set to 1, which should not be the case.

Prakash
The code page for Hindi is NOT zero. Hindi is one of the new 'Unicode only' localizations. There is no actual Windows ANSI code page for representing Hindi characters. Refer to this page: http://msdn.microsoft.com/en-us/goglobal/bb688174.aspx
Chris Becke
So is there any value that I can give for the "codepage" parameter of WideCharToMultiByte to find out whether the current encoding supports the Hindi character? Or is there a way (in C++) to find out whether the current encoding of the page is Unicode? Thanks.
Prakash
+1  A: 

The two previous answers have correctly suggested using MultiByteToWideChar then WideCharToMultiByte to translate your UTF-8 character to UTF-16, then to the current Windows codepage (CP_ACP). Check the result of WideCharToMultiByte to see if the conversion was successful.

What wasn't clear from the original question is that you are having a particular issue with Hindi. For this language, your question is meaningless because there is no Windows ANSI code page for Hindi, as Chris Becke pointed out. Therefore, you can never convert a Hindi character to CP_ACP: WideCharToMultiByte will always have to substitute the default character, so the conversion never succeeds.

To use Hindi on Windows, as far as I understand it, your application must be a Unicode app that calls the Unicode APIs.
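
Since there is no ANSI code page to test Hindi against, one possible alternative is to look at the Unicode value itself: Hindi is written in the Devanagari script, whose main Unicode block is U+0900 to U+097F. The following is only a minimal sketch of that idea, not something taken from the answers above. The helper name IsDevanagari is hypothetical, and it assumes that checking the main block is enough for your purposes (the Devanagari Extended and Vedic Extensions blocks are ignored).

```cpp
// Hypothetical helper (not a Windows API): returns true if the UTF-16
// code unit lies in the main Unicode Devanagari block, U+0900..U+097F.
// Assumption: the extended Devanagari blocks are deliberately ignored.
constexpr bool IsDevanagari(char16_t ch)
{
    return ch >= 0x0900 && ch <= 0x097F;
}
```

With this helper, IsDevanagari(0x0930) (the character र from the test program) is true, while IsDevanagari(u'A') and IsDevanagari(0x3042) (あ) are false. This only classifies the character by script; it says nothing about whether a particular font or IME can actually handle it.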

richb