I need some information about localization. I am using .NET 2.0 with C# 2.0, which takes care of most localization-related issues. However, on one particular screen I need to manually draw the letters of the alphabet corresponding to the current culture.

This would be similar to the Contacts screen in Microsoft Outlook (Address Cards view or Detailed Address Cards view under Contacts), so it needs a column of buttons at the right edge, one for each letter.

I am trying to emulate that, but I don't want to ask the user to choose the script. If the current culture is say, Chinese, I want to draw Chinese alphabets. When the user changes the culture info to English (and when he restarts the application) I want to draw English alphabets instead. Hope you understand where I am going with this query.

I can determine the culture of the current user (Application.CurrentCulture or System.Globalization.CultureInfo.CurrentCulture will give the culture related information). I also have all the scripts to render the alphabets. However, the problem is that I don't know how to map the culture info to the name of a script.

In other words, is there a way to determine the script name corresponding to a culture? Or is it possible to determine the range of Unicode character values corresponding to a culture? Either of them would allow me to render the alphabets on the button properly.

Any suggestions or guidance regarding this is truly appreciated. If there is something fundamentally wrong with my approach (or with what I am trying to achieve), please point out that as well. Thanks for your time.

PS: I know the easiest solution is to either configure the script name as part of user preferences or display a list of languages for the user to choose from (a la Contact in Outlook 2007). But I am just trying to see whether I can render the alphabets corresponding to the culture without the user having to do anything.

A: 

Chinese has thousands of characters, so it might not be feasible to show all the characters in their character set. There's no native concept of 'alphabet' in Chinese, and I don't think Chinese has a syllabary like Japanese does.

Pinyin (Chinese written in the Roman alphabet) can be used to represent the Chinese characters, and that might help you index them. I know this doesn't answer your question, but I hope it's helpful.

Mike Sickler
A: 

I fully agree with mikiemacman. In addition, a given language doesn't necessarily use all the letters of a script.

Anyway, the closest I can think of is CultureInfo.TextInfo.ANSICodePage. There are only a handful of ANSI code pages, so you could create a table (or a switch() statement, whatever) that lists the script for each ANSI code page.
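
As a rough illustration of the table idea, here is a minimal sketch. The class and method names (`CodePageScripts`, `ScriptForCodePage`, `ScriptForCulture`) are hypothetical, the table is hand-maintained and deliberately incomplete, and the script labels are informal names rather than official identifiers. Cultures with no ANSI code page fall through to null.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: maps a Windows ANSI code page to an informal script label.
public static class CodePageScripts
{
    private static readonly Dictionary<int, string> Table = BuildTable();

    private static Dictionary<int, string> BuildTable()
    {
        Dictionary<int, string> t = new Dictionary<int, string>();
        t[874] = "Thai";
        t[932] = "Japanese";
        t[936] = "Chinese (Simplified)";
        t[949] = "Korean";
        t[950] = "Chinese (Traditional)";
        t[1250] = "Latin";    // Central European
        t[1251] = "Cyrillic";
        t[1252] = "Latin";    // Western European
        t[1253] = "Greek";
        t[1254] = "Latin";    // Turkish
        t[1255] = "Hebrew";
        t[1256] = "Arabic";
        t[1257] = "Latin";    // Baltic
        t[1258] = "Latin";    // Vietnamese
        return t;
    }

    // Returns null when the code page is not in the table
    // (for example 0, which Unicode-only cultures report).
    public static string ScriptForCodePage(int codePage)
    {
        string script;
        return Table.TryGetValue(codePage, out script) ? script : null;
    }

    public static string ScriptForCulture(CultureInfo culture)
    {
        return ScriptForCodePage(culture.TextInfo.ANSICodePage);
    }
}
```

Calling `CodePageScripts.ScriptForCulture(CultureInfo.CurrentCulture)` would then give a label to switch on. Note that several code pages collapse to "Latin", so this tells you the script but not which letters a particular language uses.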

Serge - appTranslator
A: 

Mike and Serge, thanks a lot for your reply.

I checked the CultureInfo and ANSI code pages. However, suppose that at runtime I get the following culture info:

cultureInfo.EnglishName = Tamil; ANSICodePage = 0

What am I supposed to do next? My problem is that I need to map the name Tamil to its corresponding script name. In this case that is also called "Tamil", but there might not be a one-to-one correspondence between culture names and Unicode script names, since the counts of the two don't match.

The script info for Tamil is available here: http://unicode.org/charts/PDF/U0B80.pdf

From that I understand that the Unicode range for Tamil would be 0B80-0BFF.

To rephrase my question, when I get a random name like "Tamil" at run time based on the CultureInfo, how can I arrive at the script name "Tamil" or the range "0B80-0BFF"?

Or are you saying that there is no way to do this mapping other than sitting down, working out the values, and adding code for a table lookup (or whatever) to arrive at the required Unicode range? Please note that, as far as I understand it, I need the script name or the Unicode range to pick out the letters (assuming I can pick them out, which should be possible for most languages) and render them on the buttons.
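
For reference, a hand-built lookup of that kind might look like the sketch below. It is keyed by `CultureInfo.TwoLetterISOLanguageName` rather than the English name, which is an assumption made here for stability. `UnicodeRange` and `ScriptRanges` are hypothetical names. Only the Tamil range comes from the chart linked above; the other entries are standard Unicode block ranges added as examples.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical value type holding an inclusive range of code points.
public struct UnicodeRange
{
    public readonly int First;
    public readonly int Last;

    public UnicodeRange(int first, int last)
    {
        First = first;
        Last = last;
    }
}

// Hypothetical hand-maintained table: language code -> Unicode block.
public static class ScriptRanges
{
    private static readonly Dictionary<string, UnicodeRange> ByLanguage = Build();

    private static Dictionary<string, UnicodeRange> Build()
    {
        Dictionary<string, UnicodeRange> d =
            new Dictionary<string, UnicodeRange>(StringComparer.OrdinalIgnoreCase);
        d["ta"] = new UnicodeRange(0x0B80, 0x0BFF); // Tamil
        d["el"] = new UnicodeRange(0x0370, 0x03FF); // Greek and Coptic
        d["ru"] = new UnicodeRange(0x0400, 0x04FF); // Cyrillic
        d["he"] = new UnicodeRange(0x0590, 0x05FF); // Hebrew
        d["th"] = new UnicodeRange(0x0E00, 0x0E7F); // Thai
        return d;
    }

    // Returns false when the culture's language has no entry in the table.
    public static bool TryGetRange(CultureInfo culture, out UnicodeRange range)
    {
        return ByLanguage.TryGetValue(culture.TwoLetterISOLanguageName, out range);
    }
}
```

A block range is only a starting point: it also contains digits, signs, and unassigned code points, so the letters would still have to be filtered out of it.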

Any ideas?

Proto
Right. Tamil is Unicode-only, hence it doesn't have a corresponding ANSI code page. See my new answer below.
Serge - appTranslator
A: 

In native code there's LOCALE_SSCRIPTS for GetLocaleInfoEx() (Vista & above) that shows you what scripts are expected for a locale. There isn't a similar concept for .Net at this time.
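
From managed code, the native call could be reached through P/Invoke. The following is a sketch under assumptions: the wrapper names (`LocaleScripts`, `GetScripts`, `ParseScripts`) are hypothetical, the 256-character buffer is an arbitrary choice, and the constant value is the one defined in winnls.h. The result is expected to be a semicolon-terminated list of ISO 15924 four-letter script codes. It only works on Windows Vista or later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class LocaleScripts
{
    private const uint LOCALE_SSCRIPTS = 0x0000006C; // from winnls.h

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetLocaleInfoEx(
        string lpLocaleName, uint LCType, StringBuilder lpLCData, int cchData);

    // Splits a semicolon-terminated list such as "Latn;" into its entries.
    public static string[] ParseScripts(string raw)
    {
        return raw.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Windows Vista or later only. Returns an empty array if the call fails.
    public static string[] GetScripts(string localeName)
    {
        StringBuilder buffer = new StringBuilder(256); // assumed to be large enough
        int written = GetLocaleInfoEx(localeName, LOCALE_SSCRIPTS, buffer, buffer.Capacity);
        if (written == 0)
        {
            return new string[0];
        }
        return ParseScripts(buffer.ToString());
    }
}
```

A call such as `LocaleScripts.GetScripts(CultureInfo.CurrentCulture.Name)` would feed it the current culture's name (that needs `using System.Globalization;`).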

Please check my reply below (dated 14-Nov-08)
Proto
A: 

Proto, wait! There's a much more accurate solution. It's an unmanaged one, hence you may have to P/Invoke.

GetLocaleInfoW(MAKELCID(wLangId, SORT_DEFAULT), LOCALE_FONTSIGNATURE, wcBuf, MAXWCBUF);

This gives you a LOCALESIGNATURE structure. The answer is in the lsUsb field: the Unicode subsets bitfield. Rats! The MS page for this structure is empty. But look it up in your MSDN copy. It's fully documented there: a whole set of flags that describe which scripts are supported. And yes, there's a flag for Tamil ;-)

HTH.

EDIT: Oops! Hadn't seen Shawne's answer. Wow! Answer from an in-house expert! ;-) Anyway, you may still be interested in a Pre-Vista compatible answer.

Serge - appTranslator
Please check my reply below
Proto
A: 

Hello Serge, thanks for the reply. Regret the fact that I am replying after 10 days. Had been busy with other stuff until this morning. I had been reading up further about the tip you gave and came across this blog post from Michael Kaplan (International Fundamentals team, Microsoft):

"...Unfortunately, it does not work in practice, for many reasons: For years the docs were wrong in their description of the locale side of this functionality...The data quality in the LOCALESIGNATURE has not always been accurate across all locales and all versions of Windows..."

So, it looks like even the latest tip might not be a sufficient solution (or a solution that is always correct).


Incidentally, I came across what could be an issue with the solution suggested by Shawne. It looks like the script name returned by GetLocaleInfoEx() is correct, but it might not be sufficient to arrive at the list of letters for a language (probably it is not supposed to be). Let me explain. If you take the case of Finnish, GetLocaleInfoEx() returns the script name as Latin. I would assume that this maps to the usual 26 letters that I am aware of (I don't know Finnish, so please bear with my ignorance when I say "I assume" :).

However, Wikipedia tells me that the Finnish alphabet doesn't have a W and instead has 3 extra characters (over English) after the Z (similar to the Swedish alphabet) and so this means that displaying the characters A-Z might not suffice (again, I don't know the language but it does look like it won't suffice from what Wikipedia says; I might be wrong here).

It would be great if someone could shed some light on the next logical step: if I have a script name (Latin) and a locale name (say Finnish), is it possible to reliably and programmatically arrive at the character set corresponding to that combination (or the exact Unicode characters)? I know (from what I read in this thread) this might be a stupid question for languages like Chinese, but I think it is not such a stupid question for, say, the European languages. Thanks.

Proto
A: 

Fascinating topic. While it might not answer your question, Omniglot is a good resource.

The correct answer is likely to be complex, and to depend on the exact problem you're solving. Assuming your goal is to show only the letters used in a particular language to separate phonebook sections (as in Outlook), a few of the issues are:

  • People who have contact names spanning several scripts/languages.
  • 2-glyph letters (e.g. 'Lj' in Serbian). It is one phoneme, always treated as a single letter although it is written with 2 Unicode characters. It would have its own section in the phonebook (separate from 'L').
  • Too many glyphs to list (e.g. Chinese)
  • Unorthodox ordering (e.g. Thai -- a phone book would be separated by consonants only, ignoring the vowels).
  • Uppercase / lowercase distinction (presumably you'd only want one case for languages that support it -- which breaks down in minor ways for the Turkish 'i').
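
The casing point in the last bullet can be seen directly in .NET. A minimal sketch follows; `LetterCase.ToLabel` is a hypothetical helper name, and the only thing it does is pass the culture through to `String.ToUpper`.

```csharp
using System;
using System.Globalization;

public static class LetterCase
{
    // Uppercases a button label using the casing rules of the given culture.
    public static string ToLabel(string letter, CultureInfo culture)
    {
        return letter.ToUpper(culture);
    }
}
```

With the Turkish culture, `LetterCase.ToLabel("i", new CultureInfo("tr-TR"))` should give the dotted capital "İ" (U+0130), whereas `CultureInfo.InvariantCulture` gives the plain "I". So the case conversion has to use the same culture as the letter list it is applied to.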
dbkk
Thanks for the comments and providing some examples to make me think harder about the problem, dbkk. Yeah, I might have to spend more time on this than I initially anticipated.
Proto