I would like to determine what the alphabet for a given locale is, preferably based on the browser's Accept-Language header values. Does anyone know how to do this, using a library if necessary?

+1  A: 

If you just want to know the name of an appropriate character set for a user's locale, then you might try the java.nio.charset.Charset class.

If you really want to use the Accept-Language header, then there's an old O'Reilly article on this matter which introduces a pretty handy class called LanguageNegotiator.

I think one of those will give you a decent enough start.
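
For the Accept-Language part, a minimal sketch using only the JDK (Java 8 or later) might look like the following. It does not use the LanguageNegotiator class from the article; it relies on `Locale.LanguageRange.parse` instead, and the class name, method name and header value are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AcceptLanguageSketch {

    /** Returns the locales named in an Accept-Language value, most preferred first. */
    static List<Locale> preferredLocales(String acceptLanguage) {
        List<Locale> result = new ArrayList<>();
        // LanguageRange.parse understands the "tag;q=weight" syntax and
        // returns the ranges sorted by descending weight.
        for (Locale.LanguageRange range : Locale.LanguageRange.parse(acceptLanguage)) {
            if (!range.getRange().equals("*")) { // the wildcard names no locale
                result.add(Locale.forLanguageTag(range.getRange()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical header value, as a browser might send it.
        System.out.println(preferredLocales("da, en-gb;q=0.8, en;q=0.7"));
    }
}
```

Note that `parse` throws an IllegalArgumentException for a malformed header, so a real servlet would probably want to catch that and fall back to a default locale.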

GaryF
+1  A: 

It depends on how specific you want to get. One place to look would be at the "Suppress-Script" properties in the IANA language registry.

Some languages have multiple "alphabets" that can be used for writing. For example, Azerbaijani can be written in Latin or Arabic script. Most languages, like English, are written almost exclusively in a single script, so the correct script goes without saying, and should be "suppressed" in language codes.

So, looking at the entry for Russian, you can tell that the preferred script is Cyrillic, while for Amharic, it is Ethiopic. But German, Norwegian, and English aren't more specific than "Latin". So, with this method, you'd have a hard time hiding umlauts and thorns from Americans, or offering any script to a Kashmiri writer, since Kashmiri has no "Suppress-Script" entry.
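
To illustrate what "suppressed" means in practice, here is a small sketch with the JDK's `Locale` class. It only reads a script subtag that is written out in the tag; as far as I know the JDK does not consult the registry's "Suppress-Script" field, so for a plain tag like `ru` you would still have to look the script up yourself. The class and method names are made up.

```java
import java.util.Locale;

public class ScriptSubtagSketch {

    /** Returns the explicit ISO 15924 script code of a language tag, or "" if the tag has none. */
    static String explicitScript(String languageTag) {
        return Locale.forLanguageTag(languageTag).getScript();
    }

    public static void main(String[] args) {
        // Azerbaijani needs the script spelled out, Russian normally does not.
        System.out.println("az-Arab -> '" + explicitScript("az-Arab") + "'");
        System.out.println("az-Latn -> '" + explicitScript("az-Latn") + "'");
        System.out.println("ru      -> '" + explicitScript("ru") + "'");
    }
}
```

An empty result is the case where the registry lookup described above would come in.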

erickson
+1  A: 

This is an English answer written in Århus. Yesterday, I heard some Germans say 'Blödheit, à propos, ist dumm'. However, one of them wore a shirt that said 'I know the difference between 文字 and الْعَرَبيّة'.

What's the answer to your question for this text? Is it allowed? Isn't this an English text?

phihag
I don't care about the foreign words. However intellectual you want to try to make it, this question has *one* answer per language. Even my mother knows that. It's that one I'm looking for.
krosenvold
Résumé is a valid English word. So are foreign names.
phihag
A: 

The International Components for Unicode might help here. Specifically the UScript class looks promising.
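
If pulling in ICU is too much, the JDK has a rough counterpart in `Character.UnicodeScript`. The sketch below uses that JDK class, not ICU's UScript, and it answers a slightly different question: which scripts occur in a given text, not which script a locale uses. The class and method names are made up.

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptsInTextSketch {

    /** Collects the Unicode scripts of all letters in the text; digits, marks and punctuation are ignored. */
    static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent.
        System.out.println(scriptsOf("Bl\u00f6dheit"));                        // a German word
        System.out.println(scriptsOf("\u041f\u0440\u0438\u0432\u0435\u0442")); // a Russian word
    }
}
```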

Out of curiosity: What do you need it for?

Joachim Sauer
I'm writing a small application that teaches my oldest daughter upper/lowercase letters. I figured I'd make an applet or even a funky javafx application for it. But I really am so fed up with children's software that's not properly localized.
krosenvold
+1  A: 

Take a look at [LocaleData.getExemplarSet][1].

For example, for English this returns a set containing the letters abcdefghijklmnopqrstuvwxyz.

[1]: http://icu-project.org/apiref/icu4j/com/ibm/icu/util/LocaleData.html#getExemplarSet(com.ibm.icu.util.ULocale, int)
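
Once you have the lowercase letters, pairing them with their uppercase forms for an upper/lowercase exercise can be done with the JDK alone. This is only a sketch: the hard-coded strings stand in for whatever the exemplar set gives you, and the class and method names are made up. The important bit is passing the locale to `toUpperCase`, because the mapping is not the same everywhere (Turkish dotted and dotless i being the classic case).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LetterPairsSketch {

    /** Maps each lowercase letter to its uppercase form under the given locale, keeping the input order. */
    static Map<String, String> casePairs(String lowercaseLetters, Locale locale) {
        Map<String, String> pairs = new LinkedHashMap<>();
        lowercaseLetters.codePoints().forEach(cp -> {
            String lower = new String(Character.toChars(cp));
            // Locale-sensitive mapping; the result can be longer than one character.
            pairs.put(lower, lower.toUpperCase(locale));
        });
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder alphabets; in a real application they would come from the exemplar set.
        System.out.println(casePairs("abcdefghijklmnopqrstuvwxyz", Locale.ENGLISH));
        System.out.println(casePairs("i", Locale.forLanguageTag("tr")));
    }
}
```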

Robert Muir