How can I determine what the alphabet for a locale is in Java?

+2 Q:

I would like to determine what the alphabet for a given locale is, preferably based on the browser's Accept-Language header values. Does anyone know how to do this, using a library if necessary?

+1 A:

If you just want to know the name of an appropriate character set for a user's locale, then you might try the java.nio.charset.Charset class.

If you really want to use the Accept-Language header, then there's an old O'Reilly article on this matter which introduces a pretty handy class called LanguageNegotiator.
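For the Accept-Language part, here is a minimal sketch that does not use the LanguageNegotiator class from the article. It assumes Java 8 or later, where the JDK itself has `Locale.LanguageRange.parse` and `Locale.lookup`. The class name `AcceptLanguageDemo`, the method `pickLocale`, and the list of supported locales are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
class AcceptLanguageDemo {

    /**
     * Picks the best supported locale for an Accept-Language header value,
     * returning the fallback when the header is missing, malformed, or
     * matches none of the supported locales.
     */
    public static Locale pickLocale(String acceptLanguage, List<Locale> supported, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            // Parses entries such as "de-AT,de;q=0.8,en;q=0.5", ordered by weight.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            // RFC 4647 lookup: each range is truncated subtag by subtag until it matches.
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Assumed set of locales the application has been localized for.
        List<Locale> supported = Arrays.asList(Locale.ENGLISH, Locale.GERMAN);
        Locale chosen = pickLocale("de-AT,de;q=0.8,en;q=0.5", supported, Locale.ENGLISH);
        System.out.println(chosen.toLanguageTag());
    }
}
```

This only gets you from the header to a locale. Mapping that locale to an alphabet is a separate step.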

I think one of those will give you a decent enough start.

GaryF 2009-01-06 21:04:51

+1 A:

It depends on how specific you want to get. One place to look would be at the "Suppress-Script" properties in the IANA language registry.

Some languages have multiple "alphabets" that can be used for writing. For example, Azerbaijani can be written in Latin or Arabic script. Most languages, like English, are written almost exclusively in a single script, so the correct script goes without saying, and should be "suppressed" in language codes.

So, looking at the entry for Russian, you can tell that the preferred script is Cyrillic, while for Amharic, it is Ethiopic. But German, Norwegian, and English aren't more specific than "Latin". So, with this method, you'd have a hard time hiding umlauts and thorns from Americans, or offering any script to a Kashmiri writer.

erickson 2009-01-07 01:04:42

+1 A:

This is an English answer written in Århus. Yesterday, I heard some Germans say 'Blödheit, à propos, ist dumm'. However, one of them wore a shirt that said 'I know the difference between 文字 and الْعَرَبيّة'.

What's the answer to your question for this text? Is it allowed? Isn't this an English text?

phihag 2009-01-07 14:09:26

I don't care about the foreign words. However intellectual you want to try to make it, this question has *one* answer per language. Even my mother knows that. It's that one I'm looking for.

krosenvold 2009-01-07 15:29:38

Résumé is a valid English word. So are foreign names.

phihag 2009-01-21 04:10:17

The International Components for Unicode might help here. Specifically the UScript class looks promising.
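If pulling in ICU is not an option, the JDK has a similar built-in enum, `Character.UnicodeScript` (Java 7+). The sketch below uses that instead of ICU's UScript, so it is a swapped-in stand-in rather than the ICU API. It assumes Java 8 for `String.codePoints()`, and the names `ScriptDemo` and `scriptsOf` are made up for illustration. It tells you which scripts a piece of text uses, not which letters belong to a language's alphabet.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class ScriptDemo {

    /** Collects the Unicode scripts of all letters in the text; digits, spaces and punctuation are skipped. */
    public static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the file independent of the source encoding.
        System.out.println(scriptsOf("Bl\u00f6dheit"));                         // "Blödheit" -> [LATIN]
        System.out.println(scriptsOf("\u041f\u0440\u0438\u0432\u0435\u0442"));  // "Привет"   -> [CYRILLIC]
        System.out.println(scriptsOf("\u6587\u5b57"));                          // "文字"      -> [HAN]
    }
}
```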

Out of curiosity: What do you need it for?

Joachim Sauer 2009-01-07 14:14:19

I'm writing a small application that teaches my oldest daughter upper/lowercase letters. I figured I'd make an applet or even a funky JavaFX application for it. But I really am so fed up with children's software that's not properly localized.

krosenvold 2009-01-07 15:31:33

+1 A:

Take a look at [LocaleData.getExemplarSet][1] in ICU4J.

For example, for English this returns the set abcdefghijklmnopqrstuvwxyz.

[1]: http://icu-project.org/apiref/icu4j/com/ibm/icu/util/LocaleData.html#getExemplarSet(com.ibm.icu.util.ULocale, int)

Robert Muir 2010-08-19 12:22:18
