Is there a way to get the letters of the alphabet in a language?

I want to do paging, and I want to show, for example, the last 7 letters of the alphabet. For the Dutch alphabet, t-z are the last 7 letters, but for Swedish it's w-ö (that is, w x y z å ä ö).

And when I get that as input, for the "normal" a-z alphabet I can generate the letters in between by using the ASCII table, but for Swedish that won't work.
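
A minimal Python sketch of the code-point approach described above, and of where it breaks. The helper name `letters_between` is made up for illustration.

```python
def letters_between(first, last):
    """Return every character whose code point lies between first and last."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]

# Works for the plain a-z alphabet: t..z is exactly the last 7 letters.
dutch_page = letters_between("t", "z")

# Breaks for Swedish: w..ö spans 128 code points, most of which are not
# Swedish letters, and ä comes out before å (code-point order), whereas
# the Swedish alphabet ends in å ä ö.
swedish_page = letters_between("w", "ö")
```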

+1  A: 

I don't think it's accessible programmatically by default, but there's a good set of reference documents at the Evertype website.

Lazarus
+1 as while I'd recommend the CLDR where applicable, and the CLDR covers a larger number of languages, Michael's set is good in having rules about semi-included letters in alphabets (e.g. in English we don't normally include æ in the alphabet, at least as far as Modern English goes, but it is properly sorted as if it were "ae", while ȝ is also no longer included in the alphabet but sorts between y and z). Likewise with letters that are commonly used in loan words in some languages and those that have changed alphabets. Not often necessary to know, but a great source when it is.
Jon Hanna
+3  A: 

To the best of my knowledge, neither .NET nor Windows provides this information. However, you can find it in the Unicode Consortium's CLDR database. This DB is actually a set of XML files (one for each language, named after the language abbreviation) containing all sorts of localisation info. A gold mine!

The element /ldml/characters/exemplarCharacters contains a list of characters used in the language, e.g., for Swedish (sv.xml):

[a-v x-z å ä ö]

Note that when you say 'ASCII letters', you do realize you're limiting yourself to the Latin script, don't you? As far as the CLDR is concerned, lists such as a-z are Unicode character sequences, not just ASCII letters. E.g., in Russian (from ru.xml):

[а-е ё ж-я]
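
Here is a rough Python sketch of how sets written like the two above could be expanded into individual letters. It only handles the simple subset seen in this thread (single characters, x-y ranges and {multi-character} sequences), not the full CLDR set syntax, and the function name `expand_exemplars` is hypothetical.

```python
import re
import unicodedata

def expand_exemplars(pattern):
    """Expand a simple exemplar set such as "[a-v x-z å ä ö]" into a list."""
    # Compose characters first so that "a" + combining ring is treated as "å".
    body = unicodedata.normalize("NFC", pattern).strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    letters = []
    # Each token is a {sequence}, a range "x-y", or a single character.
    tokens = re.findall(r"\{([^}]+)\}|(\S)-(\S)|(\S)", body)
    for sequence, start, end, single in tokens:
        if sequence:
            letters.append(sequence)
        elif start:
            letters.extend(chr(c) for c in range(ord(start), ord(end) + 1))
        else:
            letters.append(single)
    return letters

swedish = expand_exemplars("[a-v x-z å ä ö]")
russian = expand_exemplars("[а-е ё ж-я]")
```

Ranges are still expanded by code point, but only inside the stretches the data marks as contiguous; the order of the result is the order in which the set lists its members.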
Serge - appTranslator
Careful! The CLDR does not really contain the required information. For example, for Dutch the element /ldml/characters/exemplarCharacters is [a á ä b-e é ë f-i í ï {ij} j-o ó ö p-u ú ü v-z] (yikes!). But trust me, [a-z] is what you need for this question.
Ruben
No, I don't trust you ;-) Even though accented letters are not in common use in the language itself, letters such as é are not rare in Dutch names (not sure about the Netherlands, but certainly among the Flemish, i.e. Dutch-speaking Belgians, nl-BE). And according to www.voornamen.com, Björn is currently the most popular Dutch first name for babies.
Serge - appTranslator
True, but that's irrelevant to the question. Accented letters are used, but they are never regarded as separate letters for alphabetizing, unlike å ä and ö in Swedish. Filtering on [a] in Dutch *must* include á à ä etc. (Just check a dictionary.)
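
A small Python sketch of that filtering rule, assuming the caller supplies the set of accented letters that a language treats as letters in their own right (empty for Dutch, å ä ö for Swedish). The names `index_letter` and `SWEDISH_OWN` are invented for this example, and digraphs such as Dutch ij are not handled.

```python
import unicodedata

SWEDISH_OWN = frozenset("åäö")  # letters Swedish alphabetizes separately

def index_letter(word, own_letters=frozenset()):
    """Return the letter under which a word should be filed."""
    first = unicodedata.normalize("NFC", word)[:1].lower()
    if first in own_letters:
        return first  # a separate letter in this language: keep it
    # Otherwise decompose and keep only the base letter (ä -> a).
    return unicodedata.normalize("NFD", first)[:1]

dutch_bucket = index_letter("Ärlig")                  # filed under "a"
swedish_bucket = index_letter("Ärlig", SWEDISH_OWN)   # filed under "ä"
```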
Ruben
The CLDR exemplar characters with type="index" are what you are looking for. http://unicode.org/reports/tr35/#Character_Elements
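
As a hedged illustration, this is roughly how the index set could be read from an LDML file in Python. The XML below is a shortened, hand-written sample in the shape of a CLDR file, not data copied from CLDR, and `index_exemplars` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; a real file would list the full index set.
SAMPLE_LDML = """<ldml>
  <characters>
    <exemplarCharacters>[a-v x-z å ä ö]</exemplarCharacters>
    <exemplarCharacters type="index">[A B C X Y Z Å Ä Ö]</exemplarCharacters>
  </characters>
</ldml>"""

def index_exemplars(ldml_text):
    """Return the type="index" exemplar letters as a list, or [] if absent."""
    root = ET.fromstring(ldml_text)
    node = root.find("characters/exemplarCharacters[@type='index']")
    if node is None or not node.text:
        return []
    # Assumes a plain space-separated set with no ranges.
    return node.text.strip().strip("[]").split()

index_letters = index_exemplars(SAMPLE_LDML)
```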
Steven R. Loomis
A: 

So to clean this one up, I think the answer is that even if I limit myself to Western languages, I can't ask the .NET Framework for the letters of the alphabet. So I've made a list of the letters myself; luckily there were only four languages to do.
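
A hand-maintained table of that kind might look like the following Python sketch. Only the two alphabets mentioned in the question are shown, and the names `ALPHABETS`, `last_letters` and `letters_in_range` are illustrative, not taken from the actual code.

```python
# One ordered list of letters per language, maintained by hand.
ALPHABETS = {
    "nl": list("abcdefghijklmnopqrstuvwxyz"),
    "sv": list("abcdefghijklmnopqrstuvwxyzåäö"),
}

def last_letters(lang, count):
    """Return the last `count` letters of the language's alphabet."""
    return ALPHABETS[lang][-count:]

def letters_in_range(lang, first, last):
    """Return the letters from `first` to `last`, in alphabet order."""
    alphabet = ALPHABETS[lang]
    return alphabet[alphabet.index(first):alphabet.index(last) + 1]

swedish_tail = last_letters("sv", 7)
dutch_tail = last_letters("nl", 7)
```

Because the range is taken by position in the list rather than by code point, w-ö comes out as w x y z å ä ö.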

Michel
+1  A: 

You're going to send a list of strings to your translators anyway. For each language of your site you'll have a translator, and they'll each know the answer for their language. So just submit the string "a b c d e f g h i j k l m n o p q r s t u v w x y z" to them, and document this as the alphabet used for paging. They should be able to translate it for you. Mind you, you could get back entries like ..."x ij z" for Dutch, "ij" being a common spelling of the single letter IJ.
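
A minimal Python sketch of consuming such a translated string, assuming the translators keep the letters separated by spaces. The function names are made up for the example.

```python
def parse_alphabet(translated):
    """Split a translated alphabet string into its letters.

    Splitting on whitespace keeps multi-character entries such as
    "ij" together as a single letter.
    """
    return translated.split()

def paginate(letters, page_size):
    """Group the letters into consecutive pages of at most page_size."""
    return [letters[i:i + page_size]
            for i in range(0, len(letters), page_size)]

swedish = parse_alphabet(
    "a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö"
)
pages = paginate(swedish, 7)
```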

MSalters
A: 

Exactly which package, and which folder in it (http://unicode.org/Public/cldr/1.8.1/), contains the files with language information, including the alphabet? I tried quite a few, including core/common/main, but there are no alphabets in those files.

Dimitar Dobrev