ansaurus

Question

C#: get letters of alphabet for Scandinavian language?

Answer 1

+1 A:

I don't think it's accessible programmatically by default, but there's a good set of reference documents at the Evertype website.

Lazarus 2009-10-09 09:39:53

+1. While I'd recommend the CLDR where applicable, and the CLDR covers a larger number of languages, Michael's set is good in having rules about semi-included letters in alphabets (e.g. in English we don't normally include æ in the alphabet, at least as far as Modern English goes, but it is properly sorted as if it were "ae", while ȝ is also no longer included in the alphabet but sorts between y and z). Likewise for letters that are commonly used in loan words in some languages, and for those that have changed alphabets. Not often necessary to know, but a great source when it is.

Jon Hanna 2010-10-27 22:58:22

Answer 2

+3 A:

To the best of my knowledge, neither .NET nor Windows provides this information. However, you can find it in the Unicode Consortium's CLDR database. This DB is actually a set of XML files (one for each language, named after the language abbreviation) containing all sorts of localisation info. A gold mine!

The element /ldml/characters/exemplarCharacters contains a list of characters used in the language. E.g., for Swedish (sv.xml):

[a-v x-z å ä ö]

Note that when you say 'ASCII letters', you do realize you're limiting yourself to the Latin script, don't you? As far as the CLDR is concerned, lists such as a-z are Unicode character sequences, not just ASCII letters. E.g., in Russian (from ru.xml):

[а-е ё ж-я]
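
As a rough illustration of how such a pattern could be turned into a list of letters, here is a minimal sketch. It assumes a simplified form of the syntax: space-separated tokens that are either a single character, a range like a-v, or a braced multi-character entry like {ij}. The real CLDR set syntax is richer (escapes, nested sets), and the name ExpandExemplars is made up for this example.

```csharp
using System;
using System.Collections.Generic;

// Expands a simplified exemplar pattern such as "[a-v x-z å ä ö]".
// Assumption: tokens are separated by spaces and are a single character,
// a range "x-y", or a braced sequence "{ij}". Not a full CLDR set parser.
List<string> ExpandExemplars(string pattern)
{
    var letters = new List<string>();
    string body = pattern.Trim().TrimStart('[').TrimEnd(']');

    foreach (string token in body.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        if (token.Length == 3 && token[1] == '-')
        {
            // Range: add every code unit from the first to the last character.
            for (char c = token[0]; c <= token[2]; c++)
                letters.Add(c.ToString());
        }
        else if (token.Length > 2 && token[0] == '{' && token[token.Length - 1] == '}')
        {
            // Braced entry: a multi-character sequence treated as one item.
            letters.Add(token.Substring(1, token.Length - 2));
        }
        else
        {
            letters.Add(token);
        }
    }
    return letters;
}

Console.WriteLine(string.Join(" ", ExpandExemplars("[a-v x-z å ä ö]")));
Console.WriteLine(string.Join(" ", ExpandExemplars("[а-е ё ж-я]")));
```

The ranges are expanded by code unit, which works for the two patterns above but would need more care for characters outside the Basic Multilingual Plane.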

Serge - appTranslator 2009-10-09 11:29:24

Careful! The CLDR does not really contain the required information. For example, for Dutch the element /ldml/characters/exemplarCharacters is [a á ä b-e é ë f-i í ï {ij} j-o ó ö p-u ú ü v-z] (yikes!) But trust me, [a-z] is what you need for this question.

Ruben 2009-10-09 12:01:41

No, I don't trust you ;-) Even though accented letters are not in common use in the language itself, letters such as é are not rare in Dutch names (not sure about the Netherlands, but certainly for Flemish people, i.e. Dutch-speaking Belgians, nl-BE). And according to www.voornamen.com, Björn is currently the most popular Dutch first name for babies.

Serge - appTranslator 2009-10-09 12:35:42

True, but that's irrelevant to the question. Accented letters are used, but they are never regarded as separate letters for alphabetizing, unlike å, ä and ö in Swedish. Filtering on [a] in Dutch *must* include á, à, ä, etc. (Just check a dictionary.)
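
One way this kind of filtering could be sketched in .NET is with a culture-aware prefix test that ignores diacritics. The helper name FilterByLetter and the word list are made up for illustration, and the outcome depends on the collation data of the platform (NLS or ICU), so treat this as an assumption to verify rather than a guarantee.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Returns the words that the given culture's collation rules file under `letter`.
string[] FilterByLetter(string[] words, string letter, string cultureName)
{
    CompareInfo compare = CultureInfo.GetCultureInfo(cultureName).CompareInfo;

    // IgnoreNonSpace ignores nonspacing combining characters such as diacritics;
    // whether an accented letter counts as a variant or as a letter of its own
    // is decided by the culture's collation data.
    const CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    return words.Where(w => compare.IsPrefix(w, letter, options)).ToArray();
}

// Sample data, chosen only for illustration.
string[] sampleWords = { "appel", "ápart", "äpple", "boek" };

Console.WriteLine(string.Join(", ", FilterByLetter(sampleWords, "a", "nl-NL")));
Console.WriteLine(string.Join(", ", FilterByLetter(sampleWords, "a", "sv-SE")));
```

With typical collation data the Dutch call would be expected to include the accented entries, while the Swedish call should keep ä apart as its own letter. It is worth checking this on the target system before relying on it.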

Ruben 2009-10-14 08:54:42

The exemplar characters in CLDR with type="index" are what you are looking for. http://unicode.org/reports/tr35/#Character_Elements
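
A small sketch of picking out that element with LINQ to XML follows. The XML is a hand-written stand-in for a locale file, not copied from CLDR, and GetExemplarSet is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hand-written stand-in for a CLDR locale file; the index value is
// illustrative only and not copied from the real sv.xml.
string sampleLdml = @"<ldml>
  <characters>
    <exemplarCharacters>[a-v x-z å ä ö]</exemplarCharacters>
    <exemplarCharacters type=""index"">[A-V X-Z Å Ä Ö]</exemplarCharacters>
  </characters>
</ldml>";

// Returns the text of the exemplarCharacters element with the given type
// attribute, or null when there is none. Pass null for the untyped element.
string GetExemplarSet(XDocument ldml, string type)
{
    return ldml.Root
        .Elements("characters")
        .Elements("exemplarCharacters")
        .Where(e => (string)e.Attribute("type") == type)
        .Select(e => e.Value)
        .FirstOrDefault();
}

XDocument sampleDoc = XDocument.Parse(sampleLdml);
Console.WriteLine(GetExemplarSet(sampleDoc, "index"));
Console.WriteLine(GetExemplarSet(sampleDoc, null));
```

For a real file, XDocument.Load with the path to the downloaded locale XML would replace XDocument.Parse.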

Steven R. Loomis 2010-05-14 18:15:10

Answer 3

A:

So to clean this one up, I think the answer is that even if I limit myself to Western languages, I can't ask the .NET Framework for the alphabet letters. So I've made a list of the letters myself; luckily there were only four languages to do.

Michel 2009-11-12 14:26:46

Answer 4

+1 A:

You're going to send a list of strings to your translators anyway. For each language of your site, you'll have one, and they'll each know the answer for their languages. So just submit the string "a b c d e f g h i j k l m n o p q r s t u v w x y z" to them, and document this as the alphabet used for paging. They should be able to translate it for you. Mind you, you could get back entries like "... x ij z" for Dutch, "ij" being a common spelling of the single letter Ĳ.
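
A minimal sketch of consuming such a translated string could look like this. The Dutch string below is a hypothetical translator response built from the "x ij z" example, and SplitAlphabet and FindLetter are made-up names. Picking the longest matching entry is one possible way to cope with multi-character letters such as "ij".

```csharp
using System;
using System.Linq;

// Splits the space-separated alphabet string returned by a translator.
string[] SplitAlphabet(string translated) =>
    translated.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Finds the paging letter for a word: the longest alphabet entry the word
// starts with (so "ij" wins over "i"), or null when nothing matches.
string FindLetter(string word, string[] letters) =>
    letters
        .Where(l => word.StartsWith(l, StringComparison.OrdinalIgnoreCase))
        .OrderByDescending(l => l.Length)
        .FirstOrDefault();

// Hypothetical translator response for Dutch.
string[] dutchLetters = SplitAlphabet("a b c d e f g h i j k l m n o p q r s t u v w x ij z");

Console.WriteLine(FindLetter("IJsland", dutchLetters));
Console.WriteLine(FindLetter("Ierland", dutchLetters));
```

The ordinal comparison keeps the sketch simple; it does not fold accented letters into their base letter, so that would need a culture-aware comparison on top.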

MSalters 2009-11-12 14:46:32

Answer 5

A:

Exactly which package and folder in it (http://unicode.org/Public/cldr/1.8.1/) contain the files with language information, including the alphabet? I tried quite a few, including core/common/main, but there are no alphabets in those files.

Dimitar Dobrev 2010-07-29 08:51:43
