Hi there,

I am developing for European languages as well as English, and the strings are stored as NVARCHAR in SQL Server 2005.

Which collation is the best one to use? Does "Latin1_General_CI_AS" cover all of them? There are also variations such as Latin1_General_CP1_CI_AS, Latin1_General_BIN, Latin1_General_BIN2, etc.

Comments and suggestions are appreciated.

Regards DEE

+1  A: 

For general-purpose sorting, the Latin1_General family is probably the best choice for Western European languages and English.

I believe that if the code page (e.g., CP1) is not specified, it defaults to code page 1252, which is also what CP1 signifies. So my understanding is that Latin1_General_CI_AS and Latin1_General_CP1_CI_AS are equivalent. Given that, my opinion is that Latin1_General_CP1_CI_AS would be the better choice for clarity. Whether you use CI_AS, CS_AS, or CI_AI is purely a usability decision, based on whether you want case sensitivity (CS vs. CI) and/or accent sensitivity (AS vs. AI). With CI, "a" == "A", and with AI, "á" == "â".
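To make the CI/AI distinction concrete, here is a rough Python sketch. It only imitates the idea by normalizing strings before comparing them; it is not how SQL Server implements collations, and the helper names `strip_accents` and `equal_under` are made up for this illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks so that 'á' and 'â' both become 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_under(a: str, b: str, case_sensitive: bool, accent_sensitive: bool) -> bool:
    """Compare two strings roughly the way a CS/CI + AS/AI collation would.

    Hypothetical helper: a simplification for illustration only.
    """
    # Normalize first so precomposed and decomposed forms compare equal.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if not accent_sensitive:
        a, b = strip_accents(a), strip_accents(b)
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    return a == b


ci_as = equal_under("a", "A", case_sensitive=False, accent_sensitive=True)
cs_as = equal_under("a", "A", case_sensitive=True, accent_sensitive=True)
ci_ai = equal_under("á", "â", case_sensitive=False, accent_sensitive=False)
ci_as_accents = equal_under("á", "â", case_sensitive=False, accent_sensitive=True)
```

Under these assumptions, `ci_as` and `ci_ai` come out true while `cs_as` and `ci_as_accents` come out false, which matches the "a" == "A" and "á" == "â" cases described above.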

The _BIN and _BIN2 options signify that the collation is binary, i.e., based on the code point values. For sorting purposes you probably do not want that, because the order will not necessarily match any kind of dictionary order. However, if you are only using the index to search for data, one of those might be appropriate because it could be faster: with a binary collation, relatively little computation is needed to convert a character value to its associated key value.
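The difference between code point order and dictionary order is easy to see with a small Python sketch. The word list and the `dictionary_key` helper are made up for illustration, and the "dictionary" key here is a crude stand-in for a linguistic collation, not SQL Server's actual algorithm.

```python
import unicodedata

words = ["zebra", "Apple", "éclair", "banana"]

# Python compares str values by code point, much like a binary collation:
# 'A' (65) < 'b' (98) < 'z' (122) < 'é' (233).
binary_order = sorted(words)


def dictionary_key(word: str):
    """Crude dictionary-style key: ignore accents and case, then tie-break."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


dictionary_order = sorted(words, key=dictionary_key)

print(binary_order)      # ['Apple', 'banana', 'zebra', 'éclair']
print(dictionary_order)  # ['Apple', 'banana', 'éclair', 'zebra']
```

Note that the binary sort needs no key computation at all, which is the reason it can be cheaper, while the dictionary-style sort has to transform every value first.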

Edit: As Martin points out in the comments, the code page does not matter unless you are using char, varchar, or text. If you stick completely with Unicode (nchar, nvarchar, ntext), the code page will not come into play. If you convert a Unicode character to a single-byte character, though, the code page will be used.
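As a rough illustration of where the code page comes in, the Python sketch below encodes the same text as two-byte Unicode and as code page 1252. The sample string is arbitrary, and the `"replace"` error handling is my assumption for the demo; it is not a statement about what SQL Server does internally on conversion.

```python
text = "café Ω"

# Two bytes per character here, loosely comparable to nvarchar storage.
as_unicode = text.encode("utf-16-le")

# One byte per character, limited to what code page 1252 can represent.
# 'é' exists in cp1252 (0xE9); 'Ω' does not, so it is replaced with '?'.
as_cp1252 = text.encode("cp1252", errors="replace")

round_trip_unicode = as_unicode.decode("utf-16-le")
round_trip_cp1252 = as_cp1252.decode("cp1252")

print(len(as_unicode), len(as_cp1252))  # 12 6
print(round_trip_unicode)               # café Ω
print(round_trip_cp1252)                # café ?
```

The Unicode round trip is lossless, while the single-byte round trip loses any character the code page cannot represent.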

Mark Wilkins
+1. Just to avoid any confusion: the code page applies to `CHAR` representations rather than `NVARCHAR`.
Martin Smith
@Martin, That's a good point. I'll add that.
Mark Wilkins
Thanks Mark, this was really helpful.
DEE