Hi,

I'm installing a new SQL Server 2008 server and am having some problems finding any usable information about the different collations. I have searched SQL Server BOL and Googled for an answer, but can't seem to find anything useful.

  1. What is the difference between the Windows Collation "Finnish_Swedish_100" and "Finnish_Swedish"?

    I suppose that the "_100" version is an updated collation in SQL Server 2008, but if that is the case, what has changed from the older version?

  2. Is it usually a good thing to have "Accent-sensitive" enabled? I know that it depends on the task and all that, but are there any well-known pros and cons to consider?

  3. The "Binary" and "Binary-code point" parameters: in which cases should these be enabled?

+1  A: 

To address your question 2: accent sensitivity is a good thing to have enabled for Finnish-Swedish. Otherwise your "å"s and "ä"s will be sorted as "a"s, and your "ö"s as "o"s (assuming you will be using those kinds of international characters).

More here: http://msdn.microsoft.com/en-us/library/ms143515.aspx (discusses both binary codepoint and accent sensitivity)

Dan Sydner
Ah, ok! That was quite good to know. Thanks! :)
Octadrone
@Octadrone: Since you probably can tell: What *is* the expected sort order for accented characters in Sweden? Does "å" sort separately or does it mix up with other forms of the letter "a"?
Tomalak
The expected sort order is [...] x y z å ä ö. All different. However, 'v' and 'w' are sorted as the same letter.
Jonas Lincoln
-1: Not true, rings and umlauts are not considered accents in the Finnish/Swedish collation.
Blixt
A: 

Hi,

To address question 2:

Yes, if accents are grammatically required in the given language.

Cheers, John

John Sansom
+1  A: 

The _100 suffix indicates a collation introduced in SQL Server 2008; those with _90 are from 2005, and those with no suffix are from 2000. I don't know what the differences are, and I can't find any documentation. Unless you are running linked server queries against another SQL Server of a different version, I'd be tempted to go with the _100 one. Sorry I can't help with the differences.
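If you want to see which Finnish_Swedish variants your own server offers, the built-in `sys.fn_helpcollations()` function lists every installed collation with a short description. This is only a sketch for browsing; the descriptions summarize each variant's options (case, accent, kana, width) and are unlikely to itemize what changed between versions.

```sql
-- List every Finnish_Swedish collation installed on this server
-- (plain, _90/_100 versions, and their _BIN/_BIN2 variants).
-- [_] escapes the underscore, which is otherwise a LIKE wildcard.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Finnish[_]Swedish%'
ORDER BY name;
```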

Miles D
Ok, thanks for the info. I've decided to go with the collation "Finnish_Swedish_100_CI_AS", as the database will be used with a new application being developed.
Octadrone
+1  A: 

To address question 3 (info taken off the MSDN; wording theirs, format mine):

Binary (_BIN):

  • Sorts and compares data in SQL Server tables based on the bit patterns defined for each character.
  • Binary sort order is case-sensitive and accent-sensitive.
  • Binary is also the fastest sorting order.
  • If this option is not selected, SQL Server follows sorting and comparison rules as defined in dictionaries for the associated language or alphabet.

Binary-code point (_BIN2):

  • For Unicode data: Sorts and compares data in SQL Server tables based on Unicode code points.
  • For non-Unicode data: will use comparisons identical to binary sorts.

The advantage of using a Binary-code point sort order is that no data resorting is required in applications that compare sorted SQL Server data. As a result, a Binary-code point sort order provides simpler application development and possible performance increases.
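The difference between _BIN and _BIN2 mostly shows up with Unicode (nvarchar) data. As I understand the guidelines referenced below, _BIN compares only the first character as a whole code point and the remaining characters byte by byte, while _BIN2 compares code points throughout. The sketch below assumes the Finnish_Swedish_BIN and Finnish_Swedish_BIN2 collations are installed (the query against `sys.fn_helpcollations()` will show them); the table name #BinDemo is made up for illustration.

```sql
-- Hypothetical demo table. ÿ is U+00FF and Ā is U+0100, so by code point
-- N'aÿ' sorts before N'aĀ'.
CREATE TABLE #BinDemo (Value nvarchar(10) NOT NULL);
GO

INSERT INTO #BinDemo (Value) VALUES (N'aÿ');  -- second character U+00FF
INSERT INTO #BinDemo (Value) VALUES (N'aĀ');  -- second character U+0100
GO

-- _BIN2: ordered purely by Unicode code point.
SELECT Value FROM #BinDemo ORDER BY Value COLLATE Finnish_Swedish_BIN2;

-- _BIN: after the first character, the stored bytes are compared one at a
-- time (little-endian), so these two rows may come back in a different order.
SELECT Value FROM #BinDemo ORDER BY Value COLLATE Finnish_Swedish_BIN;
GO

DROP TABLE #BinDemo;
GO
```

For plain ASCII or non-Unicode (varchar) data, the two orderings should agree.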

For more information, see Guidelines for Using BIN and BIN2 Collations.

Tomalak
A: 

On Questions 2 and 3

Accent Sensitivity is something I would suggest turning OFF if you are accepting user data, and ON if you have clean, sanitized data. Not being Finnish myself, I don't know how many words there are that are different depending on the ó ô õ or ö that they have in them. But if there are users entering data, you can be sure that they will NOT be consistent in their usage, and you want to be able to match them. If you are gathering data from a dataset that you know the content of, and know the consistency of, then you will want to turn Accent Sensitivity ON because you know that the differences are purposeful.
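To see what turning accent sensitivity off means when matching user input, you can compare the same pair of strings under an accent-insensitive and an accent-sensitive collation. A minimal sketch; the pair N'o' / N'ó' is just an arbitrary example using one of the letters mentioned above.

```sql
-- An explicit COLLATE on one operand decides how the comparison is made.
-- Under _AI, ó is expected to compare equal to plain o;
-- under _AS, the accent makes the two strings differ.
SELECT
    CASE WHEN N'o' = N'ó' COLLATE Finnish_Swedish_CI_AI
         THEN 'match' ELSE 'no match' END AS AccentInsensitive,
    CASE WHEN N'o' = N'ó' COLLATE Finnish_Swedish_CI_AS
         THEN 'match' ELSE 'no match' END AS AccentSensitive;
```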

The same questions apply when considering Question 3. (I'm mostly getting this from the link Tomalak provided) If the data is case and accent sensitive, then you want _BIN, because it will sort faster. If the data is irregular, and not case/accent sensitive, then you will want _BIN2, because it is designed for Unicode data.

Being Swedish myself, I can tell you that the letters å, ä and ö are used very often in our language. So in most cases you probably want to be able to sort them correctly.
Octadrone
I apologize for my ignorance, however, if the 'ö' is not available, would you simply use an 'o' or does that completely change the word?
In most cases, writing o instead of ö just creates a word that doesn't really mean anything. If it is used in context, I believe Swedes would have no problem understanding the meaning. Still, users would expect to be able to use å, ä and ö in a Swedish system. :)
Octadrone
+1  A: 

Please see my blog at http://blogs.msdn.com/qingsongyao/. Please send me your comments and feedback if you have more questions related to collation.

Thanks Qingsong! I've just taken a quick look at your blog, it sure looks like interesting reading!
Octadrone
+1  A: 

The letters ÅÄÖ/åäö do not mix with A and O just because the collation is set to AI (accent-insensitive). That is, however, true for â and other "combinations" that are not individual letters of the Swedish alphabet: â will mix or not mix with a depending on the setting in question.
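You can check this on your own server by comparing å and â against a under the accent-insensitive collation. A minimal sketch; going by the explanation above, å should stay distinct while â should match, but run it to confirm on your SQL Server version.

```sql
-- å is a letter of its own in Swedish, so AI should not fold it into a.
-- â is only an accented a in Swedish, so AI should treat it as a.
SELECT
    CASE WHEN N'å' = N'a' COLLATE Finnish_Swedish_CI_AI
         THEN 'same' ELSE 'different' END AS RingA,
    CASE WHEN N'â' = N'a' COLLATE Finnish_Swedish_CI_AI
         THEN 'same' ELSE 'different' END AS CircumflexA;
```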

Since I have a lot of old databases I still need to communicate with, also over linked servers, I chose FINNISH_SWEDISH_CI_AS now that I'm installing SQL Server 2008. That was the default setting for FINNISH_SWEDISH when the Windows collations first appeared in SQL Server.
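Before settling on a collation for compatibility with older servers, it can help to check what those servers and their databases actually use. A minimal sketch using the built-in `SERVERPROPERTY` and `DATABASEPROPERTYEX` functions and the `sys.databases` catalog view; `OldDatabase` is a hypothetical database name, and the queries would be run on each server involved.

```sql
-- Server-level default collation (also used by tempdb and new databases).
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

-- Collation of one particular database; OldDatabase is a placeholder.
SELECT DATABASEPROPERTYEX('OldDatabase', 'Collation') AS DatabaseCollation;

-- Collations of all databases on this instance.
SELECT name, collation_name
FROM sys.databases
ORDER BY name;
```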

Andreas
+1  A: 

Use the query below to try it out yourself.

As you can see, å, ä, etc. do not count as accented characters, and are sorted according to the Swedish alphabet when using the Finnish/Swedish collation.

However, accents are only considered if you use the AS collation. With the AI collation, accented and unaccented variants compare as equal, as if there were no accent at all.

-- Number records the insertion order.
CREATE TABLE #Test (
    Number int identity,
    Value nvarchar(20) NOT NULL
);
GO

-- N'...' literals keep the characters as Unicode regardless of code page.
INSERT INTO #Test (Value) VALUES (N'àá');
INSERT INTO #Test (Value) VALUES (N'áa');
INSERT INTO #Test (Value) VALUES (N'aa');
INSERT INTO #Test (Value) VALUES (N'aà');

INSERT INTO #Test (Value) VALUES (N'áb');
INSERT INTO #Test (Value) VALUES (N'ab');

-- w is considered an accented version of v
INSERT INTO #Test (Value) VALUES (N'wa');
INSERT INTO #Test (Value) VALUES (N'va');
INSERT INTO #Test (Value) VALUES (N'zz');
INSERT INTO #Test (Value) VALUES (N'åä');
GO

-- Number breaks ties, so values that compare equal keep insertion order.
SELECT Number, Value FROM #Test ORDER BY Value COLLATE Finnish_Swedish_CI_AS, Number;
SELECT Number, Value FROM #Test ORDER BY Value COLLATE Finnish_Swedish_CI_AI, Number;
GO

DROP TABLE #Test;
GO
Blixt