views:

54

answers:

2

Can someone please explain this:

SELECT 
  CASE WHEN CAST('iX' AS nvarchar(20)) 
      > CAST('-X' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('iX' AS varchar(20)) 
      > CAST('-X' AS varchar(20)) THEN 1 ELSE 0 END

Results: 0 1

SELECT 
  CASE WHEN CAST('i' AS nvarchar(20)) 
      > CAST('-' AS nvarchar(20)) THEN 1 ELSE 0 END,
  CASE WHEN CAST('i' AS varchar(20)) 
      > CAST('-' AS varchar(20)) THEN 1 ELSE 0 END

Results: 1 1

On the first query, the nvarchar() result is not what I'm expecting, and yet removing the X make the nvarchar() sort happen as expected.

(My original queries used the '' and N'' literal syntax to distinguish varchar() and nvarchar() rather than CAST() and got the same result.)

Collation setting for the database is SQL_Latin1_General_CP1_CI_AS.

+5  A: 

When comparing unicode text, hyphens are treated specially. Unicode comparison uses "dictionary order", which ignores hyphens. This is not the case with non-unicode text comparison.

Comparing -X and iX, is like comparing X and iX, so -X, the left side, is greater. When comparing "-" and "i", is like comparing "" and "i", so "i", the right side is greater.

From MSDN,

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

SELCT body From MSDN_Articles WHERE url IN ("http://support.microsoft.com/kb/322112")

mdma
+1  A: 

A nice question!

Digging around, I found that the issue is related to hyphens and apostrophes. Your example exhibits the same 'odd' behaviour with '''X' as with '-X'.

I can't take credit for finding the answer, because it's here: http://stackoverflow.com/questions/360950/sql-sorting-and-hyphens

Neil Moss