T-SQL Unicode "word" definition

+2 Q:

I'm new to Unicode in Microsoft SQL Server 2005/2008. I converted my DB to use NVarChar() instead of VarChar() and found, to my surprise, that the sorting is different from what I got with VarChar(). I found another question here on StackOverflow, "SQL Sorting and hyphens", which explained that Unicode sorting is done on a "word" basis. After more research, I found the Unicode Consortium site (www.unicode.org), in particular the Unicode Text Segmentation report (www.unicode.org/reports/tr29), which discusses this and mentions the hyphen as a special case. (Sorry, as a new user, I couldn't post hyperlinks for these.)

What I'm trying to pin down is exactly what the rules are for the different collations, in particular the US English collations. What other special cases are there? For example, is the hyphen the only character that's ignored? What about other punctuation, such as apostrophes?

Any links or pointers will be greatly appreciated.

+1 A:

Don't use a SQL collation; use a Windows one. This is mentioned in the KB article.

From "Windows Collation Sorting Styles":

For Windows collations, the nchar, nvarchar, and ntext Unicode data types have the same sorting behavior as char, varchar, and text non-Unicode data types.
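A minimal sketch for checking this empirically, assuming SQL Server 2005 or later. The table variable, its column names, and the sample strings are made up for illustration; SQL_Latin1_General_CP1_CI_AS and Latin1_General_CI_AS stand in for "a SQL collation" and "a Windows collation". It sorts the same strings four ways so you can see which punctuation each combination ignores, rather than asserting a particular result.

```sql
-- Hypothetical test data: the same strings stored as varchar and nvarchar,
-- both declared with a SQL collation.
DECLARE @t TABLE (
    v  varchar(20)  COLLATE SQL_Latin1_General_CP1_CI_AS,
    nv nvarchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS
);

INSERT INTO @t (v, nv)
SELECT s, s
FROM (
    SELECT 'coop' AS s UNION ALL
    SELECT 'co-op'     UNION ALL   -- hyphen
    SELECT 'co''op'    UNION ALL   -- apostrophe
    SELECT 'co op'     UNION ALL   -- space
    SELECT 'co.op'     UNION ALL   -- other punctuation
    SELECT 'cop'
) AS samples;

-- 1. SQL collation, non-Unicode data
SELECT v FROM @t ORDER BY v;

-- 2. SQL collation, Unicode data (this is where the order changed for the asker)
SELECT nv FROM @t ORDER BY nv;

-- 3. Windows collation, non-Unicode data
SELECT v FROM @t ORDER BY v COLLATE Latin1_General_CI_AS;

-- 4. Windows collation, Unicode data
SELECT nv FROM @t ORDER BY nv COLLATE Latin1_General_CI_AS;
```

If the quoted statement holds, queries 3 and 4 should return the rows in the same order, while 1 and 2 may differ from each other. Adding more punctuation characters to the sample list is a quick way to probe which ones a given collation treats specially.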

However, you should also consider why you need Unicode. In addition to your sorting issues, it's slower: see "varchar vs nvarchar performance", and even MS agrees.

gbn 2009-06-25 04:33:37

I understand the difference between Windows collations and SQL collations. What I'm trying to find out is exactly what the rules are for the Windows collations (the non-BIN flavors). The hyphen gets ignored, and I'm trying to find a definition of other rules like that. You do have a point about speed, but that's a story for another day. In practice, we haven't seen a performance hit for our configuration.

PaulR 2009-06-25 20:09:09
