I'm new to Unicode in Microsoft SQL Server 2005/2008. I converted my database to use NVarChar() instead of VarChar() and found, to my surprise, that the sort order is different from what I got with VarChar(). Another question here on Stack Overflow, about SQL sorting and hyphens, explained that Unicode sorting is done on a "word" basis. After more research, I found the Unicode Consortium site (www.unicode.org), in particular the Unicode Text Segmentation report (www.unicode.org/reports/tr29), which discusses this and mentions the hyphen as a special case. (Sorry, as a new user I can't post hyperlinks for these.)
What I'm trying to pin down is exactly what the rules are for the different collations, in particular the US English ones. What other special cases are there? For example, is the hyphen the only character that's ignored, or is other punctuation, such as the apostrophe, treated the same way?
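To make the question concrete, here is a minimal repro sketch of the kind of comparison I mean. The table variable, column names, sample values, and the choice of SQL_Latin1_General_CP1_CI_AS are only assumptions for illustration; substitute your own collation and data.

```sql
-- Illustrative only: @t, v, n and the sample strings are made up for this test.
-- Both columns use the same collation; only the data type differs.
DECLARE @t TABLE (
    v VARCHAR(20)  COLLATE SQL_Latin1_General_CP1_CI_AS,
    n NVARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS
);

-- Single-row inserts so the script also runs on SQL Server 2005.
INSERT INTO @t (v, n) VALUES ('co-op', N'co-op');
INSERT INTO @t (v, n) VALUES ('coop',  N'coop');
INSERT INTO @t (v, n) VALUES ('co op', N'co op');
INSERT INTO @t (v, n) VALUES ('cant',  N'cant');
INSERT INTO @t (v, n) VALUES ('can''t', N'can''t');

-- Ordering of the non-Unicode column.
SELECT v FROM @t ORDER BY v;

-- Ordering of the Unicode column; compare with the result above.
SELECT n FROM @t ORDER BY n;
```

Comparing the two result sets side by side shows where values containing a hyphen or an apostrophe land under each data type, which is the behavior I would like to see documented.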
Any links or pointers will be greatly appreciated.