The Default Unicode Collation Element Table (DUCET) assigns Unicode characters collation elements with four levels of weights. The first three levels define the essential part of the sort order, and the fourth level is essentially the character code, which is used for tie-breaking.

The section on variable weighting defines the "shifted" option (the default), which derives a different fourth level from the first three. This option controls how punctuation and other variable characters are sorted.
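For reference, here is a minimal Python sketch of how I read the shifted transform and the resulting flat sort key. The `CE` tuple layout, the functions `shift` and `sort_key`, and the sample weights `A`, `B` and `HYPHEN` are hypothetical; real weights would come from the DUCET.

```python
from typing import List, Tuple

# Hypothetical layout: (primary, secondary, tertiary, is_variable).
CE = Tuple[int, int, int, bool]

# Hypothetical sample weights, not real DUCET entries.
A: CE = (0x1C47, 0x20, 0x2, False)
B: CE = (0x1C60, 0x20, 0x2, False)
HYPHEN: CE = (0x020D, 0x20, 0x2, True)  # punctuation, marked variable


def shift(ces: List[CE]) -> List[Tuple[int, int, int, int]]:
    """Apply the 'shifted' variable-weighting option, yielding 4-level CEs."""
    out = []
    after_variable = False
    for p, s, t, variable in ces:
        if p == 0 and s == 0 and t == 0:
            # Completely ignorable: ignorable at all four levels.
            out.append((0, 0, 0, 0))
        elif variable:
            # Variable: levels 1-3 become 0, the old primary moves to level 4.
            out.append((0, 0, 0, p))
            after_variable = True
        elif p == 0 and after_variable:
            # Primary ignorable following a variable: ignorable everywhere.
            out.append((0, 0, 0, 0))
        else:
            # Everything else gets the maximal fourth-level weight.
            out.append((p, s, t, 0xFFFF))
            after_variable = False
    return out


def sort_key(ces4: List[Tuple[int, int, int, int]]) -> Tuple[int, ...]:
    """Concatenate non-zero weights level by level, with 0 as level separator."""
    key: List[int] = []
    for level in range(4):
        if level > 0:
            key.append(0)  # separator sorts below any real weight
        key.extend(ce[level] for ce in ces4 if ce[level] != 0)
    return tuple(key)
```

With these sample weights, `sort_key(shift([A, HYPHEN, B]))` and `sort_key(shift([A, B]))` agree on the first three levels and differ only at level 4, where the hyphen's shifted primary (0x020D) sorts below 0xFFFF.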

How should these be combined? Should the comparison use five levels, where the fourth level is the one generated by the shifted option and the fifth is the character code, used as the tie-breaker?
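To make the proposal concrete, here is a minimal sketch of the five-level scheme described above. It assumes a four-level sort key (a flat tuple of weights using 0 as the level separator) has already been built. The function name `five_level_key` and the key layout are hypothetical; this only illustrates the question, not what UTS #10 prescribes.

```python
from typing import Sequence, Tuple


def five_level_key(four_level_key: Sequence[int], text: str) -> Tuple[int, ...]:
    """Hypothetical: append the raw code points as a fifth, tie-breaking level."""
    # A 0 separator keeps the new level from mixing with level-4 weights.
    return tuple(four_level_key) + (0,) + tuple(ord(ch) for ch in text)
```

Two strings whose four-level keys are identical would then be ordered by their code points, while strings that already differ in the first four levels keep their original relative order.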

Unicode Technical Standard #10 (Unicode Collation Algorithm) doesn't specify this explicitly, and a web search turned up nothing.