I have a list of UTF-8 strings that I want to sort using Enumerable.OrderBy. The strings may contain text in any number of languages and scripts, e.g., English, German, and Japanese, or even a mix of them within a single string.
For example, here is a sample input list:
["東京","North 東京", "München", "New York", "Chicago", "大阪市"]
I am confused as to whether StringComparer.CurrentCulture is the right string comparer to pass to OrderBy(). What if the current culture of the application is en-US, but I still want to sort UTF-8 data "correctly" beyond just the en-US sorting rules?
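To make the question concrete, here is a minimal sketch of the calls I am comparing, written as a console program with top-level statements. The variable names are mine, and the "ja-JP" comparer is only an illustrative alternative, not something I know to be the right choice; the culture-sensitive results may also differ between platforms and .NET versions, so I have not listed them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

var cities = new List<string> { "東京", "North 東京", "München", "New York", "Chicago", "大阪市" };

// Uses the collation rules of whatever culture the current thread runs under (e.g. en-US).
var byCurrent = cities.OrderBy(s => s, StringComparer.CurrentCulture).ToList();

// Uses an explicitly chosen culture; "ja-JP" is an assumption for illustration only.
var japaneseComparer = StringComparer.Create(CultureInfo.GetCultureInfo("ja-JP"), ignoreCase: false);
var byJapanese = cities.OrderBy(s => s, japaneseComparer).ToList();

// Culture-independent baseline: compares the strings by their UTF-16 code units.
var byOrdinal = cities.OrderBy(s => s, StringComparer.Ordinal).ToList();

Console.WriteLine("CurrentCulture: " + string.Join(", ", byCurrent));
Console.WriteLine("ja-JP:          " + string.Join(", ", byJapanese));
Console.WriteLine("Ordinal:        " + string.Join(", ", byOrdinal));
```

The ordinal sort is included only as a reference point, since it ignores linguistic rules entirely; what I am unsure about is which of the culture-sensitive comparers, if any, is appropriate for mixed-script data.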
My confusion probably stems from my understanding of the NLSSORT function in Oracle, which doesn't quite match up with .NET string comparison and sorting semantics. For example, setting NLS_SORT=Japanese_M means it would sort Latin, Western European, and Japanese text correctly, regardless of whether any or all of those characters occur in a given string in the sortable column.