I have a list of UTF-8 strings that I want to sort using Enumerable.OrderBy. The strings may contain text in any number of scripts and languages, e.g., English, German, and Japanese, or even a mix of them within a single string.

For example, here is a sample input list:

["東京","North 東京", "München", "New York", "Chicago", "大阪市"]

I am confused as to whether using StringComparer.CurrentCulture is the right string comparison parameter to pass to OrderBy(). What if the current culture of the application is en-US but I still want to sort UTF-8 data "correctly" beyond just en-US sorting rules?

My confusion probably stems from my understanding of the NLSSORT function in Oracle that doesn't quite match up with .NET string comparison and sorting semantics. For example, setting NLS_SORT=Japanese_M means it would sort Latin, Western European, and Japanese correctly, regardless of whether any or all of the characters occur in a given string in the sortable column.

+2  A: 

There is no single comparison that works for all cultures.

Short of detecting the language and choosing accordingly, InvariantCulture is your best bet. As the document you link notes:

DON'T: Use StringComparison.InvariantCulture-based string operations in most cases; one of the few exceptions would be persisting linguistically meaningful but culturally-agnostic data.

I added the emphasis. That exception is more or less what you're doing.
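
A minimal sketch of what that might look like with the sample list from the question. The class and method names (InvariantSortDemo, SortInvariant) are made up for illustration; the only real API calls are Enumerable.OrderBy and StringComparer.InvariantCulture. The exact order of the non-Latin entries can depend on the runtime's globalization data, so no output is shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvariantSortDemo
{
    // Hypothetical helper: sorts with linguistic, culture-agnostic rules,
    // so the result does not depend on the thread's current culture.
    public static List<string> SortInvariant(IEnumerable<string> items)
    {
        return items.OrderBy(s => s, StringComparer.InvariantCulture).ToList();
    }

    public static void Main()
    {
        var cities = new[] { "東京", "North 東京", "München", "New York", "Chicago", "大阪市" };

        foreach (var city in SortInvariant(cities))
        {
            Console.WriteLine(city);
        }
    }
}
```

Passing the comparer explicitly also documents the intent: anyone reading the call can see that the order is meant to be culture-agnostic rather than an accident of the machine's settings.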

Craig Stuntz
You're saying InvariantCulture is my best bet, but then you negate that with the quote from MSDN, in which case you agree with nobugz and suggest that I use CurrentCulture. Is that correct?
Mike Atlas
No. As I said, what you are doing seems to fall into the exception, so I think the MSDN quote supports using InvariantCulture in this specific case. nobugz would be correct if the local culture predominates, but if the list is truly mixed as you say *and* is meaningful to the (presumably, polyglot) user, then InvariantCulture is the best choice.
Craig Stuntz
+1  A: 

Keep your eyes on the ball: you sort to help humans find a string in a list. You would need a skilled linguist to know the sorting rules for English, German, and Japanese all at once. What are the odds of one laying eyes on your list? Always sort the list according to the local culture's rules, and make sure the sorting is localized.
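
As a rough sketch of that advice, assuming the user's culture is either the thread's current culture or a culture name you already know: the names LocalSortDemo, SortForCurrentUser, and SortFor are hypothetical, and "de-DE" is just an example culture. StringComparer.CurrentCulture and StringComparer.Create(CultureInfo, bool) are the real APIs being shown.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LocalSortDemo
{
    // Hypothetical helper: sorts by the rules of the thread's current culture.
    public static List<string> SortForCurrentUser(IEnumerable<string> items)
    {
        return items.OrderBy(s => s, StringComparer.CurrentCulture).ToList();
    }

    // Hypothetical helper: sorts by the rules of a named culture (for example
    // "de-DE" or "ja-JP"), independent of the thread's current culture.
    public static List<string> SortFor(string cultureName, IEnumerable<string> items)
    {
        var comparer = StringComparer.Create(new CultureInfo(cultureName), ignoreCase: false);
        return items.OrderBy(s => s, comparer).ToList();
    }

    public static void Main()
    {
        var cities = new[] { "東京", "North 東京", "München", "New York", "Chicago", "大阪市" };

        // "de-DE" is an assumed example; substitute the culture of your users.
        foreach (var city in SortFor("de-DE", cities))
        {
            Console.WriteLine(city);
        }
    }
}
```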

Hans Passant