I added an answer to this question here: Sorting List<String>
in C# which calls for a natural sort order, one that handles embedded numbers.
My implementation, however, is naive, and in light of all the posts out there about how applications don't handle Unicode correctly because they assume things (Turkey test, anyone?), I thought I'd ask for help writing a better implementation. Or, if there is a built-in method in .NET, please tell me :)
My implementation for the answer in that question just walks the strings, comparing character by character, until it encounters a digit in both. Then it extracts the consecutive digits from both strings (the runs may differ in length), pads the shorter run with leading zeroes, and compares the two runs.
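A minimal sketch of that approach might look like the following. This is my own illustration of the steps described above, not the exact code from the linked answer; the class and method names are made up:

```csharp
using System;

static class NaiveNaturalComparer
{
    // Sketch of the naive approach: walk both strings in lockstep;
    // when both sides are at a digit, extract the full digit runs,
    // left-pad the shorter run with zeroes, and compare the runs.
    public static int Compare(string x, string y)
    {
        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (char.IsDigit(x[i]) && char.IsDigit(y[j]))
            {
                string dx = ReadDigits(x, ref i);
                string dy = ReadDigits(y, ref j);
                int len = Math.Max(dx.Length, dy.Length);
                int c = string.CompareOrdinal(dx.PadLeft(len, '0'),
                                              dy.PadLeft(len, '0'));
                if (c != 0) return c;
            }
            else
            {
                // Per-char comparison is exactly where the Unicode
                // problems below come from.
                int c = x[i].CompareTo(y[j]);
                if (c != 0) return c;
                i++; j++;
            }
        }
        // The string with characters left over sorts after the other.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    static string ReadDigits(string s, ref int pos)
    {
        int start = pos;
        while (pos < s.Length && char.IsDigit(s[pos])) pos++;
        return s.Substring(start, pos - start);
    }
}
```

With this, `Compare("a2", "a10")` correctly sorts "a2" before "a10", which is the whole point of the digit-run handling.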
However, there are problems with it.
For instance, what if string x contains two codepoints which together form the character È, while the other string contains the single precomposed codepoint for that same character?
My algorithm would fail on those, since it would treat the combining diacritic codepoint as a character in its own right and compare it against the È from the other string.
Can anyone guide me towards how to handle this properly? I want support for specifying a CultureInfo
object to handle language-specific issues, like comparing "ss" with "ß" in German, and similar things.
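For the non-digit segments at least, culture-aware comparison in .NET already handles canonical equivalence, so the precomposed-vs-decomposed È problem above goes away if the comparison goes through a CultureInfo rather than comparing chars directly. A small demonstration (my own example, not from the linked answer):

```csharp
using System;
using System.Globalization;

// "È" written two ways: one precomposed codepoint, or a base letter
// followed by a combining grave accent.
string precomposed = "\u00C8";   // È as a single codepoint
string decomposed  = "E\u0300";  // E + COMBINING GRAVE ACCENT

// Culture-sensitive comparison treats canonically equivalent
// sequences as equal...
int culturally = string.Compare(precomposed, decomposed,
                                CultureInfo.InvariantCulture,
                                CompareOptions.None);

// ...while ordinal comparison sees different codepoint sequences.
int ordinally = string.CompareOrdinal(precomposed, decomposed);

Console.WriteLine(culturally == 0); // True
Console.WriteLine(ordinally == 0);  // False
```

So a better natural comparer could delegate the non-digit chunks to `CompareInfo`/`string.Compare` with the caller's CultureInfo and only special-case the digit runs itself.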
I think I need to get my code to enumerate over "real characters" (I don't know the proper term here) instead of individual codepoints.
What's the right approach to this?
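For what it's worth, the .NET term for those "real characters" appears to be text elements (grapheme clusters), and `System.Globalization.StringInfo` can enumerate them, returning a base letter plus its combining marks as a single unit:

```csharp
using System;
using System.Globalization;

// "Ètude" with a decomposed È: six chars, but five text elements.
string s = "E\u0300tude";

var enumerator = StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
    string element = (string)enumerator.Current;
    Console.WriteLine($"'{element}' spans {element.Length} char(s)");
}
// The first element is "E\u0300": two chars, one text element.
```

That might be the right unit for the character-by-character walk, with the actual ordering of each element still delegated to a culture-aware comparison.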
Also, if "natural" means "the way humans expect it to work", I would add the following things to ponder:
- What about dates and times?
- What about floating point values?
- Are there other sequences which are considered "natural"?
- How far should this be stretched? (Eeny, meeny, miny, moe)