views:

175

answers:

5

What I would like to do is to sort a list of textual identifiers (think of a file name for example). The sorting algorithm I'm looking for is a kind of alphabetic sorting, but taking groups into account.

For example, sorting "D1" to "D21" should be:

D1, D2, D3, ..., D21

And not:

D1, D10, D11, D12, ... D2, D20, D21, D3, ...

I've been a lot of time trying to accomplish such way of sorting but still can't find how to do it. The bigger the group the harder it seems.

Is there anyone that could guide me through or give me some pseudo code or code in any language?

Thanks.

+1  A: 

The Unix sort utility has a 'numeric sort' option -n which sorts the way you want, rather than a plain dictionary sort.

Maybe look for information about 'numeric sort'

The Unix source code of sort will be available, but probably difficult to follow.

pavium
Sure! Thanks! I'll take a look at it and see how it does.
Shantia
+3  A: 
  • Detect the position the strings differ first.
  • If both values at that position are numbers:
    • read forward, one character at at time, until you find a non-number in either string.
    • If only one string is non-numeric, it is "smaller"
    • If Both strings are non-numeric at the same point, compare numbers at the earlier detected position
  • Otherwise
    • compare alphabetically the letters at the position
gnarf
Thanks! That easy and I've spent a lot of time trying to make it.
Shantia
Updated it to be a little lighter on memory usage, there is no need to store or capture.
gnarf
+2  A: 

Jeff has a post on this with links to several implementations. It appears to be called "natural sorting".

See also this question.

John Fouhy
+1  A: 

PHP: the nat*sort family of functions (natsort, natcasesort, ...)

Perl:

sub natcmp {
    # from http://www.perlmonks.org/?node_id=540890
    my @a = split /(\d+)/, (shift or $a);
    my @b = split /(\d+)/, (shift or $b);

    my $last = min(scalar @a, scalar @b)-1;
    my $cmp;
    for my $i (0 .. $last) {
     unless($i & 1) {  # even
      $cmp = lc $a[$i] cmp lc $b[$i] || $a[$i] cmp $b[$i] and return $cmp;
     } else {  # odd
      $cmp = $a[$i] <=> $b[$i] and return $cmp;
     }
    }
    return scalar @a <=> scalar @b;  # shortest array comes first
}

# ...

@sorted_array = sort natcmp @array;

What you're looking for is called a natural sort.

jnylen
+1  A: 

Others have pointed out that it's called "natural sort". For code, this Python implementation is the simplest I've found. It basically uses a regex to split each string into heterogeneous lists of ints or strings, and then compares those.

This takes advantage of a couple of Python features w.r.t. object comparison. First, that you can compare ints and strings directly (ints sort as less-than any string). Second, that you can compare heterogeneous lists (it compares elements from the beginning until it finds a difference).

Ken