Hi all,

I'm looking for an algorithm that sorts strings the way Windows Explorer sorts files and folders. Explorer appears to take the numeric values inside names into account when sorting, which results in something like

name 1, name 2, name 10

instead of

name 1, name 10, name 2

which you get with a regular string comparison.

I was about to start writing this myself, but wanted to check whether anyone has done this before and is willing to share some code or insights. My approach would be to pad the numeric values in each name with leading zeros before comparing them. This would turn the names into something like

name 00001, name 00010, name 00002

which, when sorted with a regular string sort, would give me the correct result.
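A minimal sketch of this padding idea in Python, for illustration only. The function name `padded_key` and the `width` parameter are hypothetical, and the sketch assumes no number in a name has more digits than `width`; longer numbers would sort incorrectly.

```python
import re


def padded_key(name, width=10):
    """Return a sort key in which every run of ASCII digits is zero-padded.

    Assumption: no number in `name` has more than `width` digits.
    """
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), name)


names = ["name 10", "name 2", "name 1"]

# The padded keys are only used for comparison; the original names are kept.
print(sorted(names, key=padded_key))  # ['name 1', 'name 2', 'name 10']
```

Using the padded string as a sort key rather than renaming anything means the original names never have to change.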

Any ideas?

+3  A: 

It's called "natural sort order". Jeff had a pretty extensive blog entry on it a while ago, which describes the difficulties you might overlook and has links to several implementations.

Michael Borgwardt
+1  A: 

The way I understood it, Windows Explorer sorts as per your second example. It has always irritated me hugely that the ordering comes out 1, 10, 2. That's why most apps that write lots of files (like batch apps) use fixed-length filenames with leading zeros or similar.

Your solution should work, but you'd need to be careful about where the numbers appear in the filename, and probably only use your approach if they are at the very end.

xan
+1  A: 

Have a look at

http://www.interact-sw.co.uk/iangblog/2007/12/13/natural-sorting

for some source code.

WOPR
+2  A: 

Explorer uses the StrCmpLogicalW() API for this kind of sorting (called 'natural sort order').

You don't need to write your own comparison function; just use the one that already exists.

A good explanation can be found here.
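As a rough sketch, StrCmpLogicalW() can be called from Python through `ctypes`. The wrapper name `explorer_sorted` is hypothetical, and the code assumes it runs on Windows, where the function is exported by shlwapi.dll; on other platforms it raises an error instead.

```python
import functools
import sys


def explorer_sorted(names):
    """Sort `names` using shlwapi's StrCmpLogicalW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("StrCmpLogicalW is only available on Windows")

    import ctypes

    compare = ctypes.windll.shlwapi.StrCmpLogicalW
    # int StrCmpLogicalW(PCWSTR psz1, PCWSTR psz2)
    compare.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p]
    compare.restype = ctypes.c_int

    # cmp_to_key adapts the negative/zero/positive comparator to sorted().
    return sorted(names, key=functools.cmp_to_key(compare))
```

On Windows, `explorer_sorted(["name 10", "name 2", "name 1"])` should give the same order that Explorer shows for those names.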

Stefan
+2  A: 

There is StrCmpLogicalW, but it's only available starting with Windows XP and only implemented as Unicode.

Some background information: http://blogs.msdn.com/michkap/archive/2006/10/01/778990.aspx

Otherside
+1  A: 

I also posted a related question with additional hints and pitfalls:

Sorting strings is much harder than you thought

Carl Seleborg