Given an array of file names, the simplest way to sort it by file extension is like this:
Array.Sort(fileNames,
    (x, y) => Path.GetExtension(x).CompareTo(Path.GetExtension(y)));
The problem is that on a very long list (~800k entries) it takes a long time to sort, while sorting by the whole file name is a couple of seconds faster!
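For reference, the whole-file-name sort I'm comparing against is just a culture-aware sort of the raw strings, something like:

// Baseline: sort by the entire file name with the culture-aware comparer.
Array.Sort(fileNames, StringComparer.CurrentCulture);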
Theoretically, there is a way to optimize it: instead of using Path.GetExtension() and comparing the newly created extension-only strings, we can provide a Comparison<string> that compares the existing file name strings in place, starting from the LastIndexOf('.'), without creating new strings.
Now, suppose I have found the LastIndexOf('.'). I want to reuse .NET's native StringComparer and apply it only to the part of the string after the LastIndexOf('.'), to preserve all culture considerations. I didn't find a way to do that.
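To illustrate the kind of thing I'm after, here is a rough sketch built on the string.Compare overload that takes start indexes and a length; extStart1 and extStart2 are hypothetical indexes that already point past the last '.', and I'm not sure the length parameter (a maximum) really makes this equivalent to comparing the extension substrings:

// Sketch only: culture-aware comparison of the two tails, without allocating substrings.
// extStart1 and extStart2 are assumed to already point one character past the last '.'.
int length = Math.Max(filePath1.Length - extStart1, filePath2.Length - extStart2);
int result = string.Compare(filePath1, extStart1,
                            filePath2, extStart2,
                            length, StringComparison.CurrentCulture);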
Any ideas?
Edit:
With tanascius's idea to use the char.CompareTo() method, I came up with my Uber-Fast-File-Extension-Comparer, and it now sorts by extension 3x faster! It is even faster than all the methods that use Path.GetExtension() in some manner. What do you think?
Edit 2:
I found that this implementation does not consider culture, since the char.CompareTo() method does not consider culture, so this is not a perfect solution.
Any ideas?
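A quick illustration of the difference, assuming an en-US-like current culture:

// char.CompareTo is an ordinal (numeric code unit) comparison:
// 'a' (97) > 'B' (66), so this prints a positive number.
Console.WriteLine('a'.CompareTo('B'));
// A culture-aware string comparison typically puts "a" before "B":
Console.WriteLine(string.Compare("a", "B", StringComparison.CurrentCulture));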
public static int CompareExtensions(string filePath1, string filePath2)
{
    // Null handling: null sorts before any non-null path.
    if (filePath1 == null && filePath2 == null)
    {
        return 0;
    }
    else if (filePath1 == null)
    {
        return -1;
    }
    else if (filePath2 == null)
    {
        return 1;
    }

    // Find the start of each extension (the character after the last '.').
    // A path with no '.' is treated as having an empty extension.
    int i = filePath1.LastIndexOf('.');
    int j = filePath2.LastIndexOf('.');

    if (i == -1)
    {
        i = filePath1.Length;
    }
    else
    {
        i++;
    }

    if (j == -1)
    {
        j = filePath2.Length;
    }
    else
    {
        j++;
    }

    // Compare the extensions character by character, in place,
    // without allocating any substring objects.
    for (; i < filePath1.Length && j < filePath2.Length; i++, j++)
    {
        int compareResults = filePath1[i].CompareTo(filePath2[j]);
        if (compareResults != 0)
        {
            return compareResults;
        }
    }

    // One or both extensions are exhausted: the shorter one sorts first.
    if (i >= filePath1.Length && j >= filePath2.Length)
    {
        return 0;
    }
    else if (i >= filePath1.Length)
    {
        return -1;
    }
    else
    {
        return 1;
    }
}
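The method matches the Comparison<string> delegate, so it can be passed straight to Array.Sort:

// Sort the file names by extension using the comparer above.
Array.Sort(fileNames, CompareExtensions);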