If you have strings like:

"file_0"
"file_1"
"file_2"
"file_3"
"file_4"
"file_5"
"file_6"
"file_11"

How can you sort them so that "file_11" comes after "file_6" (since 11 > 6) rather than immediately after "file_1"?

Do I have to parse the string and convert it into a number for this?

Windows Explorer in Windows 7 sorts files the way I want.

+8  A: 

Do I have to parse the string and convert it into a number for this?

Essentially, yes; but LINQ may help:

var sorted = arr.OrderBy(s => int.Parse(s.Substring(5)));
foreach (string s in sorted) {
    Console.WriteLine(s);
}
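
The hard-coded 5 assumes every name starts with the five-character prefix "file_". As a minimal sketch for names whose prefix length varies, the key can instead be taken from whatever follows the last underscore. The `NumericSuffix` helper and the choice of `int.MaxValue` for names without a numeric suffix are assumptions made for illustration, not part of the answer above:

```csharp
using System;
using System.Linq;

static class SuffixSort
{
    // Hypothetical helper: returns the number after the last '_',
    // or int.MaxValue when the suffix cannot be parsed as an int,
    // so that such names sort last.
    public static int NumericSuffix(string s)
    {
        int n;
        string tail = s.Substring(s.LastIndexOf('_') + 1);
        return int.TryParse(tail, out n) ? n : int.MaxValue;
    }

    static void Main()
    {
        string[] arr = { "file_11", "file_1", "file_6", "file_0" };
        // Prints file_0, file_1, file_6, file_11 (one per line).
        foreach (string s in arr.OrderBy(x => NumericSuffix(x)))
        {
            Console.WriteLine(s);
        }
    }
}
```

Using `int.TryParse` rather than `int.Parse` keeps the sort from throwing on a name that does not follow the pattern.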
Marc Gravell
Thanks Marc. It's certainly cleaner.
Joan Venge
Btw Marc, is there a "till the end of string" to pass to a substring? Otherwise I need to do some calculations which will prevent me from using the dot notation, right?
Joan Venge
The overload above *is* the "till the end of the string"... the 5 is the *start* index.
Marc Gravell
Thanks Marc, forgot that one.
Joan Venge
+1  A: 

A simple way is to pad the numeric portion like so:

file_00001
file_00002
file_00010
file_00011

etc.

But this relies on knowing the maximum value the numeric portion can take.
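
If the names themselves cannot be changed, the same idea can be applied to a temporary sort key instead. A rough sketch, assuming the number follows the last underscore and never has more digits than the chosen width (the `PaddedKey` helper and the width of 10 are illustrative choices):

```csharp
using System;
using System.Linq;

static class PaddedSort
{
    // Hypothetical helper: with width 10, "file_11" becomes "file_0000000011".
    public static string PaddedKey(string s, int width)
    {
        int split = s.LastIndexOf('_') + 1;
        return s.Substring(0, split) + s.Substring(split).PadLeft(width, '0');
    }

    static void Main()
    {
        string[] names = { "file_11", "file_2", "file_1" };
        // Ordinal comparison keeps the result independent of the current culture.
        foreach (string s in names.OrderBy(n => PaddedKey(n, 10), StringComparer.Ordinal))
        {
            Console.WriteLine(s);
        }
    }
}
```

The original strings are left untouched; only the key used for ordering is padded.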

Mitch Wheat
Thanks. How do you pad the numbers in C#? Do you mean parsing the number and inserting it back into the string?
Joan Venge
I think Mitch means: try not to start with that data in the first place... change your input to **avoid** the need to process it.
Marc Gravell
Thanks I see. Unfortunately I will not have control over the filenames (on user machines) :)
Joan Venge
+5  A: 

You could import the StrCmpLogicalW function and use that to sort the strings. This is the very same function that Explorer itself uses for file names.

It won't help if you want to avoid P/Invoke or need to stay compatible with other systems, though.
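
A sketch of what that import might look like, wrapped in an `IComparer<string>` so it can be passed to `Array.Sort` or `List<T>.Sort`. The `LogicalComparer` class name is made up for this example; the sketch runs on Windows only and does not guard against null arguments:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Windows only: shlwapi.dll does not exist on other platforms.
sealed class LogicalComparer : IComparer<string>
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int StrCmpLogicalW(string x, string y);

    public int Compare(string x, string y)
    {
        return StrCmpLogicalW(x, y);
    }
}

static class Program
{
    static void Main()
    {
        string[] files = { "file_11", "file_1", "file_6", "file_2" };
        Array.Sort(files, new LogicalComparer());
        foreach (string s in files)
        {
            Console.WriteLine(s);
        }
    }
}
```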

Joey
Thanks a lot, interesting idea.
Joan Venge
+1  A: 

I used the following approach in a project a while ago. It's not particularly efficient, but it performed well enough when the number of items to sort was not huge. It splits each string into an array on the '_' character and then compares the arrays element by element. For the last element, it tries to parse both values as ints and, if that succeeds, compares them numerically.

It also exits early if the input strings contain a different number of elements: if you compare "file_nbr_1" to "file_23", it does not compare the individual parts but falls back to a regular string comparison on the full strings:

char[] splitChars = new char[] { '_' };
string[] strings = new[] {
    "file_1",
    "file_8",
    "file_11",
    "file_2"
};

Array.Sort(strings, delegate(string x, string y)
{
    // split the strings into arrays on each '_' character
    string[] xValues = x.Split(splitChars);
    string[] yValues = y.Split(splitChars);

    // if the arrays are of different lengths, just
    // make a regular string comparison on the full values
    if (xValues.Length != yValues.Length)
    {
        return x.CompareTo(y);
    }

    // So, the arrays are of equal length, compare each element
    for (int i = 0; i < xValues.Length; i++)
    {
        if (i == xValues.Length - 1)
        {
            // we are looking at the last element of the arrays

            // first, try to parse the values as ints
            int xInt = 0;
            int yInt = 0;
            if (int.TryParse(xValues[i], out xInt) 
                && int.TryParse(yValues[i], out yInt))
            {
                // if parsing the values as ints was successful 
                // for both values, make a numeric comparison 
                // and return the result
                return xInt.CompareTo(yInt);
            }
        }

        if (string.Compare(xValues[i], yValues[i], 
            StringComparison.InvariantCultureIgnoreCase) != 0)
        {
            break;
        }
    }

    return x.CompareTo(y);

});
Fredrik Mörk
Thanks Fredrik.
Joan Venge
+2  A: 

To handle sorting of intermixed strings and numbers for any kind of format, you can use a class like this to split the strings into string and number components and compare them:

using System;
using System.Collections.Generic;

public class StringNum : IComparable<StringNum> {

   private List<string> _strings;
   private List<int> _numbers;

   public StringNum(string value) {
      _strings = new List<string>();
      _numbers = new List<int>();
      int pos = 0;
      bool number = false;
      while (pos < value.Length) {
         int len = 0;
         while (pos + len < value.Length && Char.IsDigit(value[pos+len]) == number) {
            len++;
         }
         if (number) {
            _numbers.Add(int.Parse(value.Substring(pos, len)));
         } else {
            _strings.Add(value.Substring(pos, len));
         }
         pos += len;
         number = !number;
      }
   }

   public int CompareTo(StringNum other) {
      int index = 0;
      while (index < _strings.Count && index < other._strings.Count) {
         int result = _strings[index].CompareTo(other._strings[index]);
         if (result != 0) return result;
         if (index < _numbers.Count && index < other._numbers.Count) {
            result = _numbers[index].CompareTo(other._numbers[index]);
            if (result != 0) return result;
         }
         index++;
      }
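      // Every shared component is equal, so the value with fewer components
      // sorts first (for example, "file_1" before "file_1a").
      return (_strings.Count + _numbers.Count).CompareTo(other._strings.Count + other._numbers.Count);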
   }

}

Example:

List<string> items = new List<string> {
  "item_66b",
  "999",
  "item_5",
  "14",
  "file_14",
  "26",
  "file_2",
  "item_66a",
  "9",
  "file_10",
  "item_1",
  "file_1"
};

items.Sort((a,b)=>new StringNum(a).CompareTo(new StringNum(b)));

foreach (string s in items) Console.WriteLine(s);

Output:

9
14
26
999
file_1
file_2
file_10
file_14
item_1
item_5
item_66a
item_66b
Guffa