I have a list of strings that can contain a letter or a string representation of an int (max 2 digits). They need to be sorted either alphabetically or (when it is actually an int) on the numerical value it represents.

Example:

IList<string> input = new List<string>()
    {"a", 1.ToString(), 2.ToString(), "b", 10.ToString()};

input.OrderBy(s=>s)
  // 1
  // 10
  // 2
  // a
  // b

What I would want is

  // 1
  // 2
  // 10
  // a
  // b

I have an idea that involves trying to parse each value and, if TryParse succeeds, formatting it with my own custom string formatter so it gets leading zeros. I'm hoping for something simpler and more performant.
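A minimal sketch of that padding idea, assuming every numeric entry fits in two digits as stated above. The hypothetical SortKey helper pads numbers with the "D2" format so that an ordinal comparison of the keys gives numeric order, and digits sort before lowercase letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pad numeric strings to two digits ("1" -> "01") so that
// an ordinal string comparison of the keys matches their numeric order.
// Assumes numbers have at most two digits, as in the question.
static string SortKey(string s) =>
    int.TryParse(s, out int n) ? n.ToString("D2") : s;

var input = new List<string> { "a", "1", "2", "b", "10" };

// Ordinal comparison puts '0'-'9' before letters, independent of culture.
List<string> sorted = input.OrderBy(SortKey, StringComparer.Ordinal).ToList();

Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 10, a, b
```

The key is computed from the string, but the original string is returned, so "1" still comes out as "1" and not "01".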

Edit
I ended up making an IComparer I dumped in my Utils library for later use.
While I was at it I threw doubles in the mix too.

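using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public class MixedNumbersAndStringsComparer : IComparer<string> {
    public int Compare(string x, string y) {
        double xVal, yVal;

        // && short-circuits, so y is only parsed when x is a number.
        // double.TryParse uses the current culture: "2,0" is read as 2 only
        // where the comma is the decimal separator (e.g. a Dutch culture).
        if(double.TryParse(x, out xVal) && double.TryParse(y, out yVal))
            return xVal.CompareTo(yVal);
        else 
            return string.Compare(x, y);
    }
}

//Tested on int vs int, double vs double, int vs double, string vs int, string vs double, string vs string.
//Not gonna put those here
[TestMethod]
public void RealWorldTest()
{
    List<string> input = new List<string>() { "a", "1", "2,0", "b", "10" };
    List<string> expected = new List<string>() { "1", "2,0", "10", "a", "b" };
    input.Sort(new MixedNumbersAndStringsComparer());
    // AreEqual checks the order; AreEquivalent would accept any permutation.
    CollectionAssert.AreEqual(expected, input);
}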
+1  A: 

I'd say you could split up the values using a regular expression (assuming every numeric value is an int) and then join them back together.

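using System.Collections.Generic;
using System.Text.RegularExpressions;

//create two lists to start
string[] data = { "a", "1", "2", "b", "10" }; //whatever...
List<int> numbers = new List<int>();
List<string> words = new List<string>();

//check each value
foreach (string item in data) {
    // Regex.IsMatch takes (input, pattern); the verbatim @"" string keeps \d intact
    if (Regex.IsMatch(item, @"^\d+$")) {
        numbers.Add(int.Parse(item));
    }
    else {
        words.Add(item);
    }
}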

Then with your two lists you can sort each of them and then merge them back together in whatever format you want.
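Sorting each list and merging them could look like the sketch below, assuming numbers should come before words as in the question's expected output. The list contents are made-up sample data standing in for what the splitting loop produces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative contents of the two lists after the split loop.
var numbers = new List<int> { 10, 1, 2 };
var words = new List<string> { "b", "a" };

// Sort each group on its own terms: integers numerically, words ordinally.
numbers.Sort();
words.Sort(StringComparer.Ordinal);

// Merge back with numbers first. Note that int.ToString() drops any leading
// zeros the original strings may have had ("01" becomes "1").
List<string> merged = numbers.Select(n => n.ToString()).Concat(words).ToList();

Console.WriteLine(string.Join(", ", merged)); // 1, 2, 10, a, b
```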

Hugoware
Yeah, this is simpler than my approach. +1
Jonathan
+3  A: 

Use the other overload of OrderBy that takes an IComparer parameter.

You can then implement your own IComparer that uses int.TryParse to tell if it's a number or not.
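A minimal sketch of that suggestion, using Comparer<T>.Create to turn a lambda into an IComparer&lt;string&gt; instead of writing a separate class. Putting numbers before words and comparing words ordinally are assumptions taken from the question's example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Comparer<T>.Create wraps a Comparison<T> delegate in an IComparer<T>.
IComparer<string> numbersFirst = Comparer<string>.Create((x, y) =>
{
    bool xIsNumber = int.TryParse(x, out int xVal);
    bool yIsNumber = int.TryParse(y, out int yVal);

    if (xIsNumber && yIsNumber) return xVal.CompareTo(yVal); // both numbers
    if (xIsNumber) return -1;                                // numbers before text
    if (yIsNumber) return 1;
    return string.CompareOrdinal(x, y);                      // both text
});

var input = new[] { "a", "1", "2", "b", "10" };
List<string> sorted = input.OrderBy(s => s, numbersFirst).ToList();

Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 10, a, b
```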

Christian Hayter
A: 
public static int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : null;
}

// in your method
IEnumerable<string> input = new string[] {"a", "1","2", "b", "10"};
var list = input.Select(s => new { IntVal = TryParse(s), String =s}).ToList();
list.Sort((s1, s2) => {
    if(s1.IntVal == null && s2.IntVal == null)
    {
        return s1.String.CompareTo(s2.String);
    }
    if(s1.IntVal == null)
    {
        return 1;
    }
    if(s2.IntVal == null)
    {
        return -1;
    }
    return s1.IntVal.Value.CompareTo(s2.IntVal.Value);
});
input = list.Select(s => s.String);

foreach(var x in input)
{
    Console.WriteLine(x);
}

It still does the conversion, but only once/item.

Jonathan
+6  A: 

Two ways come to mind, not sure which is more performant. Implement a custom IComparer:

class MyComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        int xVal, yVal;
        var xIsVal = int.TryParse( x, out xVal );
        var yIsVal = int.TryParse( y, out yVal );

        if (xIsVal && yIsVal)   // both are numbers...
            return xVal.CompareTo(yVal);
        if (!xIsVal && !yIsVal) // both are strings...
            return x.CompareTo(y);
        if (xIsVal)             // x is a number, sort first
            return -1;
        return 1;               // x is a string, sort last
    }
}

var input = new[] {"a", "1", "10", "b", "2", "c"};
var e = input.OrderBy( s => s, new MyComparer() );

Or, split the sequence into numbers and non-numbers, then sort each subgroup, finally join the sorted results; something like:

var input = new[] {"a", "1", "10", "b", "2", "c"};

var result = input.Where( s => s.All( x => char.IsDigit( x ) ) )
                  .OrderBy( r => { int z; int.TryParse( r, out z ); return z; } )
                  // Concat, not Union: Union is a set operation and would drop duplicates
                  .Concat( input.Where( m => m.Any( x => !char.IsDigit( x ) ) )
                               .OrderBy( q => q ) );
LBushkin
Your IComparer doesn't return non-numeric strings in the correct (alphabetical) order. Your LINQ query does.
LukeH
Yes, thanks, I'll fix that.
LBushkin
I added my final code to the OP. I also noticed the string thing. Furthermore, I tried short-circuiting before every parse. I don't know if it makes much sense performance-wise, but reordering them took me exactly as much effort as testing it would have ;)
borisCallens
Made the code a whole lot shorter. By applying short-circuiting (literally translated from the Dutch "Kortsluitingsprincipe"), I only do as many TryParse calls as needed.
borisCallens
A: 

You could use a custom comparer - the ordering statement would then be:

var result = input.OrderBy(s => s, new MyComparer());

where MyComparer is defined like this:

public class MyComparer : Comparer<string>
{
    public override int Compare(string x, string y)
    {

        int xNumber;
        int yNumber;
        var xIsNumber = int.TryParse(x, out xNumber);
        var yIsNumber = int.TryParse(y, out yNumber);

        if (xIsNumber && yIsNumber)
        {
            return xNumber.CompareTo(yNumber);
        }
        if (xIsNumber)
        {
            return -1;
        }
        if (yIsNumber)
        {
            return 1;
        }
        return x.CompareTo(y);
    }
}

Although this may seem a bit verbose, it encapsulates the sorting logic into a proper type. You can then, if you wish, easily subject the Comparer to automated testing (unit testing). It is also reusable.

(It may be possible to make the algorithm a bit clearer, but this was the best I could quickly throw together.)

Mark Seemann
A: 

You could also "cheat" in some sense. Based on your description of the problem, you know any String of length 2 will be a number. So just sort all the Strings of length 1, then sort all the Strings of length 2, and then do a bunch of swapping to put your Strings in the correct order. Essentially, the process works as follows (assuming your data is in an array):

Step 1: Push all Strings of length 2 to the end of the array, keeping track of how many you have.

Step 2: In place sort the Strings of length 1 and Strings of length 2.

Step 3: Binary search the sorted one-character Strings for 'a', which marks the boundary between the one-digit numbers and the letters.

Step 4: Swap your two digit Strings with the letters as necessary.
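Under the answer's assumption that every string is either a single character or a two-digit number, the four steps might look roughly like this. The hand-written binary search (for the first non-digit rather than literally 'a') and the three-reversal rotation used for step 4 are one possible choice, not the only one.

```csharp
using System;

// Sample data; assumes every entry is one character or a two-digit number.
string[] data = { "b", "10", "2", "a", "1", "12" };
int n = data.Length;

// Step 1: move the one-character strings to the front, which pushes the
// two-character strings (the two-digit numbers) to the end; m counts the former.
int m = 0;
for (int i = 0; i < n; i++)
{
    if (data[i].Length == 1)
    {
        (data[m], data[i]) = (data[i], data[m]);
        m++;
    }
}

// Step 2: sort both parts in place. Ordinal order puts digits before letters,
// and equal-length digit strings sort in numeric order.
Array.Sort(data, 0, m, StringComparer.Ordinal);
Array.Sort(data, m, n - m, StringComparer.Ordinal);

// Step 3: binary search for the first non-digit (the first letter) in the front part.
int lo = 0, hi = m;
while (lo < hi)
{
    int mid = (lo + hi) / 2;
    if (char.IsDigit(data[mid][0])) lo = mid + 1;
    else hi = mid;
}
int p = lo;

// Step 4: move the two-digit block [m, n) in front of the letters [p, m)
// by rotating [p, n) with three in-place reversals.
Array.Reverse(data, p, m - p);
Array.Reverse(data, m, n - m);
Array.Reverse(data, p, n - p);

Console.WriteLine(string.Join(", ", data)); // 1, 2, 10, 12, a, b
```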

That said, while this approach works, avoids regular expressions, and never tries to parse a non-int value as an int, I don't recommend it. You'll be writing significantly more code than with the other approaches already suggested. It obscures the point of what you are trying to do. It doesn't work if you suddenly get two-letter Strings or three-digit Strings, etc. I'm just including it to show how you can look at problems differently and come up with alternative solutions.

Rob Rolnick
+1  A: 

Perhaps you could go with a more generic approach and use a natural sorting algorithm such as the C# implementation here.
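For illustration only, here is a very small natural-order comparison; it is not the implementation linked above. Runs of ASCII digits are compared by numeric value and all other characters ordinally. The NaturalCompare and IsDigit names are hypothetical, and details are simplified (for example, "01" and "1" compare as equal).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static bool IsDigit(char c) => c >= '0' && c <= '9';

// Hypothetical minimal natural-order comparison (not the linked implementation):
// runs of ASCII digits compare by numeric value, other characters ordinally.
static int NaturalCompare(string x, string y)
{
    int i = 0, j = 0;
    while (i < x.Length && j < y.Length)
    {
        if (IsDigit(x[i]) && IsDigit(y[j]))
        {
            int startX = i, startY = j;
            while (i < x.Length && IsDigit(x[i])) i++;
            while (j < y.Length && IsDigit(y[j])) j++;

            // Compare the runs by value without parsing (no overflow): after
            // dropping leading zeros, a longer run is a bigger number.
            string a = x.Substring(startX, i - startX).TrimStart('0');
            string b = y.Substring(startY, j - startY).TrimStart('0');
            if (a.Length != b.Length) return a.Length.CompareTo(b.Length);
            int byDigits = string.CompareOrdinal(a, b);
            if (byDigits != 0) return byDigits;
        }
        else
        {
            if (x[i] != y[j]) return x[i].CompareTo(y[j]);
            i++;
            j++;
        }
    }
    // Everything matched so far: the string with characters left sorts last.
    return (x.Length - i).CompareTo(y.Length - j);
}

var input = new[] { "a", "1", "2", "b", "10" };
List<string> sorted = input.OrderBy(s => s, Comparer<string>.Create(NaturalCompare)).ToList();

Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 10, a, b
```

Unlike the int.TryParse comparers, this also orders mixed strings such as "img2" before "img12".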

Nathan Baulch
Cool. Would have used it if I had known about it before :P
borisCallens
Very cool indeed, I just found a Delphi wrapper for this too http://irsoft.de/web/strnatcmp-and-natsort-for-delphi
Peter Turner
A: 

You could just use the StrCmpLogicalW function provided by the Win32 API (shlwapi.dll):

[DllImport ("shlwapi.dll", CharSet=CharSet.Unicode, ExactSpelling=true)]
static extern int StrCmpLogicalW (String x, String y);

and call it from an IComparer as others have shown.

Skizz
A: 

Use a Schwartzian Transform to perform O(n) conversions!

private class Normalized : IComparable<Normalized> {
  private readonly string str;
  private readonly int val;

  public Normalized(string s) {
    str = s;

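    // Encode the string as an int key: digits keep their value and letters add
    // 100 + their char code, so letters sort after numbers. This is only reliable
    // for the inputs in the question (single letters or numbers of up to 2 digits).
    val = 0;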
    foreach (char c in s) {
      val *= 10;

      if (c >= '0' && c <= '9')
        val += c - '0';
      else
        val += 100 + c;
    }
  }

  public String Value { get { return str; } }

  public int CompareTo(Normalized n) { return val.CompareTo(n.val); }
};

private static Normalized In(string s) { return new Normalized(s); }
private static String Out(Normalized n) { return n.Value; }

public static IList<String> MixedSort(List<String> l) {
  var tmp = l.ConvertAll(new Converter<String,Normalized>(In));
  tmp.Sort();
  return tmp.ConvertAll(new Converter<Normalized,String>(Out));
}
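The same decorate-sort-undecorate idea can also be sketched with LINQ and value tuples: each string is parsed once in the first Select, sorted on the cached keys, and then mapped back to the original string. Putting numbers first and comparing words ordinally are assumptions matching the question's expected output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] input = { "a", "1", "10", "b", "2" };

List<string> sorted = input
    // Decorate: parse each string exactly once and keep the original with its keys.
    .Select(s =>
    {
        bool isNumber = int.TryParse(s, out int value);
        return (IsNumber: isNumber, Number: value, Text: s);
    })
    // Sort on the cached keys: numbers first (false sorts before true),
    // then by numeric value, then by the text itself.
    .OrderBy(t => !t.IsNumber)
    .ThenBy(t => t.Number)
    .ThenBy(t => t.Text, StringComparer.Ordinal)
    // Undecorate: keep only the original strings.
    .Select(t => t.Text)
    .ToList();

Console.WriteLine(string.Join(", ", sorted)); // 1, 2, 10, a, b
```

Because text entries all get Number = 0, the final ThenBy is what orders them alphabetically.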
Greg Bacon
Not really simpler than what I posted, as far as I can tell. It could be more performant, but performance isn't critical enough here to put it above simplicity.
borisCallens