views:

154

answers:

5

I am using structures in my programming and I sort the structure according to a value in the structure using IComparer.

How did Microsoft implement the Array.Sort() method? Is there any documentation (references) for this? Is it the same for all types of Sort() in Visual Basic?

This is a simple example for what I want.

Dim MyArray(6) As Integer
    MyArray(0) = 1
    MyArray(1) = 45
    MyArray(2) = 45
   ' Some Code.....
    '.........
    '..........
    MyArray(3) = 1
    MyArray(4) = 10
    ' Some Code.....
    '.........
    '..........
    MyArray(5) = 1
    MyArray(6) = 57

    Array.Sort(MyArray)

Array.Sort() will sort this array as: (1 1 1 10 45 45 57)

How does number 1 get sorted? Is it bringing to the end the first one or keeps the old one in the same index?

In my original example (before sorting), MyArray(0) = 1 and after sorting MyArray(0) = 1.

Is this the same original 1 or this another 1 (the newest one added to the array) moved to that position?

In case the MyArray(0) = 1 after sorting should be MyArray(5) = 1 before sorting.

+5  A: 

Use .Net Reflector and see for yourself... From the method names it looks like they are using the QuickSort algorithm: System.Array+SorterObjectArray.QuickSort

David
+5  A: 

Array.Sort(), like most built-in sorters, uses a QuickSort implementation in a helper class behind the scenes. The sort is relatively efficient, and customizable using the IComparable and IComparer interfaces, but it's unstable; the three 1s in your example may end up in a different relative order than they were before the sort. You can see this if you use a more complex structure:

struct TestStruct
{
   int a;
   int b;
}

...

//As declared, this array is already sorted by both "a" and "b" properties
var myStructAray = new [] {new TestStruct{a=1,b=1}, new TestStruct{a=1,b=2}, new TestStruct{a=1,b=3});

//QuickSorts myStructArray based on the comparison of the lambda for each element
var newArray = Array.Sort(myStructArray, x=>x.a); 

//newArray may have a different order as myStructArray at this time
for(var i=0;i<myStructArray.Count();i++)
{
   //NUnit assertion; will almost surely fail given a sufficient array length
   Assert.AreEqual(myStructArray[i].b, newArray[i].b);
}
KeithS
+7  A: 

It uses the Quicksort algorithm, which is not stable when implemented efficiently (in place). Meaning that it doesn't guarantee that values which are equal retain their prior relative position after sorting.

For example, if you have a bunch of points:

Point[] points = new Point[]
{
   new Point(0, 1),
   new Point(0, 2),
   new Point(0, 3),
   new Point(1, 1),
   new Point(1, 2),
   new Point(1, 3)
};

And you sort these points by x-coordinate only, using this comparer:

private int CompareByX(Point a, Point b)
{
    return a.X - b.X;
}

It will only guarantee that the points are sorted by their x-coordinate, meaning you could easily end up with a mixed up order (when looking at the y-coordinate):

Point(0, 3)
Point(0, 2)
Point(0, 1)
Point(1, 3)
Point(1, 2)
Point(1, 1)

[Edit]

This doesn't mean that the sorting algorithm is non-deterministic (random). For same input data, you will get same output data on each run. You can also predict the actual way it will be reorganized if you examine the algorithm precisely, but it is unnecessary. It is sufficient just to know that this happens when using the sort routine.

Here is a working example for your problem, try changing the test data sizes (first line in Main) and watch how the array gets reorganized on each run:

class Program
{
    static void Main()
    {
        Point[] points = CreateTestData(1, 4).ToArray();
        DisplayItems("Before", points);
        Array.Sort(points, CompareByX);
        DisplayItems("After", points);
        Console.ReadLine();
    }

    private class Point
    {
        public int X { get; private set; }
        public int Y { get; private set; }
        public override string ToString()
        { return string.Format("({0},{1})", X, Y); }
        public Point(int x, int y)
        { X = x; Y = y; }
    }

    private static int CompareByX(Point a, Point b)
    { return a.X - b.X; }

    private static IEnumerable<Point> CreateTestData(int maxX, int maxY)
    {
        for (int x = 0; x <= 1; x++)
            for (int y = 0; y <= 4; y++)
                yield return new Point(x, y);
    }

    private static void DisplayItems(string msg, Point[] points)
    {
        Console.WriteLine(msg);
        foreach (Point p in points)
            Console.WriteLine(p.ToString());
        Console.WriteLine();
    }
}

Of course, if you extend the comparer delegate to include the Y coordinate, you will not have this problem:

    private static int CompareByX(Point a, Point b)
    {
         if (a.X == b.X) 
            return a.Y - b.Y;
         else
            return a.X - b.X;
    }
Groo
I understand from your answer that the nature of sorting is unknown!?
Aharoun Baalan
Is the sorting is Random?
Aharoun Baalan
No, it isn't random, because the pivot point for quicksort's partitioning is always taken from the same place (middle of the current partition). So you will get you values mixed up the same way every time.
Groo
+3  A: 

First of all, let's address several issues in your current plan with regards to best practices for .Net (VB or C#):

  1. Prefer Class over Structure unless you have a good reason to do otherwise
  2. Avoid using Arrays
  3. You can build that array as a one-liner: Dim MyArray() As Integer = {1, 45, 45, 1, 10, 1, 57}

As to your question of whether it's the "same" value 1, the answer is that it depends on how you look at it. For the general case, the answer is whether or not the sorting algorithm is considered stable. .Net's sorting algorithm in not stable.

For this specific case, you're asking the wrong question. 1 is 1 is 1. There is no distinction between them. If you feel like it matters, I challenge you to provide code to detect a difference between any two of the "1s" from that list in your original code (aside from array index).

Joel Coehoorn
+1. But the question makes sense, IMHO. Yes, there is no difference between 1 and 1, but it will make difference when the sort algorithm affects more complex objects, or objects or different derived types.
MainMa
FYI -- your VB code is invalid syntax. Array initialization is not legal when explicit bounds are specified. Take the '6' out to make it valid.
Dustin Campbell
Opps, you're right. Corrected.
Joel Coehoorn
+6  A: 

Array.Sort is an unstable sort, so the order of elements which are the same is undefined and not conserved. The article on Array.Sort in MSDN states:

This method uses the QuickSort algorithm. This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

LINQ's OrderBy methods on the other hand are stable. The article on OrderBy in the MSDN states:

This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved. In contrast, an unstable sort does not preserve the order of elements that have the same key.

CodeInChaos