views:

455

answers:

5

I have a list of items that have a partial order relation, i. e, the list can be considered a partially ordered set. I want to sort this list in the same way as in this question. As correctly answered there, this is known as topological sorting.

There's a reasonably simple known algorithm to solve the problem. I want a LINQ-like implementation of it.

I already tried to use OrderBy extension method, but I'm quite sure it's not able to make topological sorting. The problem is that the IComparer<TKey> interface is not able to represent a partial order. This happens because the Compare method can return basically 3 kinds of values: zero, negative, and positive, meaning are-equal, is-less-than, and is-greater-then, respectively. A working solution would only be possible if there were a way to return are-unrelated.

From my biased point of view, the answer I'm looking for might be composed by an IPartialOrderComparer<T> interface and an extension method like this:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IPartialOrderComparer<TKey> comparer
);

How would this be implemented? How does the IPartialOrderComparer<T> interface would look like? Would you recommend a different approach? I'm eager to see it. Maybe there's a nicer way to represent the partial order, I don't know.

A: 

Well, I'm not sure that this way of handling it is the best way, but I could be wrong.

The typical way to handle topological sorting is by using a graph, and for each iteration remove all nodes which doesn't have an inbound connection, and simultaneously remove all connections outbound from those nodes. The removed nodes are the output of the iteration. Repeat until you cannot remove any more nodes.

However, in order to get those connections in the first place, with your method, you would need:

  1. A method (your comparer) that could say "this before that" but also "no information for these two"
  2. Iterate all combinations, creating connections for all combinations that the comparer returns an ordering for.

In other words, the method would probably defined like this:

public Int32? Compare(TKey a, TKey b) { ... }

and then return null when it doesn't have a conclusive answer for two keys.

The problem I'm thinking about is the "iterate all combinations" part. Perhaps there's a better way to handle this, but I'm not seeing it.

Lasse V. Karlsen
+6  A: 

I would suggest using the same IComparer interface, but writing the extension method so as to interpret 0 as not related. In a partial ordering, if elements a and b are equal their order doesn't matter, like-wise if they are unrelated - you only have to order them with respect to elements with which they have defined relationships.

Here's an example that does a partial ordering of even and odd integers:

namespace PartialOrdering
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
        {
            List<TSource> list = new List<TSource>(source);
            while (list.Count > 0)
            {
                TSource minimum = default(TSource);
                TKey minimumKey = default(TKey);
                foreach (TSource s in list)
                {
                    TKey k = keySelector(s);
                    minimum = s;
                    minimumKey = k;
                    break;
                }
                foreach (TSource s in list)
                {
                    TKey k = keySelector(s);
                    if (comparer.Compare(k, minimumKey) < 0)
                    {
                        minimum = s;
                        minimumKey = k;
                    }
                }
                yield return minimum;
                list.Remove(minimum);
            }
            yield break;
        }

    }
    public class EvenOddPartialOrdering : IComparer<int>
    {
        public int Compare(int a, int b)
        {
            if (a % 2 != b % 2)
                return 0;
            else if (a < b)
                return -1;
            else if (a > b)
                return 1;
            else return 0; //equal
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            IEnumerable<Int32> integers = new List<int> { 8, 4, 5, 7, 10, 3 };
            integers = integers.PartialOrderBy<Int32, Int32>(new Func<Int32, Int32>(delegate(int i) { return i; }), new EvenOddPartialOrdering());
        }
    }
}

Result: 4, 8, 3, 5, 7, 10

Eric Mickelsen
I would agree, this is the most reasonable way to represent the partial order. Even if we had a way so see if elements are comparable or not it wouldn't be clear where to put something in relation to something its not related to. Equality seems to be the most straightforward way to do this
luke
Thanks for the answer. I haven't got time to examine it deeply yet. At first glance, I'm afraid the use of those `default` 's there might hide some bugs. For example, `default(int)` is zero, and it`s hardly the minimum int value. Have you tried negatives values? Anyway, I'll try it tomorrow morning.
jpbochi
Ok, the code works despite of the `default` 's. Any value put initially on the 'minimum' variables are overwritten in the first `foreach`. BTW, the first `foreach` can be discarded pretty easily. I'm testing some possible optimizations on your code. Works greatly anyways. :)
jpbochi
I posted a optimized version of your answer as a answer of my own for documentation purposes. Naturally, I'll accept yours.
jpbochi
The value of the defaults should be irrelevant to correctness because of the first foreach. It's just to reassure the compiler that those variables will be set. I'm sure there are myriad ways to improve the safety, elegance and performance of my code - it was just a quick throw-together to demonstrate the concept I originally posted. Good luck!
Eric Mickelsen
A: 

I believe that Lasse V. Karlsen's answer is on the right track, but I don't like hiding of the Compare method (or a separate interface for that matter which doesn't extend from IComparable<T>).

Instead, I'd rather see something like this:

public interface IPartialOrderComparer<T> : IComparer<T>
{
    int? InstancesAreComparable(T x, T y);
}

This way, you still have an implementation of IComparer<T> which can be used in other places that require IComparer<T>.

However, it also requires you to indicate the relationship of the instances of T to each other with the return value in the following way (similar to IComparable<T>):

  • null - instances are not comparable to each other.
  • 0 - the instances are comparable to each other.
  • > 0 - x is a comparable key, but y is not.
  • < 0 - y is a comparable key, but x is not.

Of course, you won't get partial ordering when passing an implementation of this to anything that expects an IComparable<T> (and it should be noted that Lasse V. Karlsen's answer doesn't solve this either) as what you need can't be represented in a simple Compare method which takes two instances of T and returns an int.

To finish the solution, you have to provide a custom OrderBy (as well as ThenBy, OrderByDescending and ThenByDescending as well) extension method which will accept the new instance parameter (as you have already indicated). The implementation would look something like this:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(      
    this IEnumerable<TSource> source,      
    Func<TSource, TKey> keySelector,      
    IPartialOrderComparer<TKey> comparer)
{
    return Enumerable.OrderBy(source, keySelector,
        new PartialOrderComparer(comparer);
}

internal class PartialOrderComparer<T> : IComparer<T>
{
    internal PartialOrderComparer(IPartialOrderComparer<T> 
        partialOrderComparer)
    {
        this.partialOrderComparer = partialOrderComparer;
    }

    private readonly IPartialOrderComparer<T> partialOrderComparer;

    public int Compare(T x, T y)
    {
        // See if the items are comparable.
        int? comparable = partialOrderComparable.
            InstancesAreComparable(x, y);

        // If they are not comparable (null), then return
        // 0, they are equal and it doesn't matter
        // what order they are returned in.
        // Remember that this only to determine the
        // values in relation to each other, so it's
        // ok to say they are equal.
        if (comparable == null) return 0;

        // If the value is 0, they are comparable, return
        // the result of that.
        if (comparable.Value == 0) return partialOrderComparer.Compare(x, y);

        // One or the other is uncomparable.
        // Return the negative of the value.
        // If comparable is negative, then y is comparable
        // and x is not.  Therefore, x should be greater than y (think
        // of it in terms of coming later in the list after
        // the ordered elements).
        return -comparable.Value;            
    }
}
casperOne
+1  A: 

Interface to define partial order relationship:

interface IPartialComparer<T> {
    int? Compare(T x, T y);
}

Compare should return -1 if x < y, 0 if x = y, 1 if y < x and null if x and y are not comparable.

Our goal is to return an ordering of the elements in the partial order that respects the enumeration. That is, we seek a sequence e_1, e_2, e_3, ..., e_n of the elements in the partial order such that if i <= j and e_i is comparable to e_j then e_i <= e_j. I'll do this using a depth-first search.

Class that implements topological sort using depth-first search:

class TopologicalSorter {
    class DepthFirstSearch<TElement, TKey> {
        readonly IEnumerable<TElement> _elements;
        readonly Func<TElement, TKey> _selector;
        readonly IPartialComparer<TKey> _comparer;
        HashSet<TElement> _visited;
        Dictionary<TElement, TKey> _keys;
        List<TElement> _sorted;

        public DepthFirstSearch(
            IEnumerable<TElement> elements,
            Func<TElement, TKey> selector,
            IPartialComparer<TKey> comparer
        ) {
            _elements = elements;
            _selector = selector;
            _comparer = comparer;
            var referenceComparer = new ReferenceEqualityComparer<TElement>();
            _visited = new HashSet<TElement>(referenceComparer);
            _keys = elements.ToDictionary(
                e => e,
                e => _selector(e), 
                referenceComparer
            );
            _sorted = new List<TElement>();
        }

        public IEnumerable<TElement> VisitAll() {
            foreach (var element in _elements) {
                Visit(element);
            }
            return _sorted;
        }

        void Visit(TElement element) {
            if (!_visited.Contains(element)) {
                _visited.Add(element);
                var predecessors = _elements.Where(
                    e => _comparer.Compare(_keys[e], _keys[element]) < 0
                );
                foreach (var e in predecessors) {
                    Visit(e);
                }
                _sorted.Add(element);
            }
        }
    }

    public IEnumerable<TElement> ToplogicalSort<TElement, TKey>(
        IEnumerable<TElement> elements,
        Func<TElement, TKey> selector, IPartialComparer<TKey> comparer
    ) {
        var search = new DepthFirstSearch<TElement, TKey>(
            elements,
            selector,
            comparer
        );
        return search.VisitAll();
    }
}

Helper class needed for marking nodes as visited while doing depth-first search:

class ReferenceEqualityComparer<T> : IEqualityComparer<T> {
    public bool Equals(T x, T y) {
        return Object.ReferenceEquals(x, y);
    }

    public int GetHashCode(T obj) {
        return obj.GetHashCode();
    }
}

I make no claim that this is the best implementation of the algorithm but I believe that it is a correct implementation. Further, I did not return an IOrderedEnumerable as you requested but that is easy to do once we are at this point.

The algorithm works by doing a depth-first search through the elements adding an element e to the linear ordering (represented by _sorted in the algorithm) if we have already added all the predecessors of e have already been added to the ordering. Hence, for each element e, if we haven't already visited it, visit its predecessors and then add e. Thus, this is the core of the algorithm:

public void Visit(TElement element) {
    // if we haven't already visited the element
    if (!_visited.Contains(element)) {
        // mark it as visited
        _visited.Add(element);
        var predecessors = _elements.Where(
            e => _comparer.Compare(_keys[e], _keys[element]) < 0
        );
        // visit its predecessors
        foreach (var e in predecessors) {
            Visit(e);
        }
        // add it to the ordering
        // at this point we are certain that
        // its predecessors are already in the ordering
        _sorted.Add(element);
    }
}

As an example, consider the partial-ordering defined on subsets of {1, 2, 3} where X < Y if X is a subset of Y. I implement this as follows:

public class SetComparer : IPartialComparer<HashSet<int>> {
    public int? Compare(HashSet<int> x, HashSet<int> y) {
        bool xSubsety = x.All(i => y.Contains(i));
        bool ySubsetx = y.All(i => x.Contains(i));
        if (xSubsety) {
            if (ySubsetx) {
                return 0;
            }
            return -1;
        }
        if (ySubsetx) {
            return 1;
        }
        return null;
    }
}

Then with sets defined as the list of subsets of {1, 2, 3}

List<HashSet<int>> sets = new List<HashSet<int>>() {
    new HashSet<int>(new List<int>() {}),
    new HashSet<int>(new List<int>() { 1, 2, 3 }),
    new HashSet<int>(new List<int>() { 2 }),
    new HashSet<int>(new List<int>() { 2, 3}),
    new HashSet<int>(new List<int>() { 3 }),
    new HashSet<int>(new List<int>() { 1, 3 }),
    new HashSet<int>(new List<int>() { 1, 2 }),
    new HashSet<int>(new List<int>() { 1 })
};
TopologicalSorter s = new TopologicalSorter();
var sorted = s.ToplogicalSort(sets, set => set, new SetComparer());

This results in the ordering:

{}, {2}, {3}, {2, 3}, {1}, {1, 3}, {1, 2}, {1, 2, 3}

which respects the partial order.

That was a lot of fun. Thanks.

Jason
Thanks for the answer. I'm glad it was fun to you. I'll try it through tomorrow. One detail I noticed is that you say you're going yo use a breadth-first search, but your code has a `DepthFirstSearch` class. BTW, testing the solution with sets was very neat.
jpbochi
Oops. Thanks for catching that. I used a depth first search.
Jason
This is a nice approach. There are some possible easy optimizations/simpifications. For s start, I tested your solution using the regular `IComparer` instead of your `IPartialComparer` and it worked correctly. Also, the `TopologicalSorter` class could be static. Anyway, the approach that **tehMick** followed seems simpler and quickes. I guess I'll have to accept his answer.
jpbochi
+3  A: 

This is my optimized and refurbished version of tehMick's answer.

One change I made was replacing the real list of values left to yield for a logical list. For this, I have two size arrays of the same size. One has all the values, and the others contains flags telling if each value has been yielded. This way, I avoid the cost of having to resize a real List<Key>.

One other change is that I'm reading all keys only one time at the start of the iteration. For reasons I can't recall now (maybe it' was just my instinct) I don't like the idea of calling the keySelector function several times.

The last touches was parameter validation, and an extra overload that uses an implicit key comparer. I hope the code is readable enough. Check it out.

public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)
{
    return PartialOrderBy(source, keySelector, null);
}

public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer)
{
    if (source == null) throw new ArgumentNullException("source");
    if (keySelector == null) throw new ArgumentNullException("keySelector");
    if (comparer == null) comparer = (IComparer<TKey>)Comparer<TKey>.Default;

    return PartialOrderByIterator(source, keySelector, comparer);
}

private static IEnumerable<TSource> PartialOrderByIterator<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer)
{
    var values = source.ToArray();
    var keys = values.Select(keySelector).ToArray();
    int count = values.Length;
    var notYieldedIndexes = System.Linq.Enumerable.Range(0, count).ToArray();
    int valuesToGo = count;

    while (valuesToGo > 0)
    {
        //Start with first value not yielded yet
        int minIndex = notYieldedIndexes.First( i => i >= 0);

        //Find minimum value amongst the values not yielded yet
        for (int i=0; i<count; i++)
        if (notYieldedIndexes[i] >= 0)
        if (comparer.Compare(keys[i], keys[minIndex]) < 0) {
            minIndex = i;
        }

        //Yield minimum value and mark it as yielded
        yield return values[minIndex];
        notYieldedIndexes[minIndex] = -1;
        valuesToGo--;
    }
}
jpbochi