tags:

views:

234

answers:

3

I have multiple ordered lists. Unfortunately, the order of the items isn't a simple alpha or numeric comparison, otherwise this is trivial. So what I have is something like:

List #1        List #2       List #3
groundhog      groundhog     easter
mothersday     mayday        mothersday
midsummer      laborday      halloween
christmas

And from this I can gather than groundhog < mothersday, but the relationship of groundhog and easter is unknown. I am guaranteed that the order of the items from list-to-list is self consistent. (i.e. that no matter which list it occurs in, easter is always before halloween)

But what I need is a new ordered list that represents each item in the other lists only once, that preserves all of the known relationships above:

groundhog
easter
mayday
mothersday
midsummer
laborday
halloween
christmas

However, the following list is also perfectly valid:

easter
groundhog
mothersday
mayday
midsummer
laborday
halloween
christmas

I'm looking for a fairly quick, general-purpose algorithm I can use to order N lists in this way. (Working C# code a plus, for sure, but not necessary.)

I have solution that works, but its O(N^2) and a dog with even modest data sets.

+5  A: 

You may want to have a look at topological sorting. I think it applies quite well to your case.

bdumitriu
A: 

I would use the Array.Sort method, with a Comparison method that takes the two strings to be compared, and then checks their presence in any list; any list which has them both, find their relative positions, and return based upon that; if no list has them both, return equality.

The MSN documentation states that their sorting algorthm uses a quicksort; averaging nlog(n) order, with worst case being on the order of n^2.

This way, you get to leverage their implementation of the sorting algorithm; all you have to do is implement the comparison code.

McWafflestix
+1  A: 

I agree with @bdumitriu, you want topological sorting.

This type of sort assumes you have a partial order among your data items, which means that for certain pairs of items, you can compare them to see which one precedes the other. In this case, like you say, there are multiple ways to create a single list of the items that preserves all of the constraints.

Topological sort usually works by first creating a directed acyclic graph of your items, where each item becomes a vertex, and a directed edge from node X to node Y means item X precedes item Y in your input lists. (So you'd walk through your set of input sorted lists, and every time you encounter a new item, you'd make a vertex for it, and for every consecutive pair of items in each sorted list, you make a directed edge from the first item to the second. Note that you don't need to create directed edges from an item to all of the previous items in each input list; for example in your input List 1, you'd create edges groundhog -> mothersday, mothersday -> midsummer, and midsummer -> christmas.)

A topological sort will take time O(V+E), where V is the total number of items you are sorting (the number of vertices), and E is the total number of predecessor relationships from your input lists (the number of edges).

--Phil

serenader