views:

334

answers:

2

My objective is: Given a list of entries, and a desired ordering, rearrange the list of entries according to this ordering. The list will be very large, so space efficiency is important.

Ex:

List<Entry> data = ReadDataFromSomeWhere(); // data => [a, b, c];
List<int> ordering = RandomPermutation(data.Count); // ordering => [2, 1, 3];
data.ReOrderBy(ordering);  // data => [b, a, c];

I may be mistaken, but it seems like the most straightforward and space efficient solution is to sort/orderby the data by the ordering. or more generally:

Given two lists: A,B is there a way to sort A by B? The functionality would be essentially the same as: Array.Sort<(Of <(TKey, TValue>)>)(array<TKey>[]()[], array<TValue>[]()[])

One methodology that comes to mind is to create a new datatype which is composed from A and B, ie. Pair, then sort by the B values:

List<T> A;
List<T> B;
Assert(A.Count == B.Count);
var C = A.Select( (a,idx) => new Pair<T,T>(B[idx],a)).OrderBy(c => c.First);
A = C.Select(x => x.Second).ToList();

However, I would like this to be as space efficient as possible (both select's and the tolist() call I'm guessing are expensive), so a largely in-place sort is necessary. To this end, is there a way to write a comparer for A.Sort(), which references B?

+1  A: 

The most efficient reordering algorithm would probably be some form of lazy evaluation. Rather than actually recomputing the sequence in the new order, you could create an IList implementation that uses the data list and the ordering list to dynamically return values from a "virtually reordered" sequence.

The benefit of the lazy model is that performing a double lookup is negligible in terms of performance, and does not require that the entire list be reordered in order to begin returning values. It can also potentially allow the same value list to be presented in multiple ordered without duplicating the underlying list. You could, of course,change the implementation to copy the lists if you want to ensure that changes to the original list (or ordering) don't affect the dynamically ordered list.

Here's a code example of this (not tested).

NOTE: I've changed the implementation slightly from your example to use 0-based indexes (instead of 1-based) for simplicity. But it would be easy to convert the code to use 1-based indexes if necessary.

public DynamicallyOrderedList<T> : IList<T>
{
   private readonly IList<T> m_Values;
   private readonly IList<int> m_Order;

   public DynamicallyOrderedList( IList<T> valueList, IList<int> ordering )
   {
       if( valueList == null || ordering == null ) 
           throw new ArgumentNullException();
       if( valueList.Count != ordering.Count )
           throw new InvalidArgumentException("Lists are not of same size.");
       // assumes ordering list has distinct values ranging from 0 to Count-1

       m_Values = valueList;
       m_Order  = ordering;           
   }

   // IList<T> Implementation

   // for simplicity, don't allow addition, removal or clearing of items
   // these could, however be implemented to add items to the end of the list 
   // and remove them by collapsing the ordering list. 
   // Left as an exercise for the reader :-)
   public void Add( T item ) { throw new NotSupportedException(); }

   public void Insert(int index, T item) { throw new NotSupportedException(); }

   public void Clear() { throw new NotSupportedException(); }

   public void Remove( T item ) { throw new NotSupportedException(); }

   public void RemoveAt( int index ) { throw new NotSupportedException(); }

   public T this[int index]
   {
       get 
       {
           if( index > m_Values.Count ) 
               throw new ArgumentOutOfRangeException("index");
           return m_Values[m_Order[index]];
       }
       set
       {
           if( index > m_Values.Count ) 
               throw new ArgumentOutOfRangeException("index");
           m_Values[m_Order[index]] = value;
       }
    }

   public int Count { get { return m_Values.Count; } }

   public bool Contains( T item ) { return m_Values.Contains( item ); }

   public bool IndexOf( T item ) { return m_Order[m_Values.IndexOf( item )]; }

   // Enumerator that returns items in the order defined by m_Order
   public IEnumerator<T> GetEnumerator()
   {
       // use generator syntax to simplify enumerator implementation
       foreach( var index in m_Order )
           yield return m_Values[index];
   }

   public void CopyTo( T[] array, int arrayIndex )
   {
        foreach( var item in this )
            array[arrayIndex++] = item;
   }

}

Your ReorderBy() function then becomes:

public static class ReorderByExt
{
    public static IList<T> ReorderBy<T>( this IList<T> list, IList<int> order )
    {
        return new DynamicallyOrderedList( list, order );
    }
}

If you then needed to materialize the dynamically ordered list into a regular list, you could use the following (which will run in O(n) time and will not create unnecessary copies of the data):

var staticallyOrderedList = originalList.ReorderBy( ordering ).ToList();
LBushkin
No. `OrderBy` is going to create a temporary to store the sort as soon as the `IOrderedEnumerable` is iterated over the first time.
Jason
@Jason: I had a typo, I meant to use `ReorderBy( ordering )` rather than `OrderBy( ordering )` in the last code snippet, which wouldn't even compile.
LBushkin
@Jason: BTW, I was not the one who downvoted your answer.
LBushkin
@LBushkin: As I side note. I like the two posts on your blog. I'm also very much a puzzle guy. You should seriously consider posting more. :-)
Jason
@LBushkin: Ah, I see. Very, very nice. But, it won't let me upvote your answer?
Jason
@Jason: Don't know why SO won't let you upvote. As for my blog, more is coming soon, I've been busy at work lately so something had to give.
LBushkin
@LBushkin: It just says "Vote is too old to be changed, unless this answer is edited." But I don't have a vote on your post. Re work: I know the feeling.
Jason
+1: I came up with something very similar :) Since the OP is already given an array of destination indexes, there is no need to "sort" at all, you only need a projection of the array. Neat thing about this code is how it sorts in O(1) and uses O(1) memory.
Juliet
+2  A: 

There are two ways to interpret the order array, one in which it lists the source index for each element, and one in which it lists the destination index for each element. I'm not sure which one you mean.

Reordering a list given a list of destinations is pretty easy:

// Sets list[destination[i]] = list[i] for all i.  Clobbers destination list.
void ReorderByDestination(List<T> list, List<int> destination) {
  for (int i = 0; i < list.Count; i++) {
    while (destination[i] != i) {
      int d = destination[i];
      T t = list[d];            // save element in destination slot
      int t_d = destination[d]; // and its own destination
      list[d] = list[i];        // move element to destination
      destination[d] = d;       // and mark it as moved
      list[i] = t;              // store saved element in slot i
      destination[i] = t_d;     // ... and its destination
    }
  }
}

Reordering a list given a list of sources (which is what I think you're intending) is a bit harder, you just need to invert the permutation first.

// Sets list[i] = list[source[i]] for all i.  Clobbers source list.
void ReorderBySource(List<T> list, List<int> source) {
  InvertPermutation(source);
  ReorderByDestination(list, source);
}

There are known in-place permutation inversion routines, the first one I found was perm_inv in SUBSET.

Often, you don't need to invert the permutation, instead modify whatever generated the source list to generate a destination list instead.

Keith Randall
excellent. thanks!
homie347