views:

153

answers:

5

Say, I employ merge sort to sort an array of Integers. Now I need to also remember the positions that elements had in the unsorted array, initially. What would be the best way to do this?

A very very naive and space consuming way to do would be to (in C), to maintain each number as a "structure" with another number storing its index:

struct integer {
int value;
int orig_pos;
};

But, obviously there are better ways. Please share your thoughts and solution if you have already tackled such problems. Let me know if you would need more context. Thank you.

+1  A: 

Is the struct such a bad idea? The alternative, to me, would be an array of pointers.

Jweede
+2  A: 

Clearly for an N-long array you do need to store SOMEwhere N integers -- the original position of each item, for example; any other way to encode "1 out of N!" possibilities (i.e., what permutation has in fact occurred) will also take at least O(N) space (since, by Stirling's approximation, log(N!) is about N log(N)...).

So, I don't see why you consider it "space consuming" to store those indices most simply and directly. Of course there are other possibilities (taking similar space): for example, you might make a separate auxiliary array of the N indices and sort THAT auxiliary array (based on the value at that index) leaving the original one alone. This means an extra level of indirectness for accessing the data in sorted order, but can save you a lot of data movement if you're sorting an array of large structures, so there's a performance tradeoff... but the space consumption is basically the same!-)

Alex Martelli
Thanks for pointing out the space consumption thing. May be, its just the "struct" keyword! How dumb of me. Thanks for the other ideas. I can get going!
Amit
To be more precise, instead of saying that it would take at least O(N) space, I'd say it takes O(N log(N)) space.
Adam Crume
@Adam, for N tending to infinity, yes, you'd need O(log(N)) space for each index into an N-long array... that's where the log(N) factor comes from;-). But in practice, given that memory is finite, each index is a fixed size (say 32 or 64 bits, but in any case finite, fixed, and smallish), so that factor is best ignored (for actual program design purposes).
Alex Martelli
A: 
  • you can do it the way you proposed

  • you can also remain a copy of the original unsorted array (means you may use a not inplace sorting algorithm)

  • you can create an additional array containing only the original indices

All three ways are equally space consuming, there is no "better" way. you may use short instead of int to safe space if you array wont get >65k elements (but be aware of structure padding with your suggestion).

codymanix
+1  A: 

It feels to me that in this question you have to consider the age old question: speed vs size. In either case, you are keeping both a new representation of your data (the sorted array) and an old representation of your data (the way the array use to look), so inherently your solution will have some data replication. If you are sorting n numbers, and you need to remember after they were sorted where those n numbers were, you will have to store n amount of information somewhere, there is no getting around that.

As long as you accept that you are doubling the amount of space you need to be able to keep this old data, then you should consider the specific application and decide what will be faster. One option is to just make a copy of the array before you sort it, however resolving which was where later might turn into a O(N) problem. From that point of view your suggestion of adding another int to your struct doesn't seem like such a bad idea, if it fits with the way you will be using the data later.

hevans66
+1  A: 

This looks like the case where I use an index sort. The following C# example shows how to do it with a lambda expression. I am new at using lambdas, but they can do some complex tasks very easily.

// first, some data to work with
List<double> anylist = new List<double>;
anylist.Add(2.18);  // add a value
...                 // add many more values
// index sort
IEnumerable<int> serial = Enumerable.Range(0, anylist.Count);
int[] index = serial.OrderBy(item => (anylist[item])).ToArray();
// how to use
double FirstValue = anylist[index[0]];
double SecondValue = anylist[index[1]];

And, of course, anylist is still in the origial order.

Mark T