views:

167

answers:

5

How does one find the rank of each element of an array, averaging in the case of ties, efficiently? For example:

float[] rank(T)(T[] input) {
    // Implementation
}

auto foo = rank([3,6,4,2,2]);  // foo == [3, 5, 4, 1.5, 1.5]

The only way I can think of doing it requires allocating 3 arrays:

  1. A duplicate of the input array because it has to be sorted and we don't own it.
  2. An array to keep track of the order in which the input array was sorted.
  3. An array of ranks to return.

Does anyone know how to do this in O(N log N) time and O(1) auxiliary space (meaning the only array we have to allocate is the one we're going to return), or at least get rid of one of the three arrays above?

A: 

If you don't own the array I don't think it's possible to do it in O(N log N) and in space O(1).

If the range of elements (how large an element can be) is small, use counting. Count how many are there of each element and then calculate the result array based on the input array using the counting array.

c - is counting result,
C - is cumulative counting
C[i] = c[i] + c[i-1] + c[i-2] + ... + c[0]
result[i] = 1 / c[in[i]] + C[in[i]-1]
egon
A: 

Why not just copy and sort the array and go from there? There are plenty of in-place sort algorithms available such as heapsort.

rlbond
A: 

Ok, so you duplicate your input array into foo. Sort foo in-place in O(n log n) time with heapsort. Now, take the first element of your input array and find its rank in foo in O(log n) time using binary search and insert the rank into the ranks array and return it.

Now, you use 2 arrays instead of 3.

Jacob
+2  A: 

You can allocate the array you are going to return (let's call it R), initialize it to 0..n-1 and then "sort" the incoming array (called I) but using the comparison I[R[k]] vs. I[R[j]] instead of the normal R[k] vs. R[j] and then swapping the values in the R array as needed (instead of the values in the I array as usual).

You can implement this indirect sorting using either quicksort or heapsort (or bubblesort but that will mess up your complexity).

You only need to allocate one array - and some stack space for the indices.

florin
So obvious in hindsight. Why didn't I think of this?
dsimcha
Actually, on second thought, this doesn't work. I initially misunderstood it.
dsimcha
Yes it does - you need to compare indirectly and then update the indirecting array as well. I have added a clarification to my post.
florin
It works pretty well, apart for the 'average' requirement. You may need a second pass for it at the end.
Matthieu M.
Matthieu: can you explain <<the 'average' requirement>>?
florin
Exactly the answer I was going to give. Here's an example implementation in Python: ranks = sort(range(len(l)), key=lambda x:l[x])
Nick Johnson
A: 

Perhaps it would be useful to summarize florin's answer (and the associated comments) with some simple code.

Here's how to do it in Ruby:

arr = [5,1,0,3,2,4]
ranks = (0..arr.length-1).to_a.sort_by{ |x| arr[x] }
# ranks => [2, 1, 4, 3, 5, 0]

And in Python:

arr = [5,1,0,3,2,4]
ranks = range(len(arr))
ranks.sort(key=lambda x:arr[x])
# ranks => [2, 1, 4, 3, 5, 0]

The ranks array tells you that 0 has rank 2, 1 has rank 1, 2 has rank 4, etc. (Of course, those ranks start at zero, not one.)

Nate Kohl
This method doesn't work for negative elements in the array: try with arr = [5,1,0,3,-2,4] -- you get [4, 2, 1, 3, 5, 0]
florin