views:

1159

answers:

4

I have an array of float values and want the value and more importantly the position of the maximum four values.

I built the system originally to walk through the array and find the max the usual way, by comparing the value at the current position to a recorded max-so-far, and updating a position variable when the max-so-far changes. This worked well, an O(n) algo that was very simple. I later learned that I need to keep not only the top value, but the top three or four. I extended the same procedure and complicated the max-so-far into an array of four max-so-fars and now the code is ugly.

It still works and is still sufficiently fast because only a trivial amount of computations have been added to the procedure. it still effectively walks across the array and checks each value once.

I do this in MATLAB with a sort function that returns two arrays, the sorted list and the accompanying original position list. By looking at the first few values I have exactly what I need. I am replicating this functionality into a C# .NET 2.0 program.

I know that I could do something similar with a List object, and that the List object has a built in sort routine, but I do not believe that it can tell me the original positions, and those are really what I am after.

It has been working well, but now I find myself wanting the fifth max value and see that rewriting the max-so-far checker that is currently an ugly mess of if statements would only compound the ugliness. It would work fine and be no slower to add a fifth level, but I want to ask the SO community if there is a better way.

Sorting the entire list takes many more computations than my current method, but I don't think it would be a problem, as the list is 'only' one or two thousand floats; so if there is a sort routine that can give back the original positions, that would be ideal.

As background, this array is the result of a Fourier Transform on a kilobyte of wave file, so the max values' positions correspond to the sample data's peak frequencies. I had been content with the top four, but see a need to really gather the top five or six for more accurate sample classification.

+7  A: 

I can suggest an alternative algorithm which you'll have to code :)

Use a heap of size K where K denotes the count of top elements you want to save. Initialize this to the first K elements of your original array. For all N - K elements walk the array, inserting as and when required.

proc top_k (array<n>, heap<k>)
heap <- array<1..k-1>
for each (array<k..n-1>) 
  if array[i] > heap.min
     heap.erase(heap.min)
     heap.insert(array[i])
  end if
end for
dirkgently
*Facepalm* Heaps! Thank you!
Karl
+1. Very nice solution requiring only O(nlog k) time and O(k) space.
j_random_hacker
+2  A: 

You could still use your list idea - the elements you put in the list could be a structure which stores both the index and the value; but sorts only on the value, for instance:

class IndexAndValue : IComparable<IndexAndValue>
{
    public int index;
    public double value;

    public int CompareTo(IndexAndValue other)
    {
        return value.CompareTo(other.value);
    }
}

Then you can stick them in the list, while retaining the information about the index. If you keep only the largest m items in the list, then your efficiency should be O(mn).

Smashery
I think you meant "return value.CompareTo(other.value);"
Brannon
+2  A: 

I don't know which algorithm you're currently using, but I'll suggest a simple one. Admitting that you have an array of floats f and a maximum of capacity numbers, you could do the following:

int capacity = 4; // number of floats you want to retrieve
float [] f; // your float list
float [] max_so_far = new float[capacity]; // max so far

// say that the first 'capacity' elements are the biggest, for now
for (int i = 0; i < capacity; i++)
  max_so_far[i] = i;

// for each number not processed
for (int i = capacity; i < f.length; i++)
{
  // find out the smallest 'max so far' number
  int m = 0;
  for (int j = 0; j < capacity; j++)
    if (f[max_so_far[j]] < f[max_so_far[m]])
      m = j;

  // if our current number is bigger than the smallest stored, replace it
  if (f[i] > f[max_so_far[m]])
    max_so_far[m] = i;
}

By the end of the algorithm, you'll have the indices of the greatest elements stored in max_so_far.

Do note that if the capacity value grows, it will become slightly slower than the alternative, which is sorting the list while keeping track of the initial positions. Remember that sorting takes O(n*log n) comparisons, while this algorithm takes O(n*capacity).

Hugo Peixoto
+1  A: 

Another option is to use quick-select. Quick-select returns the position of the k-th element in a list. After you have the position and the value of the k-th element, go over the list and take every element whose value is smaller/larger than the k-th element.

I found a c# implementation of quick-select here: link text

Pros:

  1. O(n+k) average running time.

Cons:

  1. The k elements found are not sorted. If you sort them the running time is O(n + logk)
  2. I haven't checked this, but I think that for a very small k the best option is to do k runs over the array, each time finding the next smallest/largest element.
ytoledano