ansaurus

Question

How to find the top several values from an array?

Answer 1

+7 A:

I can suggest an alternative algorithm which you'll have to code :)

Use a min-heap of size K, where K is the number of top elements you want to keep. Initialize it with the first K elements of your original array. Then walk the remaining N - K elements, inserting an element only when it is larger than the heap's current minimum.

proc top_k (array<n>, heap<k>)
heap <- array<0..k-1>
for each i in (k..n-1)
  if array[i] > heap.min
     heap.erase(heap.min)
     heap.insert(array[i])
  end if
end for
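
For what it's worth, here is a rough Java sketch of the same idea, assuming java.util.PriorityQueue (a min-heap by default) is an acceptable heap. The class and method names (TopKHeap, topK) are made up for illustration and it returns values only, not indices.

```java
import java.util.PriorityQueue;

class TopKHeap {
    // Returns the k largest values of array, in ascending order.
    // Hypothetical helper, not part of any library.
    static double[] topK(double[] array, int k) {
        if (k <= 0) return new double[0];
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap
        for (double x : array) {
            if (heap.size() < k) {
                heap.add(x);              // still filling the first k slots
            } else if (x > heap.peek()) { // larger than the smallest kept value
                heap.poll();              // drop the current minimum
                heap.add(x);
            }
        }
        // Drain the heap; poll() always removes the smallest remaining value.
        double[] result = new double[heap.size()];
        for (int i = 0; i < result.length; i++) result[i] = heap.poll();
        return result;
    }
}
```

Each of the n elements costs at most one O(log k) heap operation pair, and the heap never holds more than k values.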

dirkgently 2009-03-06 01:30:18

*Facepalm* Heaps! Thank you!

Karl 2009-03-06 01:39:47

+1. Very nice solution requiring only O(n log k) time and O(k) space.

j_random_hacker 2009-03-06 01:41:44

Answer 2

+2 A:

You could still use your list idea. The elements you put in the list could be a structure that stores both the index and the value but compares only on the value. For instance:

class IndexAndValue : IComparable<IndexAndValue>
{
    public int index;
    public double value;

    public int CompareTo(IndexAndValue other)
    {
        return value.CompareTo(other.value);
    }
}

Then you can stick them in the list while retaining the information about the index. If you keep only the largest m items in the list, the running time should be O(mn).
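
To make the "keep only the largest m items" part concrete, here is a minimal Java sketch. It assumes the list is kept in descending order and that a new item is inserted by a linear scan, which is where the O(mn) comes from. The names TopMList and topM are hypothetical, and IndexAndValue is a Java stand-in for the C# class above.

```java
import java.util.ArrayList;
import java.util.List;

class TopMList {
    // Java stand-in for the C# IndexAndValue class (illustrative only).
    public static final class IndexAndValue {
        public final int index;
        public final double value;
        IndexAndValue(int index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    // Returns the m largest elements with their original indices, largest first.
    static List<IndexAndValue> topM(double[] values, int m) {
        List<IndexAndValue> best = new ArrayList<>();
        if (m <= 0) return best;
        for (int i = 0; i < values.length; i++) {
            // List is full and this value does not beat the smallest kept one.
            if (best.size() == m && values[i] <= best.get(m - 1).value) continue;
            // Linear scan for the insertion point (list is in descending order).
            int pos = 0;
            while (pos < best.size() && best.get(pos).value >= values[i]) pos++;
            best.add(pos, new IndexAndValue(i, values[i]));
            // Trim back down to m items by dropping the smallest.
            if (best.size() > m) best.remove(best.size() - 1);
        }
        return best;
    }
}
```

The indices survive because each list entry carries its own position from the original array.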

Smashery 2009-03-06 01:33:24

I think you meant "return value.CompareTo(other.value);"

Brannon 2009-03-06 01:40:04

Answer 3

+2 A:

I don't know which algorithm you're currently using, but I'll suggest a simple one. Assuming that you have an array of floats f and want to retrieve at most capacity numbers, you could do the following:

int capacity = 4; // number of floats you want to retrieve
float[] f; // your float list (must hold at least 'capacity' elements)
int[] max_so_far = new int[capacity]; // indices of the largest elements found so far

// say that the first 'capacity' elements are the biggest, for now
for (int i = 0; i < capacity; i++)
  max_so_far[i] = i;

// for each number not processed
for (int i = capacity; i < f.length; i++)
{
  // find the slot holding the smallest 'max so far' number
  int m = 0;
  for (int j = 0; j < capacity; j++)
    if (f[max_so_far[j]] < f[max_so_far[m]])
      m = j;

  // if our current number is bigger than the smallest stored, replace it
  if (f[i] > f[max_so_far[m]])
    max_so_far[m] = i;
}

By the end of the algorithm, you'll have the indices of the greatest elements stored in max_so_far.

Do note that as the capacity value grows, this becomes slower than the alternative, which is sorting the list while keeping track of the initial positions. Remember that sorting takes O(n*log n) comparisons, while this algorithm takes O(n*capacity), so it only pays off while capacity is small compared to log n.
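
The sorting alternative mentioned above could look roughly like this in Java. This is only a sketch: it sorts an array of positions by the value each position points at instead of sorting f itself, and the names TopBySorting and topIndices are invented for the example.

```java
import java.util.Arrays;

class TopBySorting {
    // Returns the indices of the 'capacity' largest elements of f, largest first.
    // Hypothetical helper for illustration.
    static int[] topIndices(float[] f, int capacity) {
        // Boxed indices so that Arrays.sort can take a comparator.
        Integer[] order = new Integer[f.length];
        for (int i = 0; i < order.length; i++) order[i] = i;

        // Sort the positions by the value they refer to, in descending order.
        Arrays.sort(order, (a, b) -> Float.compare(f[b], f[a]));

        // Copy out the first 'capacity' positions (fewer if f is shorter).
        int count = Math.max(0, Math.min(capacity, order.length));
        int[] top = new int[count];
        for (int i = 0; i < count; i++) top[i] = order[i];
        return top;
    }
}
```

The original array is left untouched, so f[top[0]] is the largest value and top[0] is where it lives.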

Hugo Peixoto 2009-03-06 02:04:38

Answer 4

+1 A:

Another option is to use quick-select. Quick-select returns the position of the k-th element in a list. After you have the position and the value of the k-th element, go over the list and take every element whose value is smaller/larger than the k-th element.

I found a C# implementation of quick-select here: link text

Pros:

O(n+k) average running time.

Cons:

The k elements found are not sorted. If you sort them, the running time is O(n + k log k).
I haven't checked this, but I think that for a very small k the best option is to do k runs over the array, each time finding the next smallest/largest element.
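
Since the link is not much help on its own, here is a minimal Java sketch of the quick-select approach. It is not the linked implementation. It assumes a random pivot with a simple Lomuto-style partition in descending order and works on a copy of the input; QuickSelectTopK and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

class QuickSelectTopK {
    private static final Random RNG = new Random();

    // Returns the k largest values of input, in no particular order.
    static double[] topK(double[] input, int k) {
        double[] a = input.clone(); // leave the caller's array untouched
        if (k <= 0) return new double[0];
        if (k >= a.length) return a;
        int lo = 0, hi = a.length - 1;
        // Narrow [lo, hi] until position k - 1 holds the k-th largest value.
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k - 1) break;
            if (p < k - 1) lo = p + 1; else hi = p - 1;
        }
        return Arrays.copyOf(a, k); // the first k slots now hold the k largest
    }

    // Moves values greater than a random pivot to the left of it
    // and returns the pivot's final position.
    private static int partition(double[] a, int lo, int hi) {
        swap(a, lo + RNG.nextInt(hi - lo + 1), hi);
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] > pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Because the array ends up partitioned around position k - 1, the separate pass over the list is replaced here by copying the first k slots. Call Arrays.sort on the result if you need the values in order.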

ytoledano 2010-01-07 11:16:26
