views:

311

answers:

4

Given N arbitrary integers, how to find average of top half of these numbers? Is there an O(n) solution? If not is it possible to prove that it's not possible?

A: 

I would suggest this:

Use Quicksort, select some pivot. This will partition your list into two sublist, one smaller than the pivot, one greater than that. If the size of smaller sublist is <= N/2, calculate the average say a1.
If size == N/2 or size == N/2 -1
you are done immediately.

If not repartition the greater sublist till the total size is N/2.

If size > N/2 partition the smaller sublist.

Repeat all till done.

P.S: you don't need to sort.

Neeraj
It's `O(n^2)` in the worst case...
Pavel Shved
+1  A: 

You could use a priority queue. Insert the elements into the queue maintaining a count of how many elements you've seen, n . Extract n/2 maximum elements from the queue into an accumulator and calculate the average.

With a well chosen data structure behind the queue, such as a fibonacci heap, this will give you O(n log n) runtime, as insertion is O(1) and extraction is O(log n).

Unfortunately not the O(n) runtime you were looking for, but with the data structure already implemented, this would produce very understandable straightforward code.

Greg Sexton
*Finding* the maximum is O(1) in a Fibonacci heap, but *removing* it (thus allowing what was the second-to-max to be found in another O(1)) is O(log n). If "insert" and "remove max" really were both O(1) in a Fibonacci heap, then it would be possible to use one to perform a comparison sort in O(n).
Steve Jessop
You are quite right, apologies, I've edited my answer accordingly. That pesky nlogn lower bound on sorting!
Greg Sexton
+1  A: 

This is obviously solvable in linear time, if you can find the median in linear time. And finding a median in linear time is tricky, but possible. See for example the wikipedia article on selection algorithms.

Accipitridae
+11  A: 

First, find a median of the given array (it takes linear time).

Then, just walk through array and sum up all elements that are greater than the median.

Count how many elements you summed (M). If M < N/2, then it means that several elements that are equal to the median value (namely, N/2 - M) belong to the top half. Add to your sum that many median values. We need this complexity because we don't know how many median elements (there can be several) belong to the top half: if we take them all, we can end up having summed more than N/2 elements.

Now you have the sum of the top half of the array. Divide by N/2, and you're done.

Pavel Shved
Or the code might be simpler if you do an extra O(n) pass and just count the number of elements equal to the median. That tells you how many equal-to-the-median elements to include in the average.
Steve Jessop
Even simpler would be to use that almost any algorithm for finding a median also finds a partition of the input list into a upper an lower half. Hence once the median is found all the elements in the upper half are already known.
Accipitridae