views:

441

answers:

8

I came up with 3 algorithms to attempt the above problem, but I am not sure which one is the best in terms of running time.

I'm listing my ideas for the algorithms below and could really use some help to figure out the most efficient algorithm for this.

Algo-1

I used selection sort and stopped it once the numbers are sorted.

Algo-2

I built a max-heap and kept removing the largest.

A: 

If you know N, just create an array with a length of 1/10th of that. initial value for each cell is Int.MinValue. Examine each number in the array. If its larger than the smallest number in the ten percent array, add it.

Avoids a sort but at the expense ofconstant scans of the answer array. You can offset this somewhat by keeping it in sorted order so yo ucanuse a binary search.

No Refunds No Returns
anonymous down-votes. The thing I love MOST about SO.
No Refunds No Returns
looks O(N^2) to me
GregS
"larger than the smallest number in the 10% array" is going to be a problem, if the initial array is pathological then you will rapidly have to constantly search the 10% for next smallest on every read
DaveC
Yep, O(n^2) if he uses the mini-array. Algo-2 in the original post with a heap is better than this.
polygenelubricants
I guess if sorting is free then this is indeed bad. oh wait ... sorting ISN'T free.
No Refunds No Returns
+4  A: 

Algo-1: Selection sort will run in O(n^2). First scan you do (n-1) compares, the second time (n-2), the n/10 time (n-n/10), so (n-1) + (n-2) + ... + (n-n/10) => O(n^2)

Algo-2: Deleting the max element from a heap is O(log n), so this one will run O(n log n) since you want to delete n/10 elements.

Another possible algorithm, though still O(n log n), but I think might be better than Algo-2 is to use the following quick-sort inspired procedure.

  1. Pick a pivot point
  2. Scan the all elements and put them in one of 2 buckets: those less than the pivot (left bucket) and those greater than the pivot (right bucket) (n-1) comparisons. Follow the quick sort procedure of in-place swapping.
  3. a. Size of bucket on right == n/10: You are done.

    b. Size of bucket on right > n/10 then the new list is the bucket on the right, recursively go to step 1 with the new list.

    c. Size of bucket on right < n/10 then new list is bucket on the left, but you only want to find the biggest n-n/10-(size of right bucket). Recursively go to step 1 with the new list.

DasBoot
+5  A: 

The quickest solution is to use the partition-based selection algorithm, which runs in O(n). It's based of the quicksort idea, except instead of sorting both partitions recursively, you only go to one of the partitions to find the k-th smallest element.

Finding the largest 10% is accomplished by searching for the k=(90%*N)-th smallest number.

If you recall how partitioning in quicksort works, elements less than the pivot is moved to the left, and the rest of the elements go to the right. Let's say you want to select the k-th smallest element. Then you see if there are at least k elements to the left of the pivot. If there are, then you know you can ignore the elements in the right partition. Otherwise, you can ignore all elements in the left partition, because you know that element will be in the right partition.

Note that the selection algorithm only identifies what those top 10% numbers are. If you need them to be sorted, then you have to sort those numbers (but only those numbers, the other 90% can be ignored).

polygenelubricants
A good solution, but not O(n). The question calls for the top 10% "in order", so you need to do more than just select an element. The complete solution to the question based on this would be O(n log n).
Steve Jessop
You are right. I've added some remarks to that effect.
polygenelubricants
It is O(n) to find the number that is at the 90th percentile using quicksearch and O(n/10 log(n/10)) to sort the 10% of the initial elements that are in that partition. This is much better than O(n log n) to just quicksort the whole array and take the top n/10 items. One other thing to note is that you have to be careful if you have repeated elements in your array.
Justin Peel
+2  A: 

I would use quicksort descending on the array and get the first N/10 items.

Biroka
Thanks! good to know, this was the first and easiest way that came into my mind.
Biroka
Huh, did you just thank yourself?
Peter Alexander
I smell a cheater...
polygenelubricants
He probably forgot to log back to his noviceneedshelp user.
Lasse V. Karlsen
-1: Multiaccounting is LAME.
Platinum Azure
There was a comment before mine, telling me that there is a better solution, but it appears that it's already deleted.
Biroka
+2  A: 

Construct a heap with O(lnN) replacement cost filled with the first n/10 elements. Scan the remaining numbers comparing with the least value in the heap. If the current element's value is higher that the least element in the heap, insert it into the heap and remove the least element. At worst, two O(lnN) operations times N scanned items gives O(N ln N), which isn't any better in time than a sort, but requires less storage than sorting everything, as so in practice is likely to be faster (especially if N elements do not fit in cache but n/10 will - asymptotic time only matters one you are in a flat space).

Pete Kirkham
A: 

The most efficient algorithm would be to use a modified quicksort.

Quicksort starts by choosing a "middle" value and putting all values lower than this to the left, and all greater to the right. Normally you would go down and recursively sort both sides, but you only need to sort the right side if there are less than 10% of the elements on the left.

If there are more than 10%, then you only need to sort the left side, and probably only some of the left side.

This won't reduce the complexity below the optimal O(N lg N), but it will reduce the constant factor and make it faster than the obvious "quicksort then choose the first 10" approach.

Peter Alexander
A: 

Very goofy question, just sort it with any sort-algorithm, and take the first N/10 items.

Algo-2 is equivalent to doing this with heap-sort

BlueRaja - Danny Pflughoeft
A: 

because this is homework, my answer would be any sorting algorithm, this is because you cannot solve this under O(n*log(n)).

if this was possible then you could fully sort an array under O(n*log(n)). (by finding the sorted top 10% in the array you want to fully sort, removing them and repeating this process 10 times).

because sorting isn't possible under O(n*log(n)), so is this problem.

Shuky Kappon