views:

1172

answers:

4

How do you find the median of a list of numbers stored as a LinkedList in Java? I don't understand the Selection Algorithm that Wikipedia refers to. Bonus points if you can explain that.

+4  A: 

The best algorithm is QuickSelect

Read the article and try to understand QuickSort - The concept is similar.

Dario
But will this algorithm still be the fastest on a *linked* list? It seems like for small N, a naive approach might be better.
Paul Fisher
Yes it will. QuickSelect is perfectly applicable to linked lists (even in purely functional languages)!
Dario
+1  A: 

If you want a quick-to-program way, sort the list and choose the middle element (or mean of the two middle elements).

More efficient selection algorithms essentially try to avoid sorting bits of the list that you don't actually need to sort to find the nth element for your given n. (The JDK doesn't currently provide such an algorithm out of the box.) Update: just to expand on my earlier post, examples of more efficient methods would include:

  • basing selection on a parallel sort whose initial phase puts the data into a number of 'buckets' before sorting each of the buckets, but then (in the case of the median) where you only actually sort the middle 'bucket', since the precise order of items in buckets either side is irrelevant
  • basing selection on a divide-and-conquer sort algorithm such as Quicksort, but again where the algorithm doesn't actually sort sublists that are outside the required range.
Neil Coffey
A: 

The median is the number in the middle of the sorted list. (Easy for a list with an uneven number of entries, for a list with an even number of entries it can be any number between the two "middle numbers" both included)

So sort the list, and figure out what the middle number(s) are.

Thorbjørn Ravn Andersen
Sorting is much more complex (O(nlogn)) than selecting the median (O(n))
Dario
Would the QuickSelect you advocate for, not run in O(log n)? If not, how come you say O(n)?
Thorbjørn Ravn Andersen
+1  A: 

You have basically two major options:

  1. Easy and quick method - sort the list in O(nlogn). Since the median is defined as the value which half the values are larger than it, and half are smaller, just take the n/2th value - that's the median.

  2. If this is a performance crucial task, you can apply various selection algorithms which can give you O(n) in good cases (but still cannot guarantee linearity in worst-cases).

In any case, check out the wikipedia entry on Selection Algorithms, that should give you a good round-up on all the available methods.

Yuval A