views:

523

answers:

3

I have written a method to calculate a given percentile for a set of numbers for use in an application I am building. Typically the user needs to know the 25th percentile of a given set of numbers and the 75th percentile.

My method is as follows:

def calculate_percentile(array,percentile)
 #get number of items in array
 return nil if array.empty?

 #sort the array
 array.sort!

 #get the array length
 arr_length = array.length

 #multiply items in the array by the required percentile (e.g. 0.75 for 75th percentile)
 #round the result up to the next whole number
 #then subtract one to get the array item we need to return
 arr_item = ((array.length * percentile).ceil)-1

 #return the matching number from the array
 return array[arr_item]

end

This looks to provide the results I was expecting but can anybody refactor this or offer an improved method to return specific percentiles for a set of numbers?

+6  A: 

Some remarks:

  • If a particular index of an Array does not exist, [] will return nil, so your initial check for an empty Array is unnecessary.
  • You should not sort! the Array argument, because you are affecting the order of the items in the Array in the code that called your method. Use sort (without !) instead.
  • You don't actually use arr_length after assignment.
  • A return statement on the last line is unnecessary in Ruby.
  • There is no standard definition for the percentile function (there can be a lot of subtleties with rounding), so I'll just assume that how you implemented it is how you want it to behave. Therefore I can't really comment on the logic.

That said, the function that you wrote can be written much more tersely while still being readable.

def calculate_percentile(array, percentile)
  array.sort[(percentile * array.length).ceil - 1]
end
molf
A: 

If you're calculating both quartiles, you might want to move the "sort" outside the function, so that it only needs to be done once. This also means you aren't modifying your caller's data (sort!), nor making a copy every time the function is called (sort).

I know, premature optimisation and all that. And it's a bit awkward for the function to say, "the array must be sorted before calling this function". So it's reasonable to leave it as it is.

But sorting already-sorted data is going to take considerably longer than the whole rest of the function put together(*). It also has higher algorithmic complexity: O(N) at best, when the function could be O(1) for the second quartile (although O(N log N) for the first one if the data is not already sorted, of course). So it's worth avoiding if performance might ever be an issue for this function.

There are slightly faster ways of finding the two quartiles than a full sort (look up "selection algorithms"). For instance if you're familiar with the way qsort uses pivots, observe that if you need to know the 25th and 75th items out of 100, and your pivot at some stage ends up in position 80, then there's absolutely no point recursing into the block above the pivot. You really don't care what order those elements are in, just that they're in the top quartile. But this will considerably increase the complexity of the code compared with just calling a library to sort for you. Unless you really need a minor performance boost, I think you're good as you are.

(*) Unless ruby arrays have a flag to remember they're already sorted and haven't been modified since. I don't know whether they do, but if so then using sort! a second time is of course free.

Steve Jessop
What about creating a class that contains an array and knows whether it has been sorted and not modified since?
Andrew Grimm
+1  A: 

Here's the same refactored into a one liner. You don't need an explicit return as the last line in Ruby. The return value of the last statement of the method is what's returned.

def calculate_percentile(array=[],percentile=0.0)
  # multiply items in the array by the required percentile 
  # (e.g. 0.75 for 75th percentile)
  # round the result up to the next whole number
  # then subtract one to get the array item we need to return
  array ? array.sort[((array.length * percentile).ceil)-1] : nil
end
rnicholson