views:

259

answers:

3

Ok, so say you have a really big Range in ruby. I want to find a way to get the max value in the Range.

The Range is exclusive (defined with three dots) meaning that it does not include the end object in it's results. It could be made up of Integer, String, Time, or really any object that responds to #<=> and #succ. (which are the only requirements for the start/end object in Range)

Here's an example of an exclusive range:

  past  = Time.local(2010, 1, 1, 0, 0, 0)
  now   = Time.now
  range = past...now

  range.include?(now)  # => false

Now I know I could just do something like this to get the max value:

  range.max  # => returns 1 second before "now" using Enumerable#max

But this will take a non-trivial amount of time to execute. I also know that I could subtract 1 second from whatever the end object is is. However, the object may be something other than Time, and it may not even support #-. I would prefer to find an efficient general solution, but I am willing to combine special case code with a fallback to a general solution (more on that later).

As mentioned above using Range#last won't work either, because it's an exclusive range and does not include the last value in it's results.

The fastest approach I could think of was this:

  max = nil
  range.each { |value| max = value }

  # max now contains nil if the range is empty, or the max value

This is similar to what Enumerable#max does (which Range inherits), except that it exploits the fact that each value is going to be greater than the previous, so we can skip using #<=> to compare the each value with the previous (the way Range#max does) saving a tiny bit of time.

The other approach I was thinking about was to have special case code for common ruby types like Integer, String, Time, Date, DateTime, and then use the above code as a fallback. It'd be a bit ugly, but probably much more efficient when those object types are encountered because I could use subtraction from Range#last to get the max value without any iterating.

Can anyone think of a more efficient/faster approach than this?

+1  A: 

I'm not sure about the speed (and initial tests don't seem incredibly fast), but the following might do what you need:

past  = Time.local(2010, 1, 1, 0, 0, 0)
now   = Time.now
range = past...now

range.to_a[-1]

Very basic testing (counting in my head) showed that it took about 4 seconds while the method you provided took about 5-6. Hope this helps.

Edit 1: Removed second solution as it was totally wrong.

Topher Fangio
+7  A: 

The simplest solution that I can think of, which will work for inclusive as well as exclusive ranges:

range.max

Some other possible solutions:

range.entries.last
range.entries[-1]

These solutions are all O(n), and will be very slow for large ranges. The problem in principle is that range values in Ruby are enumerated using the succ method iteratively on all values, starting at the beginning. The elements do not have to implement a method to return the previous value (i.e. pred).

The fastest method would be to find the predecessor of the last item (an O(1) solution):

range.exclude_end? ? range.last.pred : range.last

This works only for ranges that have elements which implement pred. Later versions of Ruby implement pred for integers. You have to add the method yourself if it does not exist (essentially equivalent to special case code you suggested, but slightly simpler to implement).

Some quick benchmarking shows that this last method is the fastest by many orders of magnitude for large ranges (in this case range = 1...1000000), because it is O(1):

                                          user     system      total        real
r.entries.last                       11.760000   0.880000  12.640000 ( 12.963178)
r.entries[-1]                        11.650000   0.800000  12.450000 ( 12.627440)
last = nil; r.each { |v| last = v }  20.750000   0.020000  20.770000 ( 20.910416)
r.max                                17.590000   0.010000  17.600000 ( 17.633006)
r.exclude_end? ? r.last.pred : r.last 0.000000   0.000000   0.000000 (  0.000062)

Benchmark code is here.

In the comments it is suggested to use range.last - (range.exclude_end? ? 1 : 0). It does work for dates without additional methods, but will never work for non-numeric ranges. String#- does not exist and makes no sense with integer arguments. String#pred, however, can be implented.

molf
In your last line, you have `range.last.pred`. That doesn't compile for me. Neither does `range.last.prev`. Could you please explain that portion?
Topher Fangio
So try to use `pred` and if that fails, try `range.last-1` otherwise fall back to a different solution
gnibbler
`pred` is implemented for integers in later versions of Ruby (it's there in my 1.8.7-174, and should be there in 1.9+). It is not available for all classes that implement `succ`, so it may be necessary to define it yourself.
molf
more compact and Date-aware realization is the following: `range.last-(range.exclude_end? ? 1: 0)`Still, it is not that readable, I must say.
kirushik
@kirushik: `range.last - 1` is **not** the same as `range.last.pred`. Range elements can be anything that implements `succ`, and they don't necessarily have a `-` method, let alone one that accepts `Fixnum` arguments (e.g. `String#-` does not exist). The problem is that `pred` does not exist on most classes either. The implementation can indeed be as simple as `self - 1` for some classes.
molf
Great answer, I'm accepting this. I had no idea about the `Integer#pred` method. I see that #pred hasn't been defined for any other core classes that have `#succ`, but I wonder if it is possible? After all if they can be incremented by some unit, then there must be a way to decrement them by the same unit. In the end what I think I'll do is use `#pred` if it's available, then fallback to using `range.entries.last`. I may even look at implementing `#pred` methods for a few core classes like Time, Date, DateTime, etc.
dkubb
+1  A: 

I can't think there's any way to achieve this that doesn't involve enumerating the range, at least unless as already mentioned, you have other information about how the range will be constructed and therefore can infer the desired value without enumeration. Of all the suggestions, I'd go with #max, since it seems to be most expressive.

require 'benchmark'
N = 20
Benchmark.bm(30) do |r|
  past, now  = Time.local(2010, 2, 1, 0, 0, 0), Time.now
  @range = past...now
  r.report("range.max") do
    N.times { last_in_range = @range.max }
  end
  r.report("explicit enumeration") do
    N.times { @range.each { |value| last_in_range = value } }
  end
  r.report("range.entries.last") do
    N.times { last_in_range = @range.entries.last }
  end
  r.report("range.to_a[-1]") do
    N.times { last_in_range = @range.to_a[-1] }
  end
end
                                user     system      total        real
range.max                      49.406000   1.515000  50.921000 ( 50.985000)
explicit enumeration           52.250000   1.719000  53.969000 ( 54.156000)
range.entries.last             53.422000   4.844000  58.266000 ( 58.390000)
range.to_a[-1]                 49.187000   5.234000  54.421000 ( 54.500000)

I notice that the 3rd and 4th option have significantly increased system time. I expect that's related to the explicit creation of an array, which seems like a good reason to avoid them, even if they're not obviously more expensive in elapsed time.

Mike Woodhouse