views:

238

answers:

7

I have implemented a sorting algorithm for a custom string that represents either time or distance data for track & field events. Below is the format

'10:03.00 - Either ten minutes and three seconds or 10 feet, three inches

The result of the sort is that for field events, the longest throw or jump would be the first element while for running events, the fastest time would be first. Below is the code I am currently using for field events. I didn't post the running_event_sort since it is the same logic with the greater than/less than swapped. While it works, it just seems overly complex and needs to be refactored. I am open to suggestions. Any help would be great.

event_participants.sort!{ |a, b| Participant.field_event_sort(a, b) }

class Participant
def self.field_event_sort(a, b)
  a_parts = a.time_distance.scan(/'([\d]*):([\d]*).([\d]*)/)
  b_parts = b.time_distance.scan(/'([\d]*):([\d]*).([\d]*)/)

  if(a_parts.empty? || b_parts.empty?)
    0
  elsif a_parts[0][0] == b_parts[0][0]
    if a_parts[0][1] == b_parts[0][1]
      if a_parts[0][2] > b_parts[0][2]
        -1
      elsif a_parts[0][2] < b_parts[0][2]
        1
      else
        0
      end
    elsif a_parts[0][1] > b_parts[0][1]
      -1
    else
      1
    end  
  elsif a_parts[0][0] > b_parts[0][0] 
    -1
  else
    1
  end
end
end
A: 

Your routine looks fine but you could just remove the ''', ':' and '.' and treat the result as a numeric string. In other words the 10' 5" would become 1005 and 10' 4" would be 1004. 1005 is clearly more than 1004.

Since the higer order elements are on the left, it will sort naturally. This also works with time for the same reasons.

Craig
I like the idea of multiplying the values rather than just removing the , : and . as it handle common mistakes. Without the multiplication 10'04" = 1004, 10'4" = 104.
RichH
Yes, but in this order "10'" <=> "9'" == -1, so it thinks "9'" is a higher value, and will sort it first.
rampion
A: 

Why not do

a_val = a_parts[0][0].to_i * 10000 + a_parts[0][1].to_i * 100 + a_parts[0][2].to_i
b_val = b_parts[0][0].to_i * 10000 + b_parts[0][1].to_i * 100 + b_parts[0][2].to_i
a_val <=> b_val

The numbers won't make sense to subtract, etc but they should sort ok.

You may want to check [1] and [2] are always two digits in the regexp.

RichH
Only needs to be * 1000 but this does the same thing I said so it *must* be a good answer! :)
Craig
It needs to be two orders of magnitude between * 10000 and *100 as you could have up to 59 seconds or 11 inches. Great minds think alike (I just type a few seconds slower then you!) :)
RichH
Yeah, this is a bad idea, as these values are all still strings. They haven't been converted to integers, so multiplying them by anything will just replicate the string.
rampion
rampion: Good spot, I've added .to_i
RichH
A: 

I don't know ruby but here's some c-like pseudo code that refactors this a bit.

/// In c, I would probably shorten this with the ? operator.
int compareIntegers(a, b) {
    int result = 0;
    if (a < b) {
        result = -1;
    } else if (a > b) {
        result = 1;
    }
    return result;
}

int compareValues(a, b) {
    int result = 0;
    if (!/* check for empty*/) {
        int majorA = /* part before first colon */
        int majorB = /* part before first colon */
        int minorA = /* part after first colon */
        int minorB = /* part after first colon */

        /// In c, I would probably shorten this with the ? operator.
        result = compareIntegers(majorA, majorB);
        if (result == 0) {
            result = compareIntegers(minorA, minorB);
        }
    }
    return result;
}
Jon Hess
You could also use Craig's suggestion to combine majorA, and minorA into one value, and only call compareIntegers once. Additionally, if the part that "gets part before first colon" returns 0 when there is no colon, you could avoid the "if (!empty)" guard.
Jon Hess
+1  A: 

Instead of implementing the sort like this, maybe you should have a TrackTime and FieldDistance models. They don't necessarily need to be persisted - the Participant model could create them from time_distance when it is loaded.

You're probably going to want to be able to get the difference between two values, validate values as well sort values in the future. The model would make it easy to add these features. Also it would make unit testing a lot easier.

I'd also separate time and distance into two separate fields. Having dual purpose columns in the database only causes pain down the line in my experience.

RichH
A: 

I agree that converting to integers will make is simpler. Also note that for integers

if a > b
  1
elsif a < b
  -1
else 
  0

can be simplified to a<=>b. To get the reverse use -(a <=> b).

Kathy Van Stone
+4  A: 

This is a situation where #sort_by could simplify your code enormously:

event_participants = event_participants.sort_by do |s|
 if s =~ /'(\d+):(\d+)\.(\d+)/
  [ $1, $2, $3 ].map { |digits| digits.to_i } 
 else
  []
 end
end.reverse

Here, I parse the relevant times into an array of integers, and use those as a sorting key for the data. Array comparisons are done entry by entry, with the first being the most significant, so this works well.

One thing you don't do is convert the digits to integers, which you most likely want to do. Otherwise, you'll have issues with "100" < "2" #=> true. This is why I added the #map step.

Also, in your regex, the square brackets around \d are unnecessary, though you do want to escape the period so it doesn't match all characters.

One way the code I gave doesn't match the code you gave is in the situation where a line doesn't contain any distances. Your code will compare them as equal to surrounding lines (which may get you into trouble if the sorting algorithm assumes equality is transitive. That is a == b, b == c implies a ==c, which is not the case for your code : for example a = "'10:00.1", b = "frog", c="'9:99:9").

#sort_by sorts in ascending order, so the call to #reverse will change it into descending order. #sort_by also has the advantage of only parsing out the comparison values once, whereas your algorithm will have to parse each line for every comparison.

rampion
Thank you for the help. I need to add some validations to make sure the time/distance field is in the correct format. I need to read up on the .map method to understand what it is really doing.
Chris Williams
#map takes an Enumerable (in this case, an Array), and runs the given block on each element, returning an Array of theresults (in the same order, in this case)
rampion
A: 

In this scenario:

Since you know you are working with feet, inches, and (whatever your third unit of measure is), why not just create a total sum of the two values you are comparing?

So after these two lines:

a_parts = a.time_distance.scan(/'([\d]):([\d]).([\d]*)/) b_parts = b.time_distance.scan(/'([\d]):([\d]).([\d]*)/)

Generate the total distance for a_parts and b_parts:

totalDistanceA = a_parts[0][0].to_i * 12 + a_parts[0][1].to_i + b_parts[0][2].to_i * (whatever your third unit of measure factor against the size of an inch)

totalDistanceB = b_parts[0][0].to_i * 12 + b_parts[0][1].to_i + b_parts[0][2].to_i * (whatever your third unit of measure factor against the size of an inch)

Then return the comparison of these two values:

totalDistanceA <=> totalDistanceB

Note that you should keep the validation you are already making that checks if a_parts and b_parts are empty or not:

a_parts.empty? || b_parts.empty?

For doing the sorting by time scenario, do the exact same thing except with different factors (for example, 60 seconds to a min).

IDreamOf362