I'm trying to understand how grep works in this example. The code works but I'm not 100% sure in what sequence the events take place or whether I'm correctly understanding what's being returned when and where.

cars = [:Ford, :Toyota, :Audi, :Honda]
ucased_cars = cars.collect do |c| 
c.to_s 
end
.grep(/^Ford/) do |car| 
puts car.upcase 
car.upcase 
end
puts "ucased:" + ucased_cars.to_s

What I think is happening is:

  1. I define an array of Symbols
  2. I call the collect method with a block which causes each Symbol element, c, of the cars array to be converted into a String inside the block.
  3. collect returns an array of Strings
  4. grep is invoked on the array of Strings returned by collect and grep calls its own block on each array element, car, matching the search pattern, causing the element to be printed, uppercased and returned as part of an array.
  5. grep returns an array of uppercased Strings, assigning it to 'ucased_cars'
  6. The array, ucased_cars, must be converted to a String before being printed.

As far as step #4 is concerned, which of the following best describes how grep works:

[A] grep finds all strings matching the pattern. grep calls the block on this array of matches. grep returns the results of the block to the invoking function.

[B] grep finds the first string matching the pattern. grep calls the block on this match. this block's return value is piled up somewhere temporarily. grep searches the next element of the array. if it matches, grep calls the block on this match. grep adds this block's return value to the temporary "storage" of return values. grep looks at the next array element until it finds no more matches. then grep passes the stacked up return values back to the invoking function.

My Conclusion:

[A] seems to make more sense.

[B] seems like a lot of unnecessary fudging and doesn't seem efficient or likely.

+11  A: 

First of all, here's the documentation for grep.

Let me clean up your code and explain it piece by piece:

# 1
cars = [:Ford, :Toyota, :Audi, :Honda]

# 2
ucased_cars = cars.collect do |c| 
  c.to_s
end.grep(/^Ford/) do |car|  # 3
  puts car.upcase # 4
  car.upcase # 5
end
# 6

# 7
puts "ucased:" + ucased_cars.to_s
  1. Declare array of symbols

  2. Convert symbols to strings by using collect. You get ["Ford", "Toyota", "Audi", "Honda"]

  3. Feed this array of strings into grep. Any of the items which match the regexp /^Ford/ will get fed to the block

  4. The block prints out the upcased string that it got fed

  5. The block returns the upcased string, which grep then takes as the "match value"

  6. The return value from grep (an array of all the "match values") gets assigned to ucased_cars. It is ["FORD"], because that was the only element that matched the regex.

  7. It then gets printed. In Ruby 1.8, calling to_s on an array just prints all the elements jammedtogetherlikethis, which isn't very useful; you're better off printing ucased_cars.inspect. (From Ruby 1.9 on, Array#to_s returns the same string as inspect.)
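
To make step 7 concrete, here is a small sketch comparing ways of printing an array. The values are illustrative only, and because the result of to_s depends on the Ruby version, only the version-independent calls are annotated with their output.

```ruby
# Illustrative values, not the output of the code above
ucased_cars = ["FORD", "TOYOTA"]

# inspect shows the array structure regardless of Ruby version
inspected = ucased_cars.inspect
puts "ucased: " + inspected   # prints: ucased: ["FORD", "TOYOTA"]

# join reproduces the "jammed together" form explicitly
joined = ucased_cars.join
puts "ucased: " + joined      # prints: ucased: FORDTOYOTA

# p is shorthand for printing obj.inspect
p ucased_cars
```

If the concatenated form is what you actually want, calling join states that intent explicitly instead of relying on what to_s happens to do.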

To answer your question about how grep works behind the scenes...

The above documentation page shows the C source for grep itself. It basically does this:

  • Allocate a new Ruby array (dynamically sized).
  • Call rb_iterate to walk over each element in the source, passing some grep-specific code in.
  • rb_iterate is also used by collect, each_with_index and a bunch of other methods.

Since we know how collect, each, etc. work, we don't need to do any more spelunking in the source code: we have our answer, and it's your [B].

To explain in more detail, it does this:

  1. Make a new array to hold return values.
  2. Get the next item from the source
  3. If it matches the regex:
    • If a block was given, call the block, and whatever the block returns, put it in the return values.
    • If a block was not given, put the item in the return values
  4. Goto 2, repeat until no more items in the source.
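
The steps above can be sketched in plain Ruby. This is only an approximation of the behaviour, not the real C implementation; the method name my_grep is made up for illustration.

```ruby
# Hypothetical pure-Ruby stand-in for Enumerable#grep (my_grep is a made-up name).
def my_grep(source, pattern)
  results = []                     # 1. new array to hold the return values
  source.each do |item|            # 2. get the next item from the source
    next unless pattern === item   # 3. skip items that do not match
    # With a block, store the block's return value; otherwise store the item.
    results << (block_given? ? yield(item) : item)
  end                              # 4. repeat until the source is exhausted
  results
end

cars = ["Ford", "Toyota", "Audi", "Honda"]

with_block    = my_grep(cars, /^Ford/) { |car| car.upcase }
without_block = my_grep(cars, /o/)

p with_block     # ["FORD"]
p without_block  # ["Ford", "Toyota", "Honda"]
```

Note that the match test is pattern === item, which is why grep also accepts ranges and classes as the pattern, not just regular expressions.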

As to your comment that "[A] seems to make more sense": I don't agree.

The idea is that the block does something with each element. If it scanned the source first, and then passed the array of matches to the block, your block would then have to call each itself, which would be cumbersome.

Secondly, it would be less efficient. What happens, for example, if your block calls return or raises an error? In its current incarnation, you avoid having to scan the rest of the source. If it had already scanned the entire source list up-front, you'd have wasted all this effort.
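
One way to observe this early exit is to hand grep a pattern object that records what it is asked to test. The sketch below assumes a hypothetical helper class, LoggingPattern, and uses break rather than return so that it also works at the top level of a script.

```ruby
# LoggingPattern is a hypothetical helper: grep tests elements with ===,
# so this wrapper records every element that grep actually examines.
class LoggingPattern
  attr_reader :seen

  def initialize(regex)
    @regex = regex
    @seen = []
  end

  def ===(item)
    @seen << item
    @regex === item
  end
end

cars = ["Ford", "Toyota", "Audi", "Honda"]
pattern = LoggingPattern.new(/^F/)

# break leaves grep as soon as the first match reaches the block
first = cars.grep(pattern) { |car| break car.upcase }

p first         # "FORD"
p pattern.seen  # ["Ford"]
```

Only "Ford" was ever tested against the pattern: the remaining three elements were never scanned, which is what you would expect from [B] and not from [A].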

Orion Edwards
Beat me to it. You should perhaps specify in part 7 that #inspect or p will give much better results.
Great answer. Thanks. Your argument makes sense regarding [B] being the more reasonable route.