views:

140

answers:

2

Inside the Rails code, people tend to use the Enumerable#inject method to create hashes, like this:

somme_enum.inject({}) do |hash, element|
  hash[element.foo] = element.bar
  hash
 end

While this appears to have become a common idiom, does anyone see an advantage over the "naive" version, which would go like:

hash = {}
some_enum.each { |element| hash[element.foo] = element.bar }

The only advantage I see for the first version is that you do it in a closed block and you don't (explicitly) initialize the hash. Otherwise it abuses a method in an unexpected way, is harder to understand and harder to read. So why is it so popular?

+1  A: 

I think it has to do with people not fully understanding when to use reduce. I agree with you, each is the way it should be

Matt Briggs
Why? When should one use or not use `reduce`/`inject`/`fold` ?
Aidan Cully
you should reduce if you have an expression you can apply that will bring it down to a single value. in this case, we are transforming from one data structure to another, which doesn't really fit, so the example code makes it fit, essentially turning an inject into an each. you could do this with a map as well if you chain a uniq at the end, but that is the same kind of trying to jam a round peg through a square hole, just because you can doesn't make it right.
Matt Briggs
Thank you for explaining your reasoning. The way I see it, `reduce` is defined to take an accumulator and a value, and returns a new accumulator with that value folded in. It doesn't matter whether the accumulator is a container type or not, or whether information is lost during the "reduction". In Haskell, list reversal (given `[1, 2, 3]` obtain `[3, 2, 1]`) is implemented according to the standard using a `fold` operation.
Aidan Cully
You're right in that it doesn't matter to the function of the accumulator is a container type. However, I personally find it a bit cheeky to say that a function that takes an n-element container as an input and produces an n-element container as an output is "multiple inputs, single output". I don't have any problem believing that the `inject` approach looks the most natural to a functional programmer. I just usually try to make the code readable even for the "humbler" developers. Those that are into Haskell should also be more proficient in figuring out stuff that looks unfamiliar to them ;)
averell
+2  A: 

Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.

As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.

Aidan Cully
The only aesthetic problem with inject approach IMO is the need to explicitly return the hash from the block, as `Hash#[]=` method returns the value assigned, and not the hash itself. I don't know why they omitted to make another method which would do just that.
Mladen Jablanović
@Mladen Jablanović: You can use the K combinator (available in Ruby 1.8.7+ under the name `Object#tap`). If you don't have 1.8.7 or 1.9, you can use Marc-André Lafortune's `backports` library. Also, in older versions of `ActiveSupport`, the K combinator used to be available as `returning`. Basically, replace the two lines with this single line `hash.tap {|hash| hash[element.foo] = element.bar }`. (You may want to rename the *inner* `hash` to something else to get rid of the *outer variable shadowed by inner variable* warning.)
Jörg W Mittag
Thanks, `Object#tap` sure can be handy, but here it would arguably only harm readability (and probably performance). I assume it's inevitable in some, purer functional languages?
Mladen Jablanović
i didn't see anything in the wikipedia article that looked like converting an array to a hash. Maybe I missed something? (most of it was in Haskell, which I am really not familiar with) It seems like folds are for when you can apply a single operation to any two values in the list to reduce it down to a single value. What we are doing here is mutating the accumulator. Maybe it is just that this is an impure function that is getting under my skin, but it just doesn't feel right
Matt Briggs
Granted, if you're coming from functional programming, the `inject` method may look familiar to you. It is not that I don't understand how it works; my point is that the "straightforward" approach will look straightforward to *all* developers. The `inject` approach will be familiar only to some. The argument about the lexical scope seems a bit theoretical though. If you want to do anything with the result, you'll assign it to a variable anyway. (And in Ruby, if you return it directly at the bottom, you don't have to worry about overwriting local vars either)
averell
@averell: The `each` method will look clearer to all developers, granted. Clarity is important to code quality, but it isn't everything. To take a perhaps extreme example, `for i in 1..10 { accum += i }` is less clear than `accum = 1+2+3+4+5+6+7+8+9+10` (someone with no exposure to programming can understand the second form), but any developer who'd prefer the second form probably isn't programming professionally. Structural symmetry is also important, as is resistance to bugs. `inject` still appears better to me on both of these measures than `each`, for reasons stated but left unchallenged.
Aidan Cully
Actually, clarity, even if not everything, ranks pretty high on my list. Secondly, I'd disagree on the point that the reasons stand unchallenged: The first point was that "its a common way to calculate a single result from multiple inputs" - that's probably my biggest gripe, because I see it calculating *multiple* results for a single input. That was the main "dissonance" which prompted my to post this question. Plus, the approach is only common for those familiar with functional programming, and your code *will* be read by those who are not.
averell
As for the dependency on a variable outside the block: In most cases, you will assign the result of the operation to a variable outside the block anyway. So if you include the variable initialization in your copy-and-paste, the difference will not be as big. Of course, at the end of the day, I prefer a closed method to a variable (e.g. the #tap above looks promising). I just didn't see the `inject` as the right approach over `each`; even though the `inject` approach has less of a clarity problem in the Rails world because it appears to be widely used.
averell