Inspired by http://stackoverflow.com/questions/2552363/how-can-i-marshal-a-hash-with-arrays I wonder why Array#<< doesn't behave as expected in the following code:

h = Hash.new{Array.new}
#=> {}
h[0]
#=> []
h[0] << 'a'
#=> ["a"]
h[0]
#=> [] # why?!
h[0] += ['a']
#=> ["a"]
h[0]
#=> ["a"] # as expected

Does it have to do with the fact that << changes the array in-place, while Array#+ creates a new instance?

A: 
h = Hash.new{ |a,b| a[b] = Array.new }
h[0] << "hello world"
#=> ["hello world"]
h[0]
#=> ["hello world"]
fl00r
Ok, but what was wrong with my code?
Mladen Jablanović
Your code does not assign into the Hash. This solution makes the `<<` case work by assigning every new Array into the Hash with the `a[b] =`. (The downside is that you end up putting a new Array into the Hash for every key you index it with. It depends on your use whether or not this is desirable. Fortunately Ruby gives you the choice, as illustrated here.)
Arkku
+3  A: 

The problem in your code is that in h[0] << 'a', indexing with h[0] creates a new Array and hands it out, and << 'a' appends to it, but the modified Array isn't stored anywhere afterwards because there is no assignment.

Meanwhile h[0] += ['a'] works because it's equivalent to h[0] = h[0] + ['a']. It's the assignment ([]=) that makes the difference.

The first case may seem confusing, but it is useful when you just want to take some unchanging default element out of a Hash when the key is not found. If this case is common, you'd end up populating the Hash with a great number of useless values just by indexing it. Hence you need to explicitly assign to the Hash.
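The trade-off described above can be sketched as follows. This is only an illustration; the variable names `plain` and `storing` are made up for the example and do not come from the question.

```ruby
# Default block that only returns a value: indexing never stores anything.
plain = Hash.new { [] }
plain[:a]
plain[:b]
plain.size      # => 0

# Default block that assigns: every missed lookup adds a key.
storing = Hash.new { |hash, key| hash[key] = [] }
storing[:a]
storing[:b]
storing.size    # => 2
storing.keys    # => [:a, :b]
```

Which of the two is preferable depends on whether merely reading a missing key should leave a trace in the Hash.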

Arkku
+4  A: 

If you create a Hash using the block form of Hash.new, the block gets executed every time you try to access an element which doesn't actually exist. So, let's just look at what happens:

h = Hash.new { [] }
h[0] << 'a'

The first thing that gets evaluated here, is the expression

h[0]

What happens when it gets evaluated? Well, the block gets run:

[]

That's not very exciting: the block simply creates an empty array and returns it. It doesn't do anything else. In particular, it doesn't change h in any way: h is still empty.

Next, the message << with one argument 'a' gets sent to the result of h[0] which is the result of the block, which is simply an empty array:

[] << 'a'

What does this do? It appends the element 'a' to that empty array, but since nothing holds a reference to the array afterwards, it becomes unreachable and will eventually be garbage collected.

Now, if you evaluate h[0] again:

h[0] # => []

h is still empty, since nothing ever got assigned to it, therefore the key 0 is still non-existent, which means the block gets run again, which means it again returns an empty array (but note that it is a completely new, different empty array now).

h[0] += ['a']

What happens here? First, the operator assignment gets desugared to

h[0] = h[0] + ['a']

Now, the h[0] on the right side gets evaluated. And what does it return? We already went over this: h[0] doesn't exist, therefore the block gets run, the block returns an empty array. Again, this is a completely new, third empty array now. This empty array gets sent the message + with the argument ['a'], which causes it to return yet another new array which is the array ['a']. This array then gets assigned to h[0].
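To make the difference between the two cases visible, both expressions can be written as explicit method calls. This is just a sketch of the same steps walked through above, using the same `h` as before:

```ruby
h = Hash.new { [] }

# h[0] << 'a' is two method calls: Hash#[] and then Array#<<.
# There is no Hash#[]= anywhere, so nothing is stored in h.
h.[](0).<<('a')
h.key?(0)    # => false

# h[0] += ['a'] is three method calls: Hash#[], Array#+ and Hash#[]=.
# The final []= is what stores the new array under the key 0.
h.[]=(0, h.[](0).+(['a']))
h.key?(0)    # => true
h[0]         # => ["a"]
```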

Lastly, at this point:

h[0] # => ['a']

Now you have finally actually put something into h[0] so, obviously, you get out what you put in.

So, to answer the question you probably had, why don't you get out what you put in? You didn't put anything in in the first place!

If you actually want to assign to the hash inside the block, you have to, well, assign to the hash inside the block:

h = Hash.new {|this_hash, nonexistent_key| this_hash[nonexistent_key] = [] }
h[0] << 'a'
h[0] # => ['a']

It's actually fairly easy to see what is going on in your code example if you look at the identities of the objects involved. Then you can see that every time you call h[0], you get a different array.
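One way to look at those identities is `equal?`, which compares object identity rather than contents. A minimal sketch, where `first`, `second` and `stored` are names chosen only for this illustration:

```ruby
h = Hash.new { [] }

first  = h[0]
second = h[0]
first.equal?(second)   # => false: two distinct arrays
first << 'a'
first                  # => ["a"]
second                 # => []
h[0]                   # => []  (yet another fresh array)

# With the assigning block, the same stored array comes back every time.
stored = Hash.new { |hash, key| hash[key] = [] }
stored[0].equal?(stored[0])   # => true
```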

Jörg W Mittag
Thanks a lot for your (as always) elaborate answer! I understand it now perfectly well, the key was in assignment which never happened. However, I find pretty confusing and counter-intuitive the fact that `h[0] << 'a'` doesn't actually affect the value in `h` with the key `0`. I guess that's one of the rare cases when the syntax sugar for invoking Ruby methods sweetens things a bit too much... :/
Mladen Jablanović
What do you mean? It *does* affect the value! It gets the value out of `h` and it appends `'a'` to it. You can clearly see that happening, if you type it into `irb`, in fact, you actually showed it in the code example in your question. This has nothing to do with Ruby's syntax, it doesn't even have anything to do with Ruby or even with programming. Understanding that a box and the thing inside the box are not the same thing, is pretty universal. It applies just as well to `final` fields in Java, for example, or to frozen objects in Ruby.
Jörg W Mittag
The confusion probably arises from the fact that the value in the original question is returned by the default proc of `h`, i.e. `h` is a magic box where you get a brand new empty array out every time you index it with a key that has no value in `h`. So, the value *from* `h` is certainly affected, it's just that it never really was *in* `h` - to change the contents of `h` you need to put the value back in the magic box.
Arkku
@Joerg: No, you're right, it affects the value, but it doesn't affect the hash itself, i.e. the value stays unassigned. In your "box and a thing in it" analogy, `h[0]<<'a'` would mean "append 'a' to the thing in box `0`". `'a'` really gets appended to the value, but the box stays empty. I mentioned syntax because I think it's much clearer when written as a sequence of method calls, which is what it actually is: then one can clearly see the absence of assignment.
Mladen Jablanović