views:

220

answers:

2

My friends and I are working on some basic Ruby exercises to get a feel for the language, and we've run into an interesting behavior that we're yet unable to understand. Basically, we're creating a tree data type where there's just one class, node, which contains exactly one value and an array of zero or more nodes. We're using rspec's autospec test runner. At one point we started writing tests to disallow infinite recursion (a circular tree structure).

Here's our test:

it "breaks on a circular reference, which we will fix later" do
  tree1 = Node.new 1
  tree2 = Node.new 1
  tree2.add_child tree1
  tree1.add_child tree2
  (tree1 == tree2).should be_false
end

Here's the Node class:

class Node
  attr_accessor :value
  attr_reader :nodes

  def initialize initial_value = nil
    @value = initial_value
    @nodes = []
  end

  def add_child child
    @nodes.push child
    @nodes.sort! { |node1, node2| node1.value <=> node2.value }
  end

  def == node
    return (@value == node.value) && (@nodes == node.nodes)
  end
end

We expect the last line of the test to result in an infinite recursion until the stack overflows, because it should continually compare the child nodes with each other and never find a leaf node. (We're under the impression that the == operator on an array will iterate over the array and call == on each child, based on the array page of RubyDoc.) But if we throw a puts into the == method to see how often it's called, we discover that it's called exactly three times and then the test passes.

What are we missing?

Edit: Note that if we replace be_false in the test with be_true then the test fails. So it definitely thinks the arrays are not equal, it's just not recursing over them (aside from the three distinct calls to ==).

+6  A: 

If you click on the method name of the RubyDoc you linked to, you will see the source (in C) of the Array#== method:

{
    // [...]
    if (RARRAY(ary1)->len != RARRAY(ary2)->len) return Qfalse;
    if (rb_inspecting_p(ary1)) return Qfalse;
    return rb_protect_inspect(recursive_equal, ary1, ary2);
}

This implementation (specifically the "recursive_equal") suggests that Array#== already implements the infinite recursion protection you're after.

Gareth
Ah, so it does. Technically the protection we're after is to ultimately prevent it from even being allowed on our data type, but that's something for the add_child method. I guess it never occurred to us that it would just silently return and not throw some kind of error. It's all part of getting used to the language, I guess. Out of curiosity, do you know the reason for the silent return? (Admittedly, I'm not well versed in C, so I'm only vaguely following this implementation.)
David
My C isn't great either, but it seems that the previous line `if (rb_inspecting_p(ary1)) return Qfalse;` is what actually triggers the "silent" false return. Basically it seems to return false if we encounter `ary1` at any point *inside* `ary1`.
Gareth
@Gareth: Makes sense. I didn't even notice those implementations were linked on that site. That alone is going to help us in our efforts (as well as brush us up a little bit on reading C). Knowing that check is there implies to me that it's easy to check for, which will come in handy in the add_child method.
David
Well, be aware that not everything in the C API is available in the Ruby API - the Array object is implemented in C rather than as a pure Ruby object. The Ruby calls are all you have to go on unless you're writing your own code as a C extension.
Gareth
@Gareth: Thanks for the info. We wrote that test wanting it to explode with stack overflow, but we were only surprised to find that it did not.
Sean Copenhaver
Woah, just found out that <Enter> submits the comment. Anyway, we have been finding that Ruby has some very powerful functions that has removed a lot of work we were expecting to have to do during our excises.
Sean Copenhaver
The recursive_equal comment is correct but immaterial to the question. Why is this a surprise? I would like to know what language has a built in equality method that would do what the OP was expecting? Certainly not a well designed one.
@beavis: As Gareth pointed out, Ruby does seem to be designed to avoid this infinite recursion. But I would hardly expect all "well-designed" languages to have that check built-in. We can certainly assume that C doesn't, since it had to be explicitly expressed in the C implementation here.
David
I'm new to Ruby but it is my understanding that the original reference implementation was written in C but did not (and still does not?) follow a specification. Thus, is it possible that running this code on another implementation of Ruby (JRuby, for instance) could yield different results (specifically, infinite recursion)?Or am I misunderstanding the whole "different implementations of Ruby" situation?
Matt
A: 

Where is the recursive call?

Putting aside the question, "how is this a tree?" I guess I am confused how you think tree1==tree2 should not terminate. After you instantiate 2 "trees" and make the two add_child calls you have two object with the value of 1 and an array of size 1. The nested arrays inside that 1 element are not iterated over by ==

Of course the arrays are iterated over. `Array#==` iterates over its contents. The only reason that this does not cause infinite recursion is that, as Gareth pointed out, the implementation of `==` has checks for that. Without that this would definitely cause infinite recursion.
sepp2k
The recursive call is because comparing the two arrays `@nodes == node.nodes` should cause `Node#==` to be called on the members of the two @nodes arrays
Gareth
Also: how is this not a tree?
sepp2k
@nodes.size==1 and node.nodes.size==1. End of mystery.
it is a node with a bunch of nodes in it. A node is a single place in the data structure. A node doesn't know what data structure it is living in and therefore can not arrange itself in a data structure.
At best it is a really weird degenerate tree. AKA a really weird linked list.
@beavis: A linked list and a tree are different only in that a linked list's node can refer to only one "child" (the next node in the linked list) whereas a tree's node can refer to any number of "children" (2 for a binary tree, or just an unbounded amount for a regular tree). And I'm not sure what you're getting at with your "@nodes.size==1 and node.nodes.size==1" statement. The comparison in this code isn't checking array length. The C implementation from Gareth does, but only for falsehood which doesn't apply here.
David
My favourite part was where he suggested 2 arrays were equal as long as they were the same size
Gareth
I never said that they were equal because they are the same size. What is in that single element? An array? No, it is a Node, and the same node. That is why they are equal. The fact that the == method protects against infinite recursion is completely irrelevant to the question. The mystery is why anyone would think that a comparison between two arrays, but with a single object in it would recurse infinitely.
Yeah, a node in a tree can have 2+ references to other nodes. What it can't do is adjust the tree, therefore this is not a tree. A tree has nodes, but a node does not have a tree.
"No, it is a Node, and the same node". It's not the same node. `tree1`'s `@nodes` contains `tree2`, and `tree2`'s `@nodes` contains `tree1`.
Gareth
Yar, I just realized that D'oh! Still the point is that the question was "why is this recursion not infinite"? The answer is there is no recursion because the two arrays are size 1. Essentially, the question could have been rewritten as: why is [1]==[4] not recursive? Pretty senseless no? The fact that [Node] has an attribute nodes which has 1 Node which has ... is completely irrelevant.
@beavis: I think you're confusing reference and value. tree1's child isn't a copy of tree2 with no further depth, it's a pointer to tree2, whose child is a pointer to tree1, etc. It's a circular reference, so it has infinite levels of children.
David
@beavis: As for a tree and a node, you continue to not understand what those are. A node is, in fact, a tree. A single node is the smallest instance of a tree. The two are the same. Any node with zero or more child nodes is a tree of nodes.
David
@beavis - "...is completely irrelevant" - No, it's not irrelevant. Comparing `[Node] == [Node]` checks 2 things in Ruby. First it checks that the array lengths are the same (as you mentioned). Then it checks that each Node in the first array is equal (==) to the Node in the same position in the second array. So, it calls `Node#==`
Gareth
Basically, calling `Node#==` checks (among other things) that the two child `nodes` arrays are equal. To do that, you have to compare individual nodes from the child arrays, and to do *that* you have to call `Node#==`. So, to compare two (non-empty) nodes, you have to compare more nodes. This is pretty much the definition of recursive
Gareth
@beavis: I'm morbidly curious... How would you organize a tree where any given node doesn't have references to its child nodes?
David