views:

472

answers:

5

This is more a gotcha I wanted to share than a question: when printing with toString(), Java will detect direct cycles in a Collection (where the Collection refers to itself), but not indirect cycles (where a Collection refers to another Collection which refers to the first one - or with more steps).

import java.util.*;
public class ShonkyCycle {
  static public void main(String[] args) {
    List a = new LinkedList();
    a.add(a);                      // direct cycle
    System.out.println(a);         // works:  [(this Collection)]

    List b = new LinkedList();
    a.add(b);
    b.add(a);                      // indirect cycle
    System.out.println(a);         // shonky:  causes infinite loop!
  }
}

This was a real gotcha for me, because it occurred in debugging code to print out the Collection (I was surprised when it caught a direct cycle, so I assumed incorrectly that they had implemented the check in general). There is a question: why?

The explanation I can think of is that it is very inexpensive to check for a collection that refers to itself, as you only need to store the collection (which you have already), but for longer cycles, you need to store all the collections you encounter, starting from the root. Additionally, you might not be able to tell for sure what the root is, and so you'd have to store every collection in the system - which you do anyway - but you'd also have to do a hash lookup on every collection element. It's very expensive for the relatively rare case of cycles (in most programming). (I think) the only reason it checks for direct cycles is because it so cheap (one reference comparison).

OK... I've kinda answered my own question - but have I missed anything important? Anyone want to add anything?


Clarification: I now realize the problem I saw is specific to printing a Collection (i.e. the toString() method). There's no problem with cycles per se (I use them myself and need to have them); the problem is that Java can't print them. Edit dtsazza points out it's not just collections, but any object whose toString is called.

Given that it's constrained to this method, here's an algorithm to check for it:

  • the root is the object that the first toString() is invoked on (to determine this, you need to maintain state on whether a toString is currently in progress or not; so this is inconvenient).
    • as you traverse each object, you add it to an IdentityHashMap, along with a unique identifier (e.g. an incremented index).
    • but if this object is already in the Map, write out its identifier instead.

This approach also correctly renders multirefs (a node that is referred to more than once).

The memory cost is the IdentityHashMap (one reference and index per object); the complexity cost is a hash lookup for every node in the directed graph (i.e. each object that is printed).

A: 

You are right, you already answered your own question. Checking for longer cycles (especially really long ones like period length 1000) would be too much overhead and is not needed in most cases. If someone wants it, he has to check it himself.

The direct cycle case, however, is easy to check and will occur more often, so it's done by Java.

schnaader
A: 

You can't really detect indirect cycles; it's a typical example of the halting problem.

alamar
Sorry, not true. It's actually pretty simple. For example:http://stackoverflow.com/questions/261573/best-algorithm-for-detecting-cycles-in-a-directed-graph
13ren
Only if the graph is constant and doesn't change while being manipulated, which is quite easy to break by using custom collection implementations.
David Schmitt
You're doing that during toString().You're printing objects, which can do toString() on random collections, therefore triggering a loop, and you can't really check for that.
alamar
Have a look at the algorithm I added to the question: you can check for when you see the same object again. The awkward part is you need know when you invoke a new toplevel toString (as opposed to subsequent toString calls in the graph).
13ren
+4  A: 

I think fundamentally it's because while the language tries to stop you from shooting yourself in the foot, it shouldn't really do so in a way that's expensive. So while it's almost free to compare object pointers (e.g. does obj == this) anything beyond that involves invoking methods on the object you're passing in.

And at this point the library code doesn't know anything about the objects you're passing in. For one, the generics implementation doesn't know if they're instances of Collection (or Iterable) themselves, and while it could find this out via instanceof, who's to say whether it's a "collection-like" object that isn't actually a collection, but still contains a deferred circular reference? Secondly, even if it is a collection there's no telling what it's actual implementation and thus behaviour is like. Theoretically one could have a collection containing all the Longs which is going to be used lazily; but since the library doesn't know this it would be hideously expensive to iterate over every entry. Or in fact one could even design a collection with an Iterator that never terminated (though this would be difficult to use in practice because so many constructs/library classes assume that hasNext will eventually return false).

So it basically comes down to an unknown, possibly infinite cost in order to stop you from doing something that might not actually be an issue anyway.

Andrzej Doyle
+1  A: 

I'd just like to point out that this statement:

when printing with toString(), Java will detect direct cycles in a collection

is misleading.

Java (the JVM, the language itself, etc) is not detecting the self-reference. Rather this is a property of the toString() method/override of java.util.AbstractCollection.

If you were to create your own Collection implementation, the language/platform wouldn't automatically safe you from a self-reference like this - unless you extend AbstractCollection, you would have to make sure you cover this logic yourself.

I might be splitting hairs here but I think this is an important distinction to make. Just because one of the foundation classes in the JDK does something doesn't mean that "Java" as an overall umbrella does it.

Here is the relevant source code in AbstractCollection.toString(), with the key line commented:

public String toString() {
    Iterator<E> i = iterator();
    if (! i.hasNext())
        return "[]";

    StringBuilder sb = new StringBuilder();
    sb.append('[');
    for (;;) {
        E e = i.next();
        // self-reference check:
        sb.append(e == this ? "(this Collection)" : e); 
        if (! i.hasNext())
            return sb.append(']').toString();
        sb.append(", ");
    }
}
matt b
I did say "in a collection", by which I meant a java collection. Of course, if you make up your own data types, and choose to call them "collections", then you're right. Hmmm... I guess for clarity I could capitalize it to Collection (now done) BTW: I'd looked at the source code you posted at while back, though I'd forgotten the important detail that it only handles a special case. I agree it's interesting and relevant to add, as well as your explanation. But your other point *is* splitting hairs ;-)
13ren
Well there is a difference between implementing the Collection interface, and extending the AbstractCollection class. Kinda like how squares are rectangles but not all rectangles are squares...
matt b
:D hmmm... well, you're right. That's being *very* precise about exactly where the problem occurs (which is a good thing). But it's sort of side-stepping the point of the question, instead of addressing it. Or does this extra precision help in saying *why* AbstractCollection only tests direct cycles (above what the other answers say)? I do agree that precision is a good thing in itself, and it's even better when it helps to solve the problem at hand.
13ren
No this doesn't address direct vs indirect cycles at all, or even your main questions. I just wanted to make clear why toString() avoided an infinite loop in this case with the self-reference, that it's not "Java" that does this automatically.
matt b
+1  A: 

The problem with the algorithm that you propose is that you need to pass the IdentityHashMap to all Collections involved. This is not possible using the published Collection APIs. The Collection interface does not define a toString(IdentityHashMap) method.

I imagine that whoever at Sun put the self reference check into the AbstractCollection.toString() method thought of all of this, and (in conjunction with his colleagues) decided that a "total solution" is over the top. I think that the current design / implementation is correct.

It is not a requirement that Object.toString implementations be bomb-proof.

Stephen C