views:

398

answers:

3

I'm having some trouble representing an object hierarchy in Hibernate. I've searched around, and haven't managed to find any examples doing this or similar - you have my apologies if this is a common question.

I have two types which I'd like to persist using Hibernate: Groups and Items.
* Groups are identified uniquely by a combination of their name and their parent.
* The groups are arranged in a number of trees, such that every Group has zero or one parent Group.
* Each Item can be a member of zero or more Groups.

Ideally, I'd like a bi-directional relationship allowing me to get:
* all Groups that an Item is a member of
* all Items that are a member of a particular Group or its descendants.
I also need to be able to traverse the Group tree from the top in order to display it on the UI.

The basic object structure would ideally look like this:

class Group {
    ...
    /** @return all items in this group and its descendants */
    Set<Item> getAllItems() { ... }

    /** @return all direct children of this group */
    Set<Group> getChildren() { ... }
    ...
}

class Item {
    ...
    /** @return all groups that this Item is a direct member of */
    Set<Group> getGroups() { ... }  
    ...
}

Originally, I had just made a simple bi-directional many-to-many relationship between Items and Groups, such that fetching all items in a group hierarchy required recursion down the tree, and fetching groups for an Item was a simple getter, i.e.:

class Group {
    ...
    private Set<Item> items;
    private Set<Group> children;
    ...
    /** @return all items in this group and its descendants */
    Set<Item> getAllItems() {
        Set<Item> allItems = new HashSet<Item>();
        allItems.addAll(this.items);
        for(Group child : this.getChildren()) {
            allItems.addAll(child.getAllItems());
        }
        return allItems;
    }

    /** @return all direct children of this group */
    Set<Group> getChildren() {
        return this.children;
    }
    ...
}

class Item {
    ...
    private Set<Group> groups;
    /** @return all groups that this Item is a direct member of */
    Set<Group> getGroups() {
        return this.groups;
    }
    ...
}

However, this resulted in multiple database requests to fetch the Items in a Group with many descendants, or for retrieving the entire Group tree to display in the UI. This seems very inefficient, especially with deeper, larger group trees. Is there a better or standard way of representing this relationship in Hibernate?

Am I doing anything obviously wrong or stupid?


My only other thought so far was this: Replace the group's id, parent and name fields with a unique "path" String which specifies the whole ancestry of a group, e.g.:
/rootGroup
/rootGroup/aChild
/rootGroup/aChild/aGrandChild

The join table between Groups and Items would then contain group_path and item_id.

This immediately solves the two issues I was suffering previously:
1. The entire group hierarchy can be fetched from the database in a single query and reconstructed in-memory.
2. To retrieve all Items in a group or its descendants, we can select from group_item where group_path='N' or group_path like 'N/%'

However, this seems to defeat the point of using Hibernate. All thoughts welcome!

+2  A: 

I think the real question is where do you want the performance hit. If you declare all your relationships lazy, and pull what you need, then you spread the hit over time. In many cases this means that your user (person or silicon) perceives a faster reaction time. Even if your processing the whole graph at once, you don't actually need the entire graph in memory at the same time.

On the other hand, pulling the entire graph across puts the performance hit up front while making manipulation faster.

The biggest difference between the two is your specific use case. If this is a webserver, pulling the entire graph into memory will kill the server. Most people can't handle more then 7-10 options anyway, and the entire graph can easily overwhelm the user with choices making the UI difficult to navigate.

Also remember these rules of optimization:

  1. Don't
  2. Seriously, Don't. It's cheaper to throw hardware at a performance problem then it is optimize your code to the point it's not maintainable anymore.
  3. If you think you must, find the 1 bottleneck, using a profiling tool, and fix it.
Jim Barrows
Isn't Knuth's maxim "Premature optimization is the root of all evil?"
Matthew Flynn
I assume, like in several similar cases, this is not a problem of large in-memory datastructures but an increasing number of SQL operations. While each operations is really fast, in complex object-nets the number of queries and the latency of communication will get an application down to crawl.In this case i normally prefer to identify larger parts of the network and use techniques to (pre)fetch them as early a possible (see additional answer below).Memory can be reused easily.
Ralf Edmund
@Mathew Yes, i believe he did say that.@Ralf We just don't have the info. I'm just trying to cover both bases.
Jim Barrows
Thanks all for the advice, and I'm sorry there wasn't more detail provided originally. The bottleneck is the very slow response time of the database I am using (HSQLDB, minimum 200ms per request); and as you have suggested, I may be tackling the wrong problem.Perhaps I've misunderstood how lazy loading works - it didn't seem suitable, because I do not have an open database session.
Alison
A: 

In my past projects i've implemented two different but comparable solutions to this problem.

The most important (from a performance standpoint) solution was based on the fact, that i had a lot of different trees of moderate size. Instead of mapping all tree leaves using bi-directional relationships, each leave got a relationship to the trees root and a second one to their direct parent.

Using this strategy it was quite easy to identify all root nodes (FROM tree t WHERE t.parent IS NULL) and it was really easy to fetch the complete tree based on the root element (FROM tree t WHERE t.root = :root).

The relationship "Set children" was declared a @Transient attribute. Using a simple utility operation, it was quite easy to reconstruct the original tree structure at any time.

The other solution dealt with a small number of really large tree structures. But fortunately these structures had a limited depth. Instead of using just one root, i've mapped several (6 if i remember correctly) levels of "parents". This way it was easy to fetch and reconstruct complete subtrees off the database.

(Those level1 to level6 relations were mapped as "lazy many-to-one" relationships and the application was heavily dependend on the 2nd level caches.)

Ralf Edmund
Hi Ralf, thanks for the response. This approach sounds like it would definitely speed things up in certain cases, but I'm not sure if it's applicable to all use-cases.
Alison
A: 

Look into nested sets. Here's a good article:

http://www.developersdex.com/gurus/articles/112.asp

Basically, you sacrifice some performance when the hierarchy is modified in order to simplify deep retrievals, like what you want to do with items as members of descendant groups.

Taylor
@Taylor very interesting! I think I'll need to reread it a few times before I fully get it though.
Alison
Happy to help. It's been a couple years since I worked with these, but if you have specific questions.Having re-read the article I linked to, I noticed he breezed over some details. Here's a more explanatory article, courtsey of MySQL:http://dev.mysql.com/tech-resources/articles/hierarchical-data.htmlThey cover nested sets about halfway down.
Taylor