I have a tree encoded in a MySQL database as edges:

    num INT,
    tot INT,
    PRIMARY KEY (num)
    orig INT,
    term INT
    FOREIGN KEY (orig,term) REFERENCES items (num,num)

For each leaf in the tree, items.tot is set by someone. For interior nodes, items.tot needs to be the sum of it's children. Running the following query repeatedly would generate the desired result.

UPDATE items SET tot = (
    SELECT SUM(b.tot) FROM
        tree JOIN items AS b
        ON tree.term = b.num 
        WHERE tree.orig=items.num)
        (SELECT * FROM tree WHERE orig=items.num)

(note this actually doesn't work but that's beside the point)

Assume that the database exists and the invariant are all ready satisfied.

The question is:

What is the most practical way to update the DB while maintaining this requirement? Updates may move nodes around or alter the value of tot on leaf nodes. It can assumed that leaf nodes stay as leaf nodes, interior nodes stay as interior nodes and the whole thing remains a proper tree.

Some thoughts I have had:

  • Full Invalidation, after any update, recompute everything (Um... No)
  • Set a trigger on the items table to update the parent of any row that is updated
    • This would be recursive (updates trigger updates, trigger updates, ...)
    • Doesn't work, MySQL can't update the table that kicked off the trigger
  • Set a trigger to schedule an update of the parent of any row that is updated
    • This would be iterative (get an item from the schedule, processing it schedules more items)
    • What kicks this off? Trust client code to get it right?
    • An advantage is that if the updates are ordered correctly fewer sums need to be computer. But that ordering is a complication in and of it's own.

An ideal solution would generalize to other "aggregating invariants"

FWIW I know this is "a bit overboard", but I'm doing this for fun (Fun: verb, Finding the impossible by doing it. :-)

+1  A: 

I am not sure I understand correctly your question, but this could work My take on trees in SQL.

Linked post described method of storing tree in database -- PostgreSQL in that case -- but the method is clear enough, so it can be adopted easily for any database.

With this method you can easy update all the nodes depend on modified node K with about N simple SELECTs queries where N is distance of K from root node.

I hope your tree is not really deep :).

Good Luck!

Grzegorz Gierlik

An interesting approach. The thing I don't like about it is that it uses something like N*Log(N) space. Also I have some versioning constraints that would require major mods.

Saddly, the author never gets into how to update aggregate values. I can think of a few approaches but they would depend on that implementation.

I'll have to think on this more.


The problem you are having is clear, recursion in SQL. You need to get the parent of the parent... of the leaf and updates it's total (either subtracting the old and adding the new, or recomputing). You need some form of identifier to see the structure of the tree, and grab all of a nodes children and a list of the parents/path to a leaf to update.

This method adds constant space (2 columns to your table --but you only need one table, else you can do a join later). I played around with a structure awhile ago that used a hierarchical format using 'left' and 'right' columns (obviously not those names), calculated by a pre-order traversal and a post-order traversal, respectively --don't worry these don't need to be recalculated every time.

I'll let you take a look at a page using this method in mysql instead of continuing this discussion in case you don't like this method as an answer. But if you like it, post/edit and I'll take some time and clarify.