I have a set of records, stored as XML files, where the XML files are arranged in a tree structure. For each child record, elements or attributes which are not explicitly stated are assumed to be inherited from the parent record. This is easy to model in a database, with a self-referential foreign key, e.g.
Tree Structure
Foo
/ \
Bar1 Bar2
Database Table
id | parent_id | name | attribute 1 | attribute 2
1 | null | Foo | 0.4 | "tastes like chicken"
2 | 1 | Bar 1 | null | "doesn't everything"
3 | 1 | Bar 2 | 0.2 | null
(In production I would use an additional column(s) to store the tree structure, but this is omitted for the sake of simplicity.)
In this system, Bar1 would inherit a value of 0.4 for attribute 1, and Bar2 would inherit a value of "tastes like chicken" for attribute 2.
As I am far from a guru of any stripe, I have several questions about how best to work with system.
As I need to be able to export from the database in the original format, I can't simply pre-calculate and cache the "missing" records. Or can I? Note that caching would have be somewhat intelligent, as an update may change values in Foo that should propogate down to Bar1 and Bar2.
Are there standard database tools or SQL parameters that could "build" the Bar records correctly in one (or relatively few) queries? I know about common table expressions, and it seems like this is halfway to a solution, but this would return the tree of rows, and then further processing would be required to fill out all attributes appropriately.
Are their other tips and trips for working with this type of data and database structure? Maybe this is a silly question, but I don't have any formal programming education, so many concepts are quite new to me.
Note that creating separate tables for the various branches is not a possibility, as the tree can have arbitrary depth. Anticipated tree size is ~ 3000 child records, with a tree 3 levels deep, and thousands of trees to be stored.
I am working with Django, if that matters, but am semi-comfortable working with raw SQL queries (e.g. I build a Django app to manipulate directed acyclic graphs, which uses mostly raw SQL because of the complicated grouped by and having clauses).