views:

72

answers:

5

I have a bunch of items in my program that all belong to a specific category. I'd like to return only the items that belong to that category. The problem is that categories can have parent categories. For example, let's say there's a category "Stuff" with the child category "Food" with the child category "Fruit". I have the items, Apple, Pear, Chocolate, and Computer.

If I want to display all of the fruits, it's easy to do a database query with a "WHERE item.category = FRUIT_ID" clause. However, if I want all foods to be included, I need a way to get the fruits in there, too.

I know that some databases, like Oracle, have a notion of recursive queries, and that might be the right solution, but I don't have a lot of experiences with hierarchical data and am looking for general suggestions. Assume I have unlimited control over the database schema, the category tree only goes maybe 5 categories deep maximum, and I need it to be as ridiculously fast as possible.

+1  A: 

There's a whole book full of design strategies for representing trees in SQL. It's worth looking at just for the sheer clever points.

Jonathan Feinberg
Wow, awesome find, thanks!
CaptainAwesomePants
A: 

One possible solution is to separate the hierarchy from the actual categorization. For instance, an apple could be categorized as both a fruit and a food. The categorization has no knowledge that a fruit is a food, but you could define that somewhere else. Then, your query would be as simple as where category='food'.

Alternatively, you could go through the hierarchy before building your query and it would require something like where category='food' or category='fruit'.

Topher Fangio
+1  A: 

Have a look at the adjacency list model - it's not perfect (it's very slow to update), but in some situations (hierarchical queries), it's a great representation, especially for problems like yours.

skaffman
That's a fascinating model that I did not know about before. Seems very advantageous. I wonder if I could do something sneaky to not have to renumber as often. Maybe leave large gaps between the numbers so that I could toss a few extra categories in occasionally without renumbering. Either way, a lot to think about, thanks!
CaptainAwesomePants
+1  A: 

Assuming your category tree is small enough to be cached, you might be better off keeping the category tree in memory and have a function over that tree that will generate a list of category id's that are below a given category.

Then when you query the database, you just use an IN clause with the list of child IDs

Eric Petroelje
That's an interesting idea. It intuitively feels like a bit more of a hack than the adjacency list approach, but I'll keep it in mind. Thanks!
CaptainAwesomePants
A: 

I think your database schema is quite fine, but the implementation of this search really depends on your specific RDBMS. A lot of them have ways to perform this sort of recursion. One example I can think of is SQL Server's support of Common Table Expressions which are lightning fast alternatives to those nasty cursors.

If you specify which RDBMS you're using, you might get more specific answers.

Scott Anderson