I have a client-server application which gets all the data out of a couple of tables, recalculates something and stores it.

Example:

Each Item has a 'Bill of Materials' = the list and quantities of the other items it is made out of. Therefore the cost of an item is the sum of the costs of the items in its BOM, each multiplied by its quantity. Ultimately, some "base" items have no BOM and just have their cost set independently (i.e. raw materials).

e.g.: A's BOM says it's made out of 2xB and 3xC.

What I do now, and I don't remember why I do it like this, is I get all the items and all the BOMs out of the DB, and go through the items one at a time, calculating each item's cost recursively. Once I calculate an item, I flag it so I don't redo the cost again (which also guards against infinite recursion).

Thing is, this is kinda stupid: first, it's slow and recalculates stuff that hasn't changed, and worse, given a big enough DB, it will run out of memory.

Instead, I could recalculate items on demand: when an Item's BOM changes, I recalculate that BOM, then SELECT all the BOMs which contain the updated Item and recalculate them as well; rinse and repeat recursively until you reach the top, where no BOM in the DB depends on any of the changed items.
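
One step of that recursion is just a lookup of the parent BOMs; in rough SQL it would look something like this (the bom(parent_id, component_id, quantity) table here is only a stand-in, not my real schema):

    SELECT parent_id, quantity
      FROM bom
     WHERE component_id = 'B';   -- every BOM that directly contains the changed item
    -- ...recalculate each of those parents, then repeat the same SELECT for each of
    -- them, until nothing in the DB depends on anything that changed.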

What this means in practice: say some of the Items are raw materials, whose cost might be updated often, and some Items are "end-user" stuff, whose BOM will rarely if ever change. When the user changes the cost of one of those materials, it might mean going through thousands of Items, recalculating them. Say a SELECT of 1 Item/BOM takes 15ms (I'm on PostgreSQL); then merely SELECTing 1000 Items/BOMs will take 15 seconds, and then you have to UPDATE the recalculated cost back into the Item in the DB... oh dear, latency can turn into minutes now.

The ERP software the company I work for uses takes the first approach: batch-recalculate the entire DB at once. This literally takes hours, and the problems seem to have been building up with this approach over the 10+ years of usage. The batch recalculation is done weekly.

Now that I've actually "written this out loud", I don't think the fact that it takes a few minutes matters too much. The problem is that I don't understand databases well, and I'm worrying about concurrency: since it will take a long time to update Item A, it is likely someone will update a second Item B while Item A is being updated.

Say Item D is made out of the A and B above. User 1 updates A, so the server software starts grinding away at the DB for a couple of minutes, eventually updating D. But in the meantime, User 2 updates B, so the server will eventually update D again.

Will using PostgreSQL's transactions solve the problem? A transaction begins with the then-current state of the DB, so Transaction 1 sees D being made out of A1 and B1, and it's updating A from A1 to A2, but before it finishes and commits, Transaction 2 begins, also seeing A1 and B1. T1 recalculates and commits, D = A2 + B1. But T2 has already begun, and doesn't see the new A, A2. So it finally commits to the DB that D = A1 + B2, which is incorrect. It should be D = A2 + B2.
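
Spelled out as two interleaved sessions (item(id, cost) is just a stand-in for my real table), that's:

    -- T1: BEGIN;
    -- T1: SELECT cost FROM item WHERE id IN ('A', 'B');   -- sees A1 and B1
    -- T2: BEGIN;
    -- T2: SELECT cost FROM item WHERE id IN ('A', 'B');   -- also sees A1 and B1
    -- T1: UPDATE item SET cost = ... WHERE id = 'A';      -- A1 -> A2
    -- T1: UPDATE item SET cost = ... WHERE id = 'D';      -- D = A2 + B1
    -- T1: COMMIT;
    -- T2: UPDATE item SET cost = ... WHERE id = 'B';      -- B1 -> B2
    -- T2: UPDATE item SET cost = ... WHERE id = 'D';      -- D = A1 + B2, clobbers T1's D
    -- T2: COMMIT;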

Also, some processing will overlap, wasting server time.

If I do T1 and T2 in sequence instead of in parallel, then hooray, the answer is correct, but User 2 will have to wait longer. Also, if a group of transactions have no relation to each other (completely independent dependency trees; e.g. A=X+Y and B=N+M), then parallel computation will give the correct answer AND will be faster for the user.

Important note: even when processing in sequence, I'd still use transactions, so the rest of the software can still work with that data in parallel, except for the function that recalculates cost.

Now, this whole "process-in-sequence" thing would not be so bad if the DB latency weren't so awful. If, say, the entire data set were held in RAM, then going through 1000 objects would be a breeze. Ah, but even if I built a system to quickly move chunks of data between disk and RAM and did some caching (to replace the DB), that wouldn't do, because I'd still need transactions so that the rest of the server functionality can work in parallel (the 'important note' above). So I'd end up building another DB. It might be a bit faster, but it's a waste of time.

The whole reason I "cache" the cost of each Item is so that I don't recalculate it every time I use it: not only would that waste limited resources, the DB latency is just too big, and the concurrency issues scale even worse.

Now I no longer wonder why "they" did it in big batches... this is making my head hurt.

Q1: How do you guys solve this in an "optimum" way?

From my current understanding (that is, after facing the concurrency problem which I'd previously silently ignored), I would make that function use transactions in sequence, and the rest of the app will still be able to use the data in parallel, which I believe is best for the user. That's the goal: best for the user, but guaranteed correctness for the system.

Maybe later on I could throw hardware at it and use software black magic to reduce that latency, but I'm beginning to lie to myself now.

Also, in the past couple of months I've been completely blind to several dead-obvious things (some not related to programming), so I'm expecting that someone will point out something shamefully obvious that I managed to miss... :|

+2  A: 

I don't remember why I do it like this...

This jumps out at me, as the first thing you need to tackle!

There shouldn't be any reason you need to fetch data back to your application just to calculate the aggregate cost of each BOM. There are numerous techniques to work with "parts explosion" or hierarchical data sets in SQL.

I cover several solutions in my presentation "SQL Antipatterns Strike Back", or you can read a book like "Joe Celko's Trees and Hierarchies in SQL."

Some solutions are vendor-specific and some can be done with any plain SQL DBMS. I hadn't noticed which brand of database you're using, but Jonathan correctly points out that it's PostgreSQL.

In that case, you should read about "WITH" queries, which are new in PostgreSQL 8.4, and allow you to do some sophisticated recursive query effects.

http://www.postgresql.org/docs/current/static/queries-with.html
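
For illustration, a recursive WITH query along these lines can roll up one item's cost in a single statement (the bom(parent_id, component_id, quantity) and item(id, cost) tables here are made up - adjust the names to your actual schema):

    WITH RECURSIVE explode AS (
        -- start with the direct components of the item being costed
        SELECT component_id, quantity::numeric AS qty
          FROM bom
         WHERE parent_id = 'A'
        UNION ALL
        -- then walk down, multiplying the quantities along the way
        SELECT b.component_id, e.qty * b.quantity
          FROM explode e
          JOIN bom b ON b.parent_id = e.component_id
    )
    SELECT SUM(e.qty * i.cost) AS total_cost
      FROM explode e
      JOIN item i ON i.id = e.component_id
      -- only "base" items (those with no BOM of their own) carry an independent cost
     WHERE NOT EXISTS (SELECT 1 FROM bom b WHERE b.parent_id = e.component_id);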

I've implemented a system where BOMs were composed of hierarchies of individual resources, and I didn't have to do any of the batch processing you're describing (admittedly, there were only a few thousand resources in the DB while I worked on it).

You should learn how to use aggregate functions in SQL, like SUM() with GROUP BY (any book on SQL should cover this), and also techniques for storing hierarchical relationships between entities.
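
For example, one level of that roll-up is just an aggregate over the BOM rows (again, with the made-up table names from above):

    SELECT b.parent_id, SUM(b.quantity * i.cost) AS direct_component_cost
      FROM bom b
      JOIN item i ON i.id = b.component_id
     GROUP BY b.parent_id;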

Since you say you don't understand databases well, I recommend that you try implementing a "toy" system before you make any changes to your real system. I'm only speaking from personal experience, but I find that I can't learn a new technical skill while I'm simultaneously trying to employ that skill in a real project.

Bill Karwin

He mentions "I'm on PostgreSQL" at one point... - Jonathan Leffler

Oops! I missed that. - Bill Karwin
+1  A: 

This sounds to me like a calculation that would benefit from being a stored procedure in the database, more or less regardless of which implementation method you use. That cuts down on the traffic between client and server, which almost invariably improves the performance of a complex set of calculations like this.
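
As a rough sketch of what I mean (the item(id, cost) and bom(parent_id, component_id, quantity) tables are invented for the example; in PostgreSQL the "stored procedure" would be a function):

    CREATE OR REPLACE FUNCTION recalc_cost(p_item text) RETURNS numeric AS $$
    DECLARE
        v_cost numeric;
    BEGIN
        -- Roll up the (recursively recalculated) costs of the direct components, if any.
        SELECT SUM(b.quantity * recalc_cost(b.component_id))
          INTO v_cost
          FROM bom b
         WHERE b.parent_id = p_item;

        IF v_cost IS NULL THEN
            -- No BOM rows: a base item, whose cost is set independently.
            SELECT cost INTO v_cost FROM item WHERE id = p_item;
        ELSE
            UPDATE item SET cost = v_cost WHERE id = p_item;
        END IF;

        RETURN v_cost;
    END;
    $$ LANGUAGE plpgsql;

Like your current client-side code, this naive version recalculates a shared sub-assembly once per parent that uses it, but all the work stays inside the server, so there is no per-row network round trip.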

You say:

What I do now, and I don't remember why I do it like this, is I get all the items and all the BOMs out of the DB, and go through the items one at a time, calculating each item's cost recursively. Once I calculate an item, I flag it so I don't redo the cost again (which also guards against infinite recursion).

I'm puzzled about the 'flag it' part of this explanation - and not knowing why you do something the way you do it is bad news. You really need to understand what you are doing.

There are many ways to do BOM processing - and Bill Karwin has pointed you at some interesting info (the SQL Antipatterns link is about 250 slides!). The SQL Antipatterns section discusses 'naïve trees' (such as those outlined below). However, the solutions do not cover the case outlined below, where the same sub-tree can be used by multiple parents (because one sub-assembly can be a component of multiple products).

  • Path enumeration doesn't work: you can't use the same sub-assembly information because you build the containing product information into the path.
  • Nested sets work fine when the sub-assembly is used in one product; not when the sub-assembly is used in many products.
  • The 'closure table' solution can be adapted to cover this - it is more or less the second alternative below.

You need to consider whether it makes sense to do a bottom-up scan of the affected parts or whether you will be better off doing some sort of breadth-first or depth-first scan. One driver of this decision will be the nature of the BOM data. If you have a structure where some sub-assembly is used as a component of multiple products, do you record the parts used in the sub-assembly separately for each product, or do you record that the products use the sub-assembly?

To clarify:

  • Sub-assembly A (P001) contains 24 x 8mm nuts (P002), 24 x 8mm x 50 mm bolts (P003), 1 x baseplate (P004), 1 x coverplate (P005).
  • Product B (P006) contains 1 x Sub-assembly A and a number of other parts.
  • Product C (P007) contains 1 x Sub-assembly A and a number of other parts.

Your BOM records could look like this (naïve tree):

Part      Component     Quantity
P001      P002          24
P001      P003          24
P001      P004          1
P001      P005          1
P006      P001          1
P007      P001          1

Or they could look like this (closure table):

Part      Component     Quantity
P001      P002          24
P001      P003          24
P001      P004          1
P001      P005          1
P006      P002          24
P006      P003          24
P006      P004          1
P006      P005          1
P007      P002          24
P007      P003          24
P007      P004          1
P007      P005          1

This second case is much less desirable - it is much harder to get the values right, doubly so if, as in the case of parts like nuts or bolts, multiple sub-assemblies could use the same part, so getting the counts right in a major deliverable product (P006, P007) would be very hard. However, recalculating the cost of any part is much simpler in the second case - you simply count up the sum of the 'cost times quantity' for each component that makes up a part. If you retain the naïve tree to record the part-structure breakdown and (re)compute the closure table when the structure (not price) of some product or sub-assembly changes, then you are probably as close to nirvana as you're likely to get.
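
That recomputation can itself be a single recursive query; for instance, something along these lines (assuming made-up tables bom(part, component, quantity) for the naïve tree and bom_exploded(part, component, quantity) for the second form):

    -- Throw away the old explosion for simplicity (in practice you would scope this
    -- to the products whose structure actually changed, inside one transaction).
    DELETE FROM bom_exploded;

    INSERT INTO bom_exploded (part, component, quantity)
    WITH RECURSIVE explode AS (
        SELECT part, component, quantity::numeric AS quantity
          FROM bom
        UNION ALL
        -- walk down the tree, multiplying quantities (assumes the BOM data is acyclic)
        SELECT e.part, b.component, e.quantity * b.quantity
          FROM explode e
          JOIN bom b ON b.part = e.component
    )
    SELECT e.part, e.component, SUM(e.quantity)
      FROM explode e
     WHERE NOT EXISTS (SELECT 1 FROM bom b2 WHERE b2.part = e.component)  -- keep base parts only
     GROUP BY e.part, e.component;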

Somewhere (but on another computer than this one) I have some old code to mess around with this stuff, using fictitious assemblies. The coding was done ... mumble, mumble ... a long time ago, and uses temporary tables (and doesn't mention nested sets or path enumeration; it does compute closure tables) for a specific DBMS - it would have to be adapted to other DBMS. Ask, and I'll dig it out.

Jonathan Leffler