views:

129

answers:

4

I am building an inventory tracking system for internal use at my company. I am working on the database structure and want to get some feedback on which design is better*.

I need a recursive(i might be using this term wrong...) system where a part could be made up of zero or more parts. I though of two ways to do this but am not sure which one to use. I am not an expert in database design so maybe there is a their option that i haven't thought of.

  • Option 1: Two tables one with the part_id and the other with part_id, sub_part_id (which refers to another part_id) and quantity. so one table part_id would be unique and the other table there could be zero or more rows showing all the parts that make up a certain part.

  • Option 2: One table with part_id and assembly. assembly would be a text field that looks something like this, part_id,quantity;part_id,quanity;.... I would then use the PHP explode() function to separate by semi-colon and again by comma to get an array of the sub parts.

I hope this all makes sense. I am using PHP/MySQL.

*community wiki because this may be subjective.

+2  A: 

Generally, option 1 is preferable to option 2, not least because some of the part IDs in the assembly would themselves be assemblies.

You do have to deal with recursive or tree-structured queries. That is not particularly easy in any dialect of SQL. Some systems have better support for them than others. Oracle has its CONNECT BY PRIOR system (weird, but it sort of works), and DB2 has recursive WITH clauses, and ...

Jonathan Leffler
+1  A: 

NEVER, never ever use procedural languages like PHP or C# to process data structures when you have a database engine for that. Relational data structures are much more faster and flexible, and surer, than storing text. Forget about Option 2.

You could use recursive UDFs to retrieve the whole tree with no big fuss about it.

Alexander
That's actually false. Have you benchmarked this? A SQL hierarchy is quite complex. A flat table processed in procedural programs has simpler SQL and will be -- often -- faster than the pure SQL version. Speed is not the value proposition of the database. Solution 2 gives up a high-fidelity model of the real world objects. It doesn't give up any speed.
S.Lott
You can't really do much with a flat table, can you? That's the same as storing data in a text file. It's faster when reading, but it's a lot slower, and harder, to process the data. Think about updates, deletes, about searches, statistics, cost calculations, etc. All that can be easily done with SQL if data is relational. And it would take forever for PHP to parse large tables.
Alexander
What are UDFs ?
Samuel
User Defined Functions. You can define them using SQL and procedural language, then call them inside other queries, stored procedures or other UDFs. Functions, inside the database, for use in this database.
Alexander
@Alexandar: "You can't really do much with a flat table, can you?" That's the point. It's faster, but harder to work with. Complex relational structures for a hierarchy like this are rarely faster. They are almost always more flexible. I suggest you delete the "faster" word.
S.Lott
I don't agree. It is only slower on one specific task - reading the whole tree. Because text form acts as a cache (Which by the way can be used in combination with the relational tree, if such operations are too often) for one single operation, whereas relational structure provides all the range of operations, at much greater speeds than it would be when working with text.
Alexander
You see, the textual form paralyzes the data, disabling it to work for you. When you get to saving data in such form, why not write the whole database as one single BLOB then?
Alexander
"whereas relational structure provides all the range of operations, at much greater speeds than it would be when working with text". Really? Have you benchmarked this? It's rarely true.
S.Lott
why? If I want to modify one single entry, all I have to do is write an update query, which will search through the index, rather than search through the index, extract the text, parse it, form another, etc. It's the reason relational data exists in the first place - ease of use and speed.
Alexander
"the reason relational data exists in the first place - ease of use and speed" Really? Have you benchmarked this? The 'speed' part is rarely true. A simple hashmap or dictionary is essentially the same as the relational index, without the relational database overheads. Simple structures (like flat files) will be much faster. Please do some benchmarking on the speed issue. The flexibility is clear. The speed, however, is rarely true.
S.Lott
I did. I had some gigabytes of text, in pure text form, as well as the same text, layed out in a relational form (word, textId, position), which was indexed. While insertions were significantly slower, searching and reading was significantly faster. And it's logical. Texts have to be parsed, to find what you seek, while relational data is indexed, so with much fewer operations you get the data you need. When you try to do complex stuff with text files, you are basically building a lame database engine, which, naturally, works much slower than optimized database engines.
Alexander
@Alexander: You're not really comparing the same things, are you? You read, parse and load the relational database. Why not read, parse and load a non-relational database? Forcing the flat file to do read and parse every time isn't comparing the same algorithm, is it? You must use a flat-file that is already parsed. Often you can do things like serialize a hashmap of your gigabytes of text so that searching is almost instantaneous.
S.Lott
@S.Lott: why are people using databases? Why shouldn't I create my own storage engine based on files if they work faster? Why aren't such data storages available on the internet yet, if they are faster. In the end you still end up using a lame database engine made by yourself, which most often can't compare database engines made by teams.
Alexander
P.S. I was comparing practically the same thing. Textual representation of structured data, vs relational storage of structured data. In my case, textual form was only quicker to be created.
Alexander
@Alexander: "why are people using databases" You said it yourself many times: Flexibility. Speed is not assured. Indeed, a relational database rarely comes close to the speed of highly-tailored (and less flexible) flat-file systems.
S.Lott
A: 

How about a nullable foreign key on the same table? Something like:

CREATE TABLE part (
    part_id int not null auto_increment primary key,
    parent_part_id int null,
    constraint fk_parent_part foreign key (parent_part_id) references part (part_id)
)
tdammers
A: 

Definitely not option 2. That is a recipe for trouble. The correct answer depends on how many potential levels of assemblies are possible, and how you think of the assemblies. Do you think of an assembly (a composite onject consisting of 2 or more atomic parts) as a part in it's own right, that can itself be used as a subpart in anothe assmebly? Or are assemblies a fundementally differrent kind of thing froma an atomic part?

If the former is the case, then put all assemblies and parts in one table, with a PartID, and add a second table that just has the construction details for those parts that are composed of multiple other parts (which themseleves may be assemblies of yet more atomic parts). This second table would look like this:

   ConstructionDetails
  PartId,  SubPartId, QuantityRequired

If you think of things more like the second way, then only put the atomic parts in the first table, and put the assemblies in the second table

   Assemblies
   AssemblyId,  PartId,  QuantityRequired
Charles Bretana