ansaurus

Question

Optimal database structure design for one to many

Answer 1

+2 A:

Generally, option 1 is preferable to option 2, not least because some of the part IDs in the assembly would themselves be assemblies.

You do have to deal with recursive or tree-structured queries. That is not particularly easy in any dialect of SQL. Some systems have better support for them than others. Oracle has its CONNECT BY PRIOR system (weird, but it sort of works), and DB2 has recursive WITH clauses, and ...

Jonathan Leffler 2010-09-08 15:21:37

Answer 2

+1 A:

NEVER, never ever use procedural languages like PHP or C# to process data structures when you have a database engine for that. Relational data structures are much more faster and flexible, and surer, than storing text. Forget about Option 2.

You could use recursive UDFs to retrieve the whole tree with no big fuss about it.

Alexander 2010-09-08 15:21:40

That's actually false. Have you benchmarked this? A SQL hierarchy is quite complex. A flat table processed in procedural programs has simpler SQL and will be -- often -- faster than the pure SQL version. Speed is not the value proposition of the database. Solution 2 gives up a high-fidelity model of the real world objects. It doesn't give up any speed.

S.Lott 2010-09-08 15:28:30

You can't really do much with a flat table, can you? That's the same as storing data in a text file. It's faster when reading, but it's a lot slower, and harder, to process the data. Think about updates, deletes, about searches, statistics, cost calculations, etc. All that can be easily done with SQL if data is relational. And it would take forever for PHP to parse large tables.

Alexander 2010-09-08 15:32:49

What are UDFs ?

Samuel 2010-09-08 15:36:39

User Defined Functions. You can define them using SQL and procedural language, then call them inside other queries, stored procedures or other UDFs. Functions, inside the database, for use in this database.

Alexander 2010-09-08 15:54:16

@Alexandar: "You can't really do much with a flat table, can you?" That's the point. It's faster, but harder to work with. Complex relational structures for a hierarchy like this are rarely faster. They are almost always more flexible. I suggest you delete the "faster" word.

S.Lott 2010-09-08 15:57:22

I don't agree. It is only slower on one specific task - reading the whole tree. Because text form acts as a cache (Which by the way can be used in combination with the relational tree, if such operations are too often) for one single operation, whereas relational structure provides all the range of operations, at much greater speeds than it would be when working with text.

Alexander 2010-09-08 19:26:34

You see, the textual form paralyzes the data, disabling it to work for you. When you get to saving data in such form, why not write the whole database as one single BLOB then?

Alexander 2010-09-08 19:27:53

"whereas relational structure provides all the range of operations, at much greater speeds than it would be when working with text". Really? Have you benchmarked this? It's rarely true.

S.Lott 2010-09-08 23:37:02

why? If I want to modify one single entry, all I have to do is write an update query, which will search through the index, rather than search through the index, extract the text, parse it, form another, etc. It's the reason relational data exists in the first place - ease of use and speed.

Alexander 2010-09-08 23:40:14

"the reason relational data exists in the first place - ease of use and speed" Really? Have you benchmarked this? The 'speed' part is rarely true. A simple hashmap or dictionary is essentially the same as the relational index, without the relational database overheads. Simple structures (like flat files) will be much faster. Please do some benchmarking on the speed issue. The flexibility is clear. The speed, however, is rarely true.

S.Lott 2010-09-09 02:25:15

I did. I had some gigabytes of text, in pure text form, as well as the same text, layed out in a relational form (word, textId, position), which was indexed. While insertions were significantly slower, searching and reading was significantly faster. And it's logical. Texts have to be parsed, to find what you seek, while relational data is indexed, so with much fewer operations you get the data you need. When you try to do complex stuff with text files, you are basically building a lame database engine, which, naturally, works much slower than optimized database engines.

Alexander 2010-09-09 08:10:12

@Alexander: You're not really comparing the same things, are you? You read, parse and load the relational database. Why not read, parse and load a non-relational database? Forcing the flat file to do read and parse every time isn't comparing the same algorithm, is it? You must use a flat-file that is already parsed. Often you can do things like serialize a hashmap of your gigabytes of text so that searching is almost instantaneous.

S.Lott 2010-09-09 15:13:09

@S.Lott: why are people using databases? Why shouldn't I create my own storage engine based on files if they work faster? Why aren't such data storages available on the internet yet, if they are faster. In the end you still end up using a lame database engine made by yourself, which most often can't compare database engines made by teams.

Alexander 2010-09-09 15:37:57

P.S. I was comparing practically the same thing. Textual representation of structured data, vs relational storage of structured data. In my case, textual form was only quicker to be created.

Alexander 2010-09-09 15:39:06

@Alexander: "why are people using databases" You said it yourself many times: Flexibility. Speed is not assured. Indeed, a relational database rarely comes close to the speed of highly-tailored (and less flexible) flat-file systems.

S.Lott 2010-09-09 16:43:21

Answer 3

A:

How about a nullable foreign key on the same table? Something like:

CREATE TABLE part (
    part_id int not null auto_increment primary key,
    parent_part_id int null,
    constraint fk_parent_part foreign key (parent_part_id) references part (part_id)
)

tdammers 2010-09-08 15:22:00

Answer 4

A:

Definitely not option 2. That is a recipe for trouble. The correct answer depends on how many potential levels of assemblies are possible, and how you think of the assemblies. Do you think of an assembly (a composite onject consisting of 2 or more atomic parts) as a part in it's own right, that can itself be used as a subpart in anothe assmebly? Or are assemblies a fundementally differrent kind of thing froma an atomic part?

If the former is the case, then put all assemblies and parts in one table, with a PartID, and add a second table that just has the construction details for those parts that are composed of multiple other parts (which themseleves may be assemblies of yet more atomic parts). This second table would look like this:

   ConstructionDetails
  PartId,  SubPartId, QuantityRequired

If you think of things more like the second way, then only put the atomic parts in the first table, and put the assemblies in the second table

   Assemblies
   AssemblyId,  PartId,  QuantityRequired

Charles Bretana 2010-09-08 15:25:53

ansaurus

tags:

views:

answers:

Optimal database structure design for one to many

related questions