views:

108

answers:

6

Hi Everyone. I'm building a database that makes use of lots of many-to-many relations. For example, I have part items and machine items where each part goes into many machines and each machine has many parts. The purpose of this database is to be able to input a list of parts and get back a list of machines composed of only parts in that list, or vice versa. I'm new to database design, so I've currently been modeling this relation with translation tables, like such:

create table machine ( machine_id number, machine_name char(30) )

create table part ( part_id number, part_name char(30) )

create table relations ( part_id number, machine_id number )

This seems like a very ugly and naive way to go about modeling this sort of relation. Are there better ways of doing this, like somehow storing and updating lists in single columns under their respective tables or something better I haven't thought of yet? I'm trying to find a way to do this that minimizes query time and post-processing intensity.

Thanks.

+7  A: 

Welcome to relational databases. No, there isn't a better way. I'd recommend naming your tables slightly better; Machine, Part, Machine_Part or something of the sort, because you're probably going to end up with a bunch of these type of tables.

Post processing isn't really an issue, it's fairly easy to deal with these relationships using simple INNER JOINs in SQL or using an ORM. Databases are designed to cope with this kind of stuff.

nizmow
+1 on the better naming of your tables.
Mark0978
Part and Machine are 100% duplicates. It is neither normalised, nor a "relational database"
PerformanceDBA
+1  A: 

This is not naive, this is the proper way of an ER model. Separating entities with relations, the classic design pattern. Worry not about query/join overhead, RDBMSs are optimized for this and can fly through those join queries.

You can also make the relation table have (part_id,machine_id) as a compound primary key. Better yet, create them as indexed organized tables and avoid any (negligible) overhead of the table data.

Xepoch
Thanks! I didn't realize that the dbms optimization played such a large role in the performance of the database in this case. Also nice to know that the proper name is a relation table, not a translation table. :)
S.C.
@SC in the relation, you only ever have to hit the index for the join (assuming you put the compound PK or unique index on it).
Xepoch
@xepoch and SC: please stop using the word Relation incorrectly. In database terms, relation means something completely different from a relationship.
dportas
@dprotas, not necessarily. Codd indeed defines relation as a table of tubles with attributes. Earlier approaches (e.g. Peter Chen's) to ER clearly defined entities distinct from relations(hips).
Xepoch
ER modelling is not earlier than the relational model. Peter Chen wrote his first ER paper in 1976 o believe - 7 years after Codd's RM. The term relation goes back much further - about 100 years I think. AFAIK Chen never used the term relation to mean the same as relationship. Do you disagree and if so can you give a reference for that use of the word? In any case present day usage of those terms is pretty well defined and they ought not to be confused.
dportas
@dprotas, we should create a separate question on this. :) The ER is often (correctly?) referred-to as the Entity/Relation model, the relationship a distinct record thereof of the relation. Anyway, this is how I learned it at Uni when during relational algebra as well as has been discussed during engagements with Burleson.
Xepoch
The correct name is Entity-Relationship Model as you can confirm if you look up Chen's paper. A relation on the other hand is.a data structure used to represent both entities and relationships - as you rightly said.
dportas
+1  A: 

That's normalized data - the most scalable means of storing any combination of a part record to a machine record.

I'm trying to find a way to do this that minimizes query time and post-processing intensity.

Deal with the issue when actually encounter it -- it's premature optimization otherwise.

OMG Ponies
this is neither normalised, nor a "relational database". Part and Machine are 100% duplicates.
PerformanceDBA
@PerformanceDBA: There's no mention of "relational database" in my answer, and the many-to-many table relationship in the OP is infact normalized in the assumption that the primary key is both columns. Even then, I don't see substantiation of "Part and Machine are 100% duplicates" - only their primary keys could be.
OMG Ponies
0. Part
PerformanceDBA
PerformanceDBA
@PerformanceDBA: Assuming the primary key of the RELATIONS table is both part and machine id's, I'd like to see how you think that's not 3NF.
OMG Ponies
@OMG Ponies: (As a result of your previous comments, I changed a few words in my post to improve clarity.) It appears you are not reading either my post carefully, or my comments. You are looking at the wrong table *relations* (which I have identified as correct in my post). The Normalisation error, the 100% duplication, is in *part vs machine*. Compare with my posted 5NF solution (the link).
PerformanceDBA
@PerformanceDBA: What you're talking about is something entirely of your own invention, and it's sad to see that you've held others accountable for your delusion. I won't be a part of it any longer.
OMG Ponies
+1  A: 

A clear way to think about it: logical vs. physical

A logical many-to-many relationship between two entities (call them A and B) has a physical implementation of three tables (call them A, B, and C). This implementation involves a physical one-to-many relationship from A to C. Another from B to C. Table C is known as a "join table" or "junction table"

Aaron F.
That's also how I like to think of many-to-many joins - two one-to-many relationships that, by transitivity, model a many-to-many relationship.
Scott Mitchell
The technical term, for the physical table that substantiates the logical many::many logical relation, is Associative Table.
PerformanceDBA
+1  A: 

Edited to Improve Clarity

Actually there are better ways to design that in a true Relational Database (it has been done thousands of times).

The structure is called a Bill of Materials structure, and existed long before the Relational Model. Of course, I will give you the Relational version. This is high performance and has no processing restraints, eg. you can produce a tree structure (BoM) report using a simple stored proc which is called recursively.

A couple of things that needs to be understood before we go for the chase.

  1. From your desc, the Machines are actually Assemblies (of Parts). These Assemblies are used in (higher level) Assemblies, and so on. So let's rename "relations" as Assembly.

  2. Your Part and Machine tables have not been normalised. If you inspect them closely you will find many identical columns(PartId::MachineId, PartName::MachineName are shown). In fact except for the lowest Part-of-a-Machine and the highest Machine-containing-Parts, all intervening Machines (containing Parts) are actually also Parts in (higher level) Machines. So you have monstrous data duplication. That is a large Normaalisation error. That will kill performance. The resulting Update Anomalies will cause data integrity problems. both those issues will come to the fore long before your other concerns do.
    .
    So that must be corrected by the ordinary process of Normalisation: Part and Machine become one table: Part. The fact that a Part may be an Assembly (of Components) is based on the context of usage; the fact that a Part may be a Component (in Assemblies) is the likewise based on context. As a unit sitting on the shelf, that you must keep an inventory of, Parts and Machines are each just Parts. The inventory control columns are located in Part; the context columns are located in Assembly.

  3. Many-to-many relations (at the logical they are relations, not tables) are implemented at the physical level as Associative tables, there is no way around that. You already have that. What you are about to see may look different, but it is not.

Anything you get from me will be 5NF; zero data duplication; zero Update Anomalies. This runs in production at more than a few manufacturing sites. It so happens that I use the required structure as part of the tutorial in my Advanced classes, so you can just look it up. Feel free to ask any questions, related to understanding or otherwise. It is in the tutorial precisely because many developers do not understand the structure or how to navigate it. Note the very tight (tried and tested) naming conventions.

Click on Part or Assembly (you can ignore the rest of the Model, but I am happy to answers questions about it as well). Both the methodology, and the diagram notation, is IDEF1X (it is a "strict" rendition of the RM).

Part-Assembly-Component Example

And yes, Data Integrity is maintained, and Assembly⇢Component searches are served 100% (allowing for covered queries), from the Assembly Primary Key index (that is to say, they are already highly optimised, I do not need to make them some vendors this or that to get more speed out of it). A second Unique Index serves Component⇢Assembly Searches 100%.

Enjoy.

PerformanceDBA
+2  A: 

Short answer: You're headed in the right direction. This is the textbook way to create many-to-many relationships.

However, a "machine" is probably just a special case of a "part", namely, a part that in some sense you consider "a complete thing" or a deliverable item.

Most people who do this create a hierarchical structure. You have:

part(part_id, description, ... maybe other data like size and weight, etc ...)
assembly(parent_part_id, child_part_id)

Then a top-level part can have many component parts, and each of these components can have other components, etc. The advantage of this is that when an assembly is used on more than one final product, you only have to describe the break-out once. For example, I used to work for a company that made dishwashers. In out parts table, we'd have a record for, say, Motor model 29B. That motor would have many component parts -- a casing, a rotor, some stators, electrical harness, etc. But the same motor might be used in several different models of dishwasher. You don't want to have to give the whole parts list for the motor all over again for every dishwasher that it is used on. Not only is that a pain for someone to type in, but if you ever make a change, you have to be sure to change it everywhere its used. So you just give the breakout for the motor once, and then reference the motor as a unit in all higher-level assemblies.

Yes, some parts are "final", that is, they don't go on a higher-level assembly. But you don't need a different table for this. That just means that for this particular part, there is no Assembly record where this is the child.

Parts is parts. Put them all in one table.

Some database engines, e.g. Oracle, have commands to chase a hierarchy like this. If yours doesn't, you'd have to do it in code. But it's not that tough. People do it all the time.

Jay