ansaurus

Question

Why does many-to-many data structure require two additional tables?

Answer 1

+3 A:

John Saunders 2009-07-26 21:42:19

So are the actual "string" tags stored in the table `Question-tags`?

Masi 2009-07-26 21:45:03

I tried to fix the problem by renaming the variables. Please see whether the same problem still persists.

Masi 2009-07-26 22:10:39

Thank you very much for your example! --- Many +++ for the exact use of foreign keys and arrows. I was confused about them.

Masi 2009-07-26 22:49:30

@Masi: I didn't draw those. The NORMA tool drew the ER diagram based on the Object-Role Modeling model I created. It also created the SQL Server statements necessary to create the tables and constraints, and would have done the same for DB2, Oracle, MySQL, Postgres, XML Schema, or even LINQ to SQL classes. It will generate a bunch of files with a .php extension, but since I don't know PHP, I can't say what they are.

John Saunders 2009-07-26 22:59:17

You use a clever way of indicating the type: .# for a number, .name for a string and .id for a number sequence. --- **Where do you keep the body of your questions?** You seem to have Question(title), but not the actual content, as on SO. --- Your answer also shows me that we need only one table for tags, since one question can have many tags. I obviously misunderstood Rexem's last comment on my answer in the thread http://stackoverflow.com/questions/1182910/how-to-improve-a-erd/1183112#1183112

Masi 2009-07-26 23:05:44

You also use dotted lines to indicate a script. I cannot do that with my Top Coder's UML Tool. Well done, though!

Masi 2009-07-26 23:08:56

Actually, this isn't my notation. This is Object-Role Modeling notation. Take a look at the http://www.ormfoundation.org/ site. Those things in parentheses are reference modes. The dotted lines mean it's a value type as opposed to solid lines indicating an entity type. The empty rectangles are roles. Dot on a role means it's mandatory. Bars over roles or role sequences indicate uniqueness. This all together allows the tool to verbalize the relationships:

John Saunders 2009-07-26 23:38:32

Example: User asked Question. For each Question, exactly one User asked that Question. It is possible that the same User asked more than one Question.

John Saunders 2009-07-26 23:39:06

Answer 2

+3 A:

It's a question of normalization. IMHO one of the best books on this subject is Joe Celko's SQL for Smarties. Basically, you avoid what are called "anomalies". In your example, if I delete all the questions with the "Java" tag, I would never be able to know that I ever had a tag called "Java" (a delete anomaly). It's also important to break out the separate table because you need the xref (cross-reference) table to describe properties of the relationship between the principals.
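A minimal sketch of the delete anomaly described above, using Python's built-in sqlite3 module with an in-memory database. The table names (question_tag_flat, tag, question_tag) and the sample rows are hypothetical: the first design stores a tag name only inside question/tag pairs, the second keeps tags in their own table and the join table holds only keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design 1 (denormalized, hypothetical): a tag name exists only inside question/tag pairs.
CREATE TABLE question_tag_flat (question_id INTEGER, tag_name TEXT);
INSERT INTO question_tag_flat VALUES (1, 'Java'), (2, 'C'), (2, 'Java'), (3, 'C');

-- Design 2 (normalized, hypothetical): tags have their own table; the join table holds keys.
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE question_tag (
    question_id INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tag(tag_id),
    PRIMARY KEY (question_id, tag_id)
);
INSERT INTO tag VALUES (10, 'Java'), (20, 'C');
INSERT INTO question_tag VALUES (1, 10), (2, 20), (2, 10), (3, 20);
""")

# Delete every question tagged "Java" (questions 1 and 2) in both designs.
conn.execute("DELETE FROM question_tag_flat WHERE question_id IN (1, 2)")
conn.execute("DELETE FROM question_tag WHERE question_id IN (1, 2)")

flat_tags = sorted(r[0] for r in conn.execute("SELECT DISTINCT tag_name FROM question_tag_flat"))
normalized_tags = sorted(r[0] for r in conn.execute("SELECT name FROM tag"))
print(flat_tags)        # ['C']: "Java" is no longer recorded anywhere
print(normalized_tags)  # ['C', 'Java']: the tag survives in its own table
```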

JP Alioto 2009-07-26 21:44:46

Suppose you have a very big site that should be easily extensible, like things related to Google MapReduce. I cannot understand why you should cut out dependencies. Dependencies can reduce the number of interfaces and the number of tables, ensuring efficiency and extensibility. Why can you not have heavily dependent structures, where tools similar to Git warn about anomalies? Backups would tell what is going wrong.

Masi 2009-07-27 01:04:25

Answer 3

+1 A:

http://en.wikipedia.org/wiki/Database_normalization

It's not a problem for the computer, but relational database theory says that a database should be normalized to reduce duplication of information. Here's what Dr. Codd said about the need for normalization:

To free the collection of relations from undesirable insertion, update and deletion dependencies;
To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
To make the relational model more informative to users;
To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

E.F. Codd, "Further Normalization of the Data Base Relational Model"

zzr 2009-07-26 21:45:52

Answer 4

+1 A:

The issue is one of how normalized you want your table structure to be. Generally, you don't want to store information in more than one place. To that end, when data may be repeated for many items, you normalize it -- move that data to a separate table where multiple rows in the other table may reference it by storing the key of the data rather than the data itself. When you have many rows sharing the same data AND you want to normalize it, you need an intermediate table to store the relations (reference pairs) between the tables.
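To make the "reference pairs" idea concrete, here is a small sketch in plain Python. It assumes hypothetical dictionaries standing in for the question and tag tables and a set of (question_id, tag_id) tuples standing in for the intermediate table: each tag name is stored once, and the pair table stores only keys.

```python
# Hypothetical in-memory stand-ins for the three tables.
questions = {1: "How to improve an ERD?", 2: "Why two additional tables?"}
tags = {10: "sql", 20: "database-design"}     # each tag name is stored exactly once
question_tags = {(1, 20), (2, 10), (2, 20)}   # intermediate table: (question_id, tag_id) pairs

def tags_of(question_id):
    """Follow the stored keys from the pair table back to the tag names."""
    return sorted(tags[tag_id] for q_id, tag_id in question_tags if q_id == question_id)

print(tags_of(2))  # ['database-design', 'sql']
```

Because question_tags is a set, the same (question, tag) pair cannot be recorded twice, which mirrors a composite primary key on the intermediate table.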

tvanfosson 2009-07-26 21:46:03

Answer 5

A:

Usually there is a lot more information than just a tag column. If there is a lot of information, you end up with redundant data (you have two "C" values in your example). And if the same value lives in more than one place, updates become a problem. So the rule is that the data should live in one place and its ID is used everywhere else to reference it. Then when you update it, the update only needs to be done in one place.
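A minimal sketch of the update problem, again assuming hypothetical tables in an in-memory SQLite database: renaming the tag "C" has to touch every copy in a flat question/tag table, but only one row when the name lives in a separate tag table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flat design: the tag name is repeated in every pair.
CREATE TABLE question_tag_flat (question_id INTEGER, tag_name TEXT);
INSERT INTO question_tag_flat VALUES (1, 'C'), (2, 'C'), (3, 'Java');

-- Hypothetical normalized design: the name lives in one row; pairs hold the ID.
CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE question_tag (
    question_id INTEGER,
    tag_id      INTEGER REFERENCES tag(tag_id),
    PRIMARY KEY (question_id, tag_id)
);
INSERT INTO tag VALUES (1, 'C'), (2, 'Java');
INSERT INTO question_tag VALUES (1, 1), (2, 1), (3, 2);
""")

# Renaming "C" must touch every copy in the flat design...
flat_rows = conn.execute(
    "UPDATE question_tag_flat SET tag_name = 'c' WHERE tag_name = 'C'").rowcount
# ...but exactly one row when the name is stored only in the tag table.
normalized_rows = conn.execute(
    "UPDATE tag SET name = 'c' WHERE name = 'C'").rowcount
print(flat_rows, normalized_rows)  # 2 1
```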

JBrooks 2009-07-26 21:47:53

Answer 6

+1 A:

In a relational database a many-many relationship is implemented as two reciprocal one-many relationships, each of which requires an additional table (beyond the tables directly representing the entities) to implement.

First, a one-many relationship from a row in the first table to many rows in the second table.
Second, another one-many relationship from a row in the second table to many rows in the first table.

The why of it has to do with the relational database model.
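A minimal sketch, assuming hypothetical question, tag and question_tag tables in SQLite: the two one-to-many directions described above are both navigated through the rows of the question_tag table, with one query per direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (question_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag      (tag_id      INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
CREATE TABLE question_tag (
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    tag_id      INTEGER NOT NULL REFERENCES tag(tag_id),
    PRIMARY KEY (question_id, tag_id)   -- each pair is recorded at most once
);
INSERT INTO question VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO tag VALUES (1, 'sql'), (2, 'erd');
INSERT INTO question_tag VALUES (1, 1), (1, 2), (2, 1);
""")

def tags_for(question_id):
    # One question -> many question_tag rows -> their tags.
    rows = conn.execute("""
        SELECT t.name FROM tag t
        JOIN question_tag qt ON qt.tag_id = t.tag_id
        WHERE qt.question_id = ? ORDER BY t.name""", (question_id,))
    return [r[0] for r in rows]

def questions_for(tag_name):
    # One tag -> many question_tag rows -> their questions.
    rows = conn.execute("""
        SELECT q.title FROM question q
        JOIN question_tag qt ON qt.question_id = q.question_id
        JOIN tag t ON t.tag_id = qt.tag_id
        WHERE t.name = ? ORDER BY q.title""", (tag_name,))
    return [r[0] for r in rows]

print(tags_for(1))           # ['erd', 'sql']
print(questions_for("sql"))  # ['Q1', 'Q2']
```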

Jeff Leonard 2009-07-26 21:56:17

Answer 7

+1 A:

Just to add to what others say (I won't repeat their comments):

In my experience, it's not typically called a help table but a join table. Normally you're dealing with something more complicated than a simple keyword. The 'extra' table models the relationship between the 2 other entities.

Another example: I have a marketing campaign that goes to many recipient contacts. Neither of these two entities is dependent on the other. Any particular campaign will have many contacts, and any contact may be sent more than one campaign. The join table in this case models the history of who was sent which campaign.

Campaign 
 - CampaignID (PK)
 - other columns

Contact 
 - ContactID (PK)
 - other columns

CampaignContact
 - CampaignContactID (PK)
 - CampaignID (FK)
 - ContactID (FK)
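The Campaign/Contact schema above could be written roughly as the following SQLite DDL, run here through Python's sqlite3 module. The Name and Email columns are hypothetical stand-ins for "other columns", and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
CREATE TABLE Campaign (
    CampaignID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL          -- hypothetical stand-in for "other columns"
);
CREATE TABLE Contact (
    ContactID INTEGER PRIMARY KEY,
    Email     TEXT NOT NULL           -- hypothetical stand-in for "other columns"
);
CREATE TABLE CampaignContact (
    CampaignContactID INTEGER PRIMARY KEY,
    CampaignID INTEGER NOT NULL REFERENCES Campaign(CampaignID),
    ContactID  INTEGER NOT NULL REFERENCES Contact(ContactID)
);
INSERT INTO Campaign VALUES (1, 'Spring sale'), (2, 'Newsletter');
INSERT INTO Contact  VALUES (1, 'a@example.com'), (2, 'b@example.com');
-- Contact 1 received both campaigns; campaign 1 went to both contacts.
INSERT INTO CampaignContact (CampaignID, ContactID) VALUES (1, 1), (1, 2), (2, 1);
""")

# Which campaigns were sent to contact 1?
sent_to_contact_1 = [r[0] for r in conn.execute("""
    SELECT c.Name FROM Campaign c
    JOIN CampaignContact cc ON cc.CampaignID = c.CampaignID
    WHERE cc.ContactID = 1 ORDER BY c.CampaignID""")]
print(sent_to_contact_1)  # ['Spring sale', 'Newsletter']
```

Because the join table uses a surrogate CampaignContactID key, the same campaign/contact pair could be recorded more than once, which suits a send history; a composite primary key on (CampaignID, ContactID) would forbid that.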

This is quite different from the 1-many relationship (sometimes called a master-detail relationship). Here a canonical example is Invoice -> InvoiceItems. The invoice items link specifically to one and only one parent invoice.

Invoice
 - InvoiceID (PK)
 - other columns

InvoiceItem
 - InvoiceItemID (PK)
 - InvoiceID (FK)
 - other columns
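For contrast, a rough SQLite sketch of the Invoice/InvoiceItem schema: there is no join table, just a NOT NULL foreign key on the child, so each item belongs to exactly one invoice. The Customer and Description columns and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check the FK below
conn.executescript("""
CREATE TABLE Invoice (
    InvoiceID INTEGER PRIMARY KEY,
    Customer  TEXT NOT NULL              -- hypothetical stand-in for "other columns"
);
CREATE TABLE InvoiceItem (
    InvoiceItemID INTEGER PRIMARY KEY,
    -- One FK column, NOT NULL: each item belongs to exactly one invoice.
    InvoiceID     INTEGER NOT NULL REFERENCES Invoice(InvoiceID),
    Description   TEXT NOT NULL          -- hypothetical stand-in for "other columns"
);
INSERT INTO Invoice VALUES (1, 'ACME');
INSERT INTO InvoiceItem (InvoiceID, Description) VALUES (1, 'Widget'), (1, 'Gadget');
""")

item_count = conn.execute(
    "SELECT COUNT(*) FROM InvoiceItem WHERE InvoiceID = 1").fetchone()[0]

try:
    conn.execute("INSERT INTO InvoiceItem (InvoiceID, Description) VALUES (42, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True  # there is no Invoice 42, so the FK check fails

print(item_count, orphan_rejected)  # 2 True
```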

Robert Paulson 2009-07-26 22:03:29

How can you have a table where you have only one entry? -- I think that CampaignContact(CampaignID) is a PK and an FK to Campaign(CampaignID). --- Hmm, I cannot see how it is possible to have only one entry in my tables above (circled). It should be a similar situation, and the data structures should be the same.

Masi 2009-07-26 22:34:04

Campaign, Contact and Invoice and InvoiceItem are all complex entities, but details were omitted to illustrate the relationships.

Robert Paulson 2009-07-26 22:40:34

Thank you for your edit!

Masi 2009-07-26 22:54:42
