views:

65

answers:

4

Hi there,

I'm working on a front end for a database where each table essentially has a many to many relationship with all other tables.

I'm not a DB admin, just a few basic DB courses. The typical solution in this case, as I understand it, would be multiple link tables to join each 'real' table. Here's what I'm proposing instead: one link table that has foreign key dependencies to all other PKs of the other tables.

Is there any reason this could turn out badly in terms of scalability, flexibility, etc down the road?

A: 

So you're trying to decide whether to take a star pattern or an asterisk pattern?

I'd certainly advocate asterisk. Just as in code, there is generally a driver method, there should be a driver table if the schema is as you described. Look at the total number of tables you'll need for each number of "main" tables:

Main   Junct  Total
-------------------
   2       1      3
   3       3      6
   4       6     10
   5      10     15
   6      15     21
   7      21     28!

7 is probably the most you would have in a database schema.

In addition, this way you can do complex queries involving 3 main tables without having to go through 3 junction tables, but rather only touch one junction table no matter how many main tables you want.

Scalability? No. Flexibility? Only if your schema changes dramatically.

Christian Mann
A: 

My main problem with a single link table is if a 'link' suddenly turned into an entity. For example, you may have 'shopper' and 'store' entities. that can be many-to-many as a shopper can go to many stores and a store will have many shoppers.

Next month you decide you want to record how much a shopper spends in a store. Suddenly you either have to add a 'purchase' amount to your generic link table, or rebuild a big chunk of your application to use a specific link table for that link instead of the generic one.

Gary
+1  A: 

If I understand your proposal correctly, what you are thinking of doing is a minor variation on the them of the 'One True Lookup Table' (OTLT), which is not a good idea. In this case, perhaps, OTLT stands for 'One True Linking Table'.

The problems come when you need to maintain the referential integrity of the OTLT. For starters, what is its schema?

ReferencingTable     INTEGER (or VARCHAR(xx)?)
ReferencingId        INTEGER
ReferencedTable      INTEGER (or VARCHAR(xx)?)
ReferencedId         INTEGER

The table IDs have to be watched. They can be copies of the value in the system catalog, but you have to worry about what happens when you rebuild on of the tables (typically, the table ID changes). Or they can be separately controlled values - a parallel set of tables.

Next, you have to worry about the asymmetry in the naming of the columns in what should be a symmetric setup; the OTLT connects Table1 to Table2 just as much as it does Table2 to Table1 -- unless, indeed, your relationships are asymmetric. That just complicates life enormously.

Now, suppose you need to join primary tables Table1 to Table2 and Table2 to Table3, each via the OTLT, and that the table IDs are 1, 2, and 3, and that the 'ReferencingTable' is always the smaller of the two in the OTLT:

SELECT T1.*, T2.*, T3.*
  FROM Table1 AS T1
  JOIN OTLT   AS O1 ON T1.Id = O1.ReferencingId AND O1.ReferencingTable = 1
  JOIN Table2 AS T2 ON T2.Id = O1.ReferencedId  AND O1.ReferencedTable  = 2
  JOIN OTLT   AS O2 ON T2.Id = O2.ReferencingId AND O2.ReferencingTable = 2
  JOIN Table3 AS T3 ON T3.Id = O2.ReferencedId  AND O2.ReferencedTable  = 3

So, here you have two independent sets of joins via the OTLT.

The alternative formulation uses separate joining tables for each pair. The rows in these joining tables are smaller:

ReferencingID    INTEGER
ReferencedID     INTEGER

And, assuming that the joining tables are named Join_T1_T2, etc, the query above becomes:

SELECT T1.*, T2.*, T3.*
  FROM Table1     AS T1
  JOIN Join_T1_T2 AS J1 ON T1.Id = J1.ReferencingId
  JOIN Table2     AS T2 ON T2.Id = J1.ReferencedId
  JOIN Join_T2_T3 AS J2 ON T2.Id = J2.ReferencingId
  JOIN Table3     AS T3 ON T3.Id = J2.ReferencedId

There are just as many references to tables (5) as before, but the DBMS can automatically maintain the referential integrity on these joining tables - whereas the maintenance has to be written by hand with the OTLT. The joins are simpler (no AND clauses).

In my view, this weighs strongly against the OTLT system and in favour of specialized linking tables for each significant pairing of the primary tables.

Jonathan Leffler
This wasn't what I was planning, sorry for not being clearer.Given Table1, Table2, Table3, create the 'OTLT' with the following fields:link_idtable1_reftable2_reftable3_refNow if I want to link records in table1 and table3, I just add a row to the OTLT.
David
I'm not sure how much better that is. If you only have binary relationships, then with N tables (N > 2), you have N - 2 columns with nulls for each row. You cannot create a primary key on the table to prevent duplicate records because any of the columns can hold nulls and primary key columns can't hold nulls. You lose any problems with asymmetry - the solution is symmetric. You could end up with a three-way or more relationship - it is not clear whether that is an advantage or a defect. You are still doing multiple joins via the OTLT. It probably can be done, but I'm not convinced it is easier
Jonathan Leffler
A: 

You have two options with this setup.

  • Ensure each row indicates the link to only one table. This is a degenerate model of having individual join tables.
  • Ensure you have all the links combinations which highly inflates the size of the table. Given a row in the primary table which joins to 4, 5, and 6 records in each of three other tables you need 4 * 5 * 6 = 120 rows in your join table. You also need logic to handle no joins to a table. If you need to join to only the first tables you need to filter the 120 rows you get down to 4.

There are cases where you will have multiple table relationships, but these will be driven by the design. Releationships frequently carry information such as start and end dates. These are problematic for the one true lookup table as you will need to carry columns for each possible relationship.

BillThor
I understand the first point, however the reason I'm willing to accept that is because if I decide to add an additional table, then I just have to add a single new column the single link table vice adding multiple link tables to link the new table to each of the existing tables. I'm not clear on the second point though. If I have a single link table, couldn't I just be wasteful with column space and add 4+5+6 rows?
David
Its 4*5*6 rows. You are wasting more than column space. If you add a new row to any related table you need to add or modify several rows. If you add the first row to a child tables you change the rows where the key is null. Otherwise, you need to add rows for all the combinations of other tables. Then you get say 30 times the rows back you than you want on each query. As you add more children that count goes up. This only works well if each child can have 0 or 1 rows per parent. In this case why do you have more than 1 child table.
BillThor