views:

326

answers:

5

Every tutorial I've seen about Multitenant database models tells you to put the TenantID in every single table:

zoos
-------
id
zoo_name
tenant_id

animals
-------
id
zoo_id
animal_name
tenant_id

However, this seems redundant to me. Why not add the tenant_id column to just the zoos table and exploit the foreign key relationship between zoos and animals?

Do you add tenant_id to every table just to keep the joins from getting too crazy? Is it a safeguard against bugs? A performance consideration?

+3  A: 

The first thing that springs to mind is that it's slower to look up animals > zoos > tenants than simply animals > tenants. And most likely this is a lookup you will do often (for example, "get all animals for a certain tenant, regardless of zoo").

For small to mid-sized applications you can get away with a more normalized structure, but for the sake of efficiency, you should go with extraneous data (and generally speaking, multitenancy applications are not small). Just make sure it doesn't go "out of sync", which is a risk that comes with having redundant data.

To answer your last paragraph, the reason is performance, pure and simple. Joins are no bad thing; they help you keep a piece of data in one place rather than three. It's definitely not to prevent bugs. Adding a tenant_id field to more tables will increase the risk of bugs (although for an id that never changes, it wouldn't be as much of an issue).

Blixt
+2  A: 

Its there for convenience and performance - in terms of normalisation you're absolutely right, it only needs to go in at the top. The problem then becomes that to get to some data (say zoo -> animal -> food -> supplier) you have to have horribly complex joins on what are notionally very simple queries.

So in the real world one has to compromise - question then becomes where and to what extent.

See this article Maybe Normalizing Isn't Normal - and its conclusion:

As the old adage goes, normalize until it hurts, denormalize until it works

as a place to start exploring the subject

Murph
At the risk of starting a nerd religious war, I can't help but wonder if this isn't an example of where a natural key would be a useful fit. If you're multi-tenant app is partitioned by domain name, and you set the tenant_id to the domain name, you can reduce joins that way as well.
Paul
It won't reduce joins - you still have the problem of either including the key at multiple levels (notionally bad) or of complex joins (also notionally bad) and regardless natural keys almost always come back and bite you (-:
Murph
+4  A: 

If one of your key design considerations is security--specifically, one client can no way no how no when access another client's data--then, depending on how you implement this security, sticking that qualifying column in every table may be necessary. One such tactic described here requires building a view on every table; assuming each table contains a tenantId column, then if properly configured each view could contain a "WHERE tenantId = SUSER_SID()" clause (and of course you configure the database so that clients can only access the views).

Another factor (as in my current job) is loading warehouse data (ETL). Tables are partitioned on tenantId (we use table partitioning, but partitioned views would also work), and data can be easily loaded or unloaded for a client without seriously impacting any other client.

But as ever, there's a lot of "it depends" involved. If there is no clear and present need, and a very low likelihood of future need, then normalize that column out. Just realize that it's more a devise of physical implementation than of conceptual or logical database design.

Philip Kelley
A: 

Well, Bob may own a giraffe in zoo No1, while Joe may own a lion in the same zoo. They are not supposed to look at each others' data.

Damir Sudarevic
+2  A: 

If I had tenantID at the top of the hierarchy (i.e. at the zoo level) you have several issues to consider.

  1. The top of the hierarchy can never change, for example if you need to add a node on the tree above the zoo level (say regions -> zoos -> animals) then it will force a re-org every time.
  2. For certain queries, you will be forced to start at the top of the hierarchy, i.e. give me a list of all animals available will force you to start at the top of the tree
  3. Why not use schemas ? Each tenant is isolated within their own schema. This will also separate the data-sets nicely.
blispr