views:

149

answers:

5

Hi,

I'm working on a multi-user internet database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem which can lead to an inconsistent database state if exploited correctly and I am wondering how to deal with the problem.

The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.

This results in the following database schema:

**Companies:**
ID
CompanyName

**Clients:** 
ID 
CompanyID (not nullable)
FirstName
LastName


**Projects:**
ID
CompanyID (not nullable)
ClientID (nullable)
ProjectName

This leads to the following relationships:

Companies-Clients (1:n)
Companies-Projects (1:n)
Clients-Projects(1:n)

Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.

The problem occurs in a similar fashion all over my database schema, so I'd like to solve this in a generic way if any possible. I had the following two ideas:

  • Check for database writes that might lead to inconsistencies in the DAL. This would be generic, but requires some additional database queries before an update and create queries are performed, so it will result in less performance.

  • Create an additional table for the clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far less than in the first case. On the other hand it is not generic, so it is easier to miss something in the long run, especially when adding more tables / dependencies to the database.

What would you do? Is there any better solution I missed?

Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.

A: 

Hi, If you want to use the table like this and avoid the all the new queries just put triggers on the table and when user tries to insert row with wrong data the trigger with stop him. Best Regards, Iordan

IordanTanev
I think the point is that without some means of recording what is or isn't allowable in the database, the trigger can't determine whether data which is valid actually makes sense.
ninesided
Exactly. Instead of using triggers I might also try to do the same in my DAL, but I'd need some additional select queries for each update / insert / delete to make this work, which would result in the first possible solution I mentioned above.
Adrian Grigore
+2  A: 

I'd go with with the latter, having one or more tables that define the allowable relationships between entities.

ninesided
+1  A: 

Note, there's no circularity in the references you have, so the title is misleading.

What you have is the possibility of conflicting data, that's different.


Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.

Remove that column and you've removed your problem.

Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name, differing from the name of the company?

Or is "client" the person at that company?


Edit: Ok with the clarification about projects without companies, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references being made.

A simple constraint for your existing tables would be that not both the CompanyID and ClientID fields of the project row could be non-null at the same time.

Lasse V. Karlsen
Please see my edit above for clarifications
Adrian Grigore
That would be an option, although it does make listing all projects of a given company more complicated and slower since I'd have to follow both reference paths to get all projects. I think I'll still go with a separate Clients - Projects references table. I'm currently writing a abstract Linq entity base class that ensures that no entity can belong to different companies when following different reference paths.
Adrian Grigore
A: 

My first thought would be to create a special client record for each company with name "No client". Then eliminate the CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If processing of such no-client's is special, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No Client" or something like that -- too fuzzy.)

Then there would be no way to store inconsistent data so the problem would go away.

Jay
I've thought about this option too, but quickly tossed the idea since it creates another range of database possibilities to render my database inconsistent or to show garbage data in my views. For example, what if I forget to check whether the client is the "No Client" entity on the Clients/Delete action method? Problems like this are why I am aiming for a generic approach
Adrian Grigore
A: 

In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.

First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).

I am enforcing this by deriving all of my Linq entities from one of the following types:

  1. SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.

  2. MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.

Both of these types implement a "CheckReferences" and I am calling it whenever the linq entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method which is automatically generated for all Linq entities.

Adrian Grigore