views:

114

answers:

5

Hi!

Is it smart to keep arrays in table columns? More precisely I am thinking of the following schema which to my understanding violates normalization:


create table Permissions(
    GroupID int not null default(-1),
    CategoryID int not null default(-1),
    Permissions varchar(max) not null default(''),
    constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);

and this:


create table Permissions(
    GroupID int not null default(-1),
    CategoryID int not null default(-1),
    PermissionID int not null default(-1),
    constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);

UPD3: I envision Permissions as a comma-delimited string since MSSQL is our primary deployment target.

UPD: Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only

UPD2: I envision the typical usage scenario as following:


int category_id=42;
int[] array_of_groups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
    throw new StarveToDeathException();
}

Thoughts?

Thanks in advance!

A: 

The problem with the first implementation is that it doesn't actually use an array but a concatenated string.

This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.

If you were using a database that natively supported arrays as an atomic value such PostgreSQL then the argument would be different.

Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2 (assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.

KeeperOfTheSoul
Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only
Ivan Petrov
For now, until you're asked to produce a report of all users that have permission X in the system at some point.
KeeperOfTheSoul
I am almost convinced:) However "GroupID IN (40, 2, 42)" requires dynamic SQL in MSSQL or whatever else they suggest at http://www.sommarskog.se/arrays-in-sql-2005.html.
Ivan Petrov
A: 

Your second example should probably be:

constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)

Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.

Cade Roux
Thanks! I have updated the question with a usage scenario
Ivan Petrov
@Ivan Petrov I would say this needs to be normalized with the PermissionID (it looks like just a presence indicates GRANT, but typically you might have more permissions on an object) as in your table schema. Deep down, you know it's right.
Cade Roux
A: 

Is it smart

Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.

If you can see no way in which a table with one row for each item would ever be useful then I'd suggest that the encapsulate-in-a-string might be considered.

In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.

In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.

Mike Woodhouse
I envision Permissions as a comma-delimited string since MSSQL - our primary deployment target - does not support arrays and it actually is my great personal debate of whether it needs to, thus this question :)
Ivan Petrov
A: 

If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.

If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).

Matt S
Good point, given I want them to be flexible can't I ensure integrity with triggers? It looks like too much hassle for me though...
Ivan Petrov
Well, by using a string you're basically choosing to let the application completely handle the data. If you're worried about its integrity I would normalize the data and go with the second solution (PermissionIDs instead of the string).
Matt S
+1  A: 

I'd suggest to take the normalized road for the following reasons:

  • By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
  • You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
  • Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and delete the record only when you remove the last permission.
  • Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
  • Performance-wise, I think it will actually be faster to get a resultset of id's than having to parse a string to integers. To be measured with actual data and implementation...
marapet
That makes sense, many thanks for the input!
Ivan Petrov