ansaurus

Question

Arrays in database tables and normalization

Answer 1

A:

The problem with the first implementation is that it doesn't actually use an array but a concatenated string.

This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.

If you were using a database that natively supported arrays as an atomic value such PostgreSQL then the argument would be different.

Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2 (assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.

KeeperOfTheSoul 2010-05-24 14:21:07

Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only

Ivan Petrov 2010-05-24 14:23:03

For now, until you're asked to produce a report of all users that have permission X in the system at some point.

KeeperOfTheSoul 2010-05-24 14:35:34

I am almost convinced:) However "GroupID IN (40, 2, 42)" requires dynamic SQL in MSSQL or whatever else they suggest at http://www.sommarskog.se/arrays-in-sql-2005.html.

Ivan Petrov 2010-05-24 15:06:24

Answer 2

A:

Your second example should probably be:

constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)

Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.

Cade Roux 2010-05-24 14:37:29

Thanks! I have updated the question with a usage scenario

Ivan Petrov 2010-05-24 14:43:02

@Ivan Petrov I would say this needs to be normalized with the PermissionID (it looks like just a presence indicates GRANT, but typically you might have more permissions on an object) as in your table schema. Deep down, you know it's right.

Cade Roux 2010-05-24 14:56:17

Answer 3

A:

Is it smart

Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.

If you can see no way in which a table with one row for each item would ever be useful then I'd suggest that the encapsulate-in-a-string might be considered.

In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.

In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.

Mike Woodhouse 2010-05-24 14:41:04

I envision Permissions as a comma-delimited string since MSSQL - our primary deployment target - does not support arrays and it actually is my great personal debate of whether it needs to, thus this question :)

Ivan Petrov 2010-05-24 14:45:02

Answer 4

A:

If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.

If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).

Matt S 2010-05-24 14:45:21

Good point, given I want them to be flexible can't I ensure integrity with triggers? It looks like too much hassle for me though...

Ivan Petrov 2010-05-24 14:49:58

Well, by using a string you're basically choosing to let the application completely handle the data. If you're worried about its integrity I would normalize the data and go with the second solution (PermissionIDs instead of the string).

Matt S 2010-05-24 15:05:10

Answer 5

+1 A:

I'd suggest to take the normalized road for the following reasons:

By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and delete the record only when you remove the last permission.
Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
Performance-wise, I think it will actually be faster to get a resultset of id's than having to parse a string to integers. To be measured with actual data and implementation...

marapet 2010-05-24 17:11:36

That makes sense, many thanks for the input!

Ivan Petrov 2010-05-24 18:58:19

ansaurus

tags:

views:

answers:

Arrays in database tables and normalization

related questions