views:

137

answers:

6

I am designing a database and was thinking about the need for a one to many relationship. Traditionally I have done the normal PK (as a GUID) and set up the relationship, but I was wondering instead if doing that why not use a bitwise flag as the PK.

The relationship would be lost but the data itself would describe the relationship.

Example - I have a table of groups and a table of users. Users can have 1 or more groups:

+------------------------+
| Groups                 |
+------------------------+
| PK      | Display Name |
+---------+--------------+
| 1       | Group A      |
| 2       | Group B      |
| 4       | Group C      |
+---------+--------------+

+------------------------+
| Users                  |
+------------------------+
| Name    | Groups       |
+---------+--------------+
| Fred    | 1            | // Fred is only in Group A
| Jim     | 3            | // Jim is in Groups A & B
| Sam     | 7            | // Sam is in all Groups
+---------+--------------+

Thoughts, comments and suggestions on this design please?

+2  A: 

Nice idea, can't see how it would benefit from being done in a relational database though. I think if you are in SQL you pretty much have to use standard keys, otherwise what is the point of the expensive storage you have?

Perhaps it would be better suited to file based storage where each table is in a different file. Or, have it as an additional column but don't put a primary key on it.

P.S. Wouldn't it fall over if two users are in the same groups?

Ryan

Ryan ONeill
The group membership isn't the PK, so it should be ok if they're in the same groups.
GalacticCowboy
+5  A: 

I'd discourage using bit flags like this. For one thing, you've broken the ability to easily join these tables, so determining group membership will a) take longer, b) be more difficult, and c) probably involve more full-table scans or at least index scans.

GalacticCowboy
+4  A: 

You're going to run out of numbers quite quickly. If you have 64 groups you're already using 64 bits. I shudder to think what would happen if you have a million groups.

The other problem with this is that, if you delete a group, you've lost a bit. You can reuse that bit later but that might not be the way you want to go.

rein
+2  A: 

Not good IMO. This is typical many-to-many relationship. If PK is 32 bit than one user can be in max 32 groups. Why limit the design ?

// Sam is in all Groups.

Lets say you modify design and make PK 4 bits insted of 3. Is now Sam in "all groups" or only in groups 0-7 ?

What's the gain ?

How would you write queries (joins) ? I think you will have problems with this design.

Petar Repac
A: 

I've seen GUIDs used as primary keys quite a bit recently and I think it's a horrible idea. GUID stands for "Globally Unique ID", which is to say, it's an identifier whose chance of binging replicated is, to misquote Brockmeyer in his OLE book "as likely as a bunch of atoms all rushing together in empty space to form a small walnut".

There are good reasons to use GUIDs, if you actually need something globally unique, but most database keys only need to be unique relative to the database in question. In this case, an sequentially-generated integer key is often sufficient, and it consumes a lot less in the way of resources. Actually, neither sequence IDs or GUIDs have any inherent meaning, so it's really better if you can to use items that actually truly do identify the object in question as the primary key, if you can (although there's a school of thought that says all items should have a content-independent key such as sequence or even GUID).

Bitfields have several liabilities as primary keys. As was mentioned, you may decide you need additional bits. You may end up with collisions. Databases tend to be poor in bitwise functionalities and non-portable even then. And finally, the indexing algorithms for your database of choice may not be capable of optimizing bitset keys well.

Tim H
One advantage of a GUID as the PK is that the client code can generate a key (with a reasonable expectation of uniqueness) instead of going back to the database to find out what the key will be. If you're doing disconnected editing, this is particularly advantageous. On the other hand, you definitely want to treat them with care. For example, do NOT make a GUID the clustered index/key on your table, as the order of the data will essentially be random and most inserts will fall somewhere in the middle of the table and require data to be moved.
GalacticCowboy
+1  A: 

bad idea... how would the index work at all? it would have to scan and do a calculation on every value...

Lets say you wanted to know all users in group X... you'd have to run a function on every single row to determine whether or not they were in that group, rather than just doing a very quick index seek to find your answer.

Edit: Simply... write me a query that uses an index on your table to find all users in group B... no matter what, you will be doing a calculation, which will force it to do a late-filtered scan

Brian Rudolph