Hi everyone

While I've been writing PHP for a long time, it was always a skill I taught myself, and I've just hit a minor crisis of confidence over table joins!

Assuming a MySQL db auth containing MyISAM tables users and permissions:

users
- id (auto increment)
- email

permissions
- id (auto increment)
- name

To join these tables in a many-to-many (or one-to-many), I've always used bridge tables like so:

user_permissions
- id (auto increment)
- user_id
- permission_id

(I know InnoDB supports foreign key relationships, but it's also more complex and hogs more memory, so for the purpose of the question I'd like to stay with MyISAM.)

My particular question is this: is it wise to join the tables using the auto-incremented key, or should I be generating my own additional key?

I'm aware that problems could occur if a table gets corrupted and I have to rebuild or if I begin mirroring to two dbs and the keys get out-of-sync.

I'm also aware that if I generate a unique hash for each row (to be used in joins) then there is the overhead of generating a hash and checking that it is new before every data insert.

How does everyone else do it? Are these issues things you have seen in practical situations?

Thanks for your time!

Adam

A: 

The join for permission and users would go like:

SELECT      u.*, p.*
FROM        user_permissions up
INNER JOIN  users            u
ON          up.user_id = u.id
INNER JOIN  permissions      p
ON          up.permission_id = p.id

The id column in user_permissions is not really necessary, unless you want to refer to it from another table. If you leave that id out, (user_id, permission_id) would be the primary key (no auto incrementing going on in this case). If you keep the separate id column, then you should definitely add a UNIQUE constraint over (user_id, permission_id).
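As a rough sketch of the two options described above, the bridge table could be declared in either of the following ways. The column types, the index names, and the second table name (user_permissions_with_id) are assumptions made for illustration; only the column names come from the question.

```sql
-- Option 1: no separate id; the pair itself is the primary key.
-- Column types are assumed, not taken from the question.
CREATE TABLE user_permissions (
    user_id       INT UNSIGNED NOT NULL,
    permission_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, permission_id),
    -- Extra index so lookups by permission alone can also use an index.
    KEY idx_permission (permission_id)
) ENGINE=MyISAM;

-- Option 2: keep the auto-increment id, but still forbid duplicate pairs.
-- The table name is hypothetical so both statements can run side by side.
CREATE TABLE user_permissions_with_id (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id       INT UNSIGNED NOT NULL,
    permission_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_user_permission (user_id, permission_id)
) ENGINE=MyISAM;
```

In both variants a second insert of the same (user_id, permission_id) pair is rejected with a duplicate-key error, which is the point of the constraint.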

Roland Bouman
Thanks, I know how to do the SQL - the question is more about the safe use of auto_increments in the join.
adam
I guess I don't understand the problem. What do you mean "safe"? What does auto incrementing have to do with joining?
Roland Bouman
As per the question, if the database should corrupt or a mirror should fail, do I risk losing the relationship if the relationship is built on auto_incremented values?
adam
You could say that if you had used natural keys for user (say username), permission (say permissionverb) and user_permission (username, permissionverb), then if you lost either the user or permission table, you could still recreate (part of) it by looking at what is in user_permission. Is that your question?
Roland Bouman
A: 

This is generally a decision between using surrogate keys (made-up or automatically assigned values that have no meaning beyond being unique), and natural keys (keys that have some semantic meaning with respect to the entity being stored, like a name, or bank account #).

In your case, it's not clear there is any natural key that would work (every aspect of the user table could change: a name change after marriage, etc.). You would want a natural key only if there's no possibility of the key ever changing while the relationships stay in place. An example would be something like element symbols (Au, He, Es, etc.) in the periodic table of elements (which are unlikely to change aside from new ones being added, but hey, anything could happen...).

As for data corruption, backups are really the best protection because anything can get corrupted. And in typical operations, you can always preserve the primary key when migrating or synchronizing. Using an elaborately complex surrogate key instead of the auto increment provided by the db would be riskier because it complicates inserts (you have to ensure it's unique, and possibly handle collisions in a transaction-safe way)... and indeed an elaborately generated surrogate key wouldn't help in the case of data corruption (there is no way to correlate it with the record).
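To illustrate what preserving the primary key looks like in practice, here is a minimal sketch of reloading rows into a rebuilt or mirrored copy of the tables from the question. The id values and email addresses are made up. MySQL stores an explicit non-zero value supplied for an AUTO_INCREMENT column as given, so the bridge rows keep pointing at the right users.

```sql
-- Reload users with their original ids instead of letting the
-- counter assign new ones (values below are invented examples).
INSERT INTO users (id, email) VALUES
    (17, 'alice@example.com'),
    (42, 'bob@example.com');

-- Bridge rows copied over unchanged still resolve to the same users.
INSERT INTO user_permissions (user_id, permission_id) VALUES
    (17, 3),
    (42, 3);

-- A later insert without an id continues after the highest existing value.
INSERT INTO users (email) VALUES ('carol@example.com');
```

A dump produced by mysqldump writes the id column values out explicitly in the same way, so a restore from backup keeps the joins intact.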

A natural PK on, say, the user's email would carry a lot of overhead: you have to update every relationship whenever the email changes, swapping email addresses between 2 accounts is complicated, and indexes are much less efficient on wide values than on ints. The trade-offs tilt in favor of the auto increment key.
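A small sketch of that overhead, assuming a hypothetical variant of the schema in which user_permissions carries a user_email column instead of user_id. MyISAM does not enforce foreign keys, so there is no ON UPDATE CASCADE to lean on, and it has no transactions, so the two statements are not atomic.

```sql
-- Natural-key variant (hypothetical user_email column):
-- every referencing table must be updated by hand.
UPDATE users
   SET email = 'new@example.com'
 WHERE email = 'old@example.com';

UPDATE user_permissions
   SET user_email = 'new@example.com'
 WHERE user_email = 'old@example.com';

-- Surrogate-key schema from the question: one row changes,
-- and user_permissions is untouched because it only stores the id.
UPDATE users
   SET email = 'new@example.com'
 WHERE id = 17;
```

If the process dies between the first two statements, the natural-key variant is left with permissions that point at an email no user has, which is the kind of drift the auto increment key avoids.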

A good writeup is here:

http://decipherinfosys.wordpress.com/2007/02/01/surrogate-keys-vs-natural-keys-for-primary-key/

jspcal