tags:

views:

132

answers:

1

I have my own theories on the best way to do this, but I think its a common topic and I'd be interested in the different methods people use. Here goes

Whats the best way to deal with many-to-many join tables, particularly as far as naming them goes, what to do when you need to add extra information to the relationship, and what to do whene there are multiple relationships between two tables?

Lets say you have two tables, Users and Events and need to store the attendees. So you create EventAttendees table. Then a requirement comes up to store the organisers. Should you

  1. create an EventOrganisers table, so each new relationship is modelled with a join table or
  2. rename EventAttendees to UserEventRelationship (or some other name, like User2Event or UserEventMap or UserToEvent), and an IsAttending column and a IsOrganiser column i.e. You have a single table which you store all relationship info between two attendees or
  3. a bit of both (really?) or
  4. something else entirely?

Thoughts?

A: 

The easy answer to a generic question like this is, as always, "It all depends on the details".

But in general, I try to create fewer tables when this can be done without abusing the data definitions unduly. So in your example, I would probably add an isOrganizer column to the table, or maybe an attendeeType to allow for easy future expansion from audience/organizer to audience/organizer/speaker/caterer or whatever may be needed. Creating an extra table with essentially identical columns, where the table name is in effect a flag identifying the "attendee type", seems to me the wrong way to go both from a pristine design perspective and also from a practical point of view.

A single table is more flexible. With one table and a type field, if we want to know just the organizers -- like when we're sending invitations to a planning meaning -- fine, we write "select userid from userevent where eventid=? and attendeetype='O'". If we want to know everyone who will be there -- like when we're printing name cards for the lunch tables -- we just don't include the attendeetype test.

But suppose we have two tables. Then if we want just the organizers, okay, that's easy, join on the organizer table. But if we want both organizers and audience, then we have to do a union, which makes for more complicated queries and is usually slow. And if you're thinking, What's the big deal doing a union?, note that there may be more to the query. Perhaps a person can have multiple phone numbers and we care about this, so the query is not just joining user and eventAttendee but also phone. Maybe we want to know if they've attended previous conferences because we give special deals to "alumni", so we have to join in eventAttendee a second time, etc etc. A ten-table join with a union can get very messy and confusing to read.

Jay

related questions