In my Rails application, I have a variety of database tables that contain user data. Some of these tables have a lot of rows (as many as 500,000 rows per user in some cases) and are queried frequently. Whenever I query any table for anything, the user_id of the current user is somewhere in the query - either directly, if the table has a direct relation with the user, or through a join, if they are related through some other tables.
Should I denormalize the user_id and include it in every table, for faster performance?
Here's one example:
- Address belongs to user, and has a user_id
- Envelope belongs to user, and has a user_id
- AddressesEnvelopes joins an Address and an Envelope, so it has envelope_id and address_id -- it doesn't have user_id, but could get to it through either the envelope or the address (which must belong to the same user).
One common expensive query is to select all the AddressesEnvelopes for a particular user, which I could accomplish by joining with either Address or Envelope, even though I don't need anything from those tables. Or I could just duplicate the user id in this table.
Here's a different scenario:
- Letter belongs to user, and has a user_id
- Recepient belongs to Letter, and has a letter_id
- RecepientOption belongs to Recepient, and has a recepient_id
Would it make sense to duplicate the user_id in both Recepient and RecepientOption, even though I could always get to it by going up through the associations, through Letter?
Some notes:
- There are never any objects that are shared between users. An entire hierarchy of related objects always belongs to the same user.
- The user owner of objects never changes.
- Database performance is important because it's a data intensive application. There are many queries and many tables.
So should I include user_id in every table so I can use it when creating indexes? Or would that be bad design?