Hello everyone!

I recently ran into quite a complex problem, and after looking around a lot I couldn't find a solution to it. I've found answers to my questions many times before on stackoverflow.com, so I decided to post here.

I'm making a user/group management system for a web-based project, and I'm storing all related data in a PostgreSQL database. This system relies on three tables:

  1. USERS (Contains the primary key "USER_ID")
  2. GROUPS (Contains the primary key "GROUP_ID")
  3. GROUP_USERS

The first two tables simply define all the users and all the groups on the site, and the last table, GROUP_USERS, stores the groups every user is part of. It only has two columns:

  1. USER_ID
  2. GROUP_ID

Since every user can be a member of several groups, I decided to make a separate table for this purpose, rather than storing a comma-separated list in a column of the USERS table.

Now, both columns are foreign keys, and I want to make them a composite primary key as well, since each combination of USER_ID and GROUP_ID has to be unique. But now I am stuck with what seems to be a lot of indexes and relations on a very small table that only contains numbers. In the end, I want this table to be as fast as possible, even if it contains tens of thousands of rows. Size on disk shouldn't be a problem since it's all just numbers anyway, but it feels quite wasteful to have a full-sized index referring to a smaller table.

Should I stick with my current solution, store comma-separated values in a column in the USERS table, or is there another solution I should be aware of? What I am looking for is the best possible performance. This table could potentially (but not likely or commonly) be queried several hundred times on a single page load.

I don't want to use an array column, even though PostgreSQL supports them. I want to be as generic as possible so I can switch databases later on, if necessary.

EDIT: In other words, will using a composite primary key and two foreign keys in one table with only two columns have a negative impact on performance rather than the opposite due to the size of the generated index?

EDIT2: Clarifications.

Thank you!

A: 

If I understand your question properly, what you might be missing is that primary keys (and, for that matter, foreign keys as well) can be composite, meaning that they contain more than one column. That's what you want here: a composite primary key on both USER_ID and GROUP_ID, and a foreign key on each one individually that points to (references) the primary key in the respective parent table.
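
As a rough illustration of that layout, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for PostgreSQL, lowercase versions of the table and column names from the question, and a hypothetical helper `add_member`; none of this is from the original post. The DDL itself is plain SQL, so it should carry over to other databases with little change.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE users    (user_id  INTEGER PRIMARY KEY);
CREATE TABLE "groups" (group_id INTEGER PRIMARY KEY);  -- quoted: GROUPS is a keyword in some databases

CREATE TABLE group_users (
    user_id  INTEGER NOT NULL REFERENCES users (user_id),
    group_id INTEGER NOT NULL REFERENCES "groups" (group_id),
    PRIMARY KEY (user_id, group_id)   -- composite key: each pair is unique
);

INSERT INTO users    VALUES (1), (2);
INSERT INTO "groups" VALUES (10);
INSERT INTO group_users VALUES (1, 10);
""")


def add_member(user_id, group_id):
    """Hypothetical helper: True if the row was stored, False if a
    primary key or foreign key constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO group_users (user_id, group_id) VALUES (?, ?)",
                (user_id, group_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, adding the pair (1, 10) a second time is rejected by the composite primary key, and adding a membership for a user that does not exist is rejected by the foreign key.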

Charles Bretana
Well, yes. I'm not entirely sure how to put my question into words; maybe I'm just confused. ;) My question is rather whether the solution you just mentioned will hurt performance instead of helping it. I will edit the main post to make this clearer. Thank you for your quick answer!
Emanuel
Every index has some negative impact on insert, update and delete performance, because every change in the data requires additional write I/O to update each index. For read operations, however, adding indexes can generally only help: if there is an index that can be used to find the record(s) you need, it will dramatically reduce the number of read I/Os necessary to access the data.
Charles Bretana
+1  A: 

I believe you're on the right track, but I didn't understand which indexes you actually defined.

My suggestion is that you should have your primary key index in USERS on USER_ID, your primary key index in GROUPS on GROUP_ID, and two more indexes in GROUP_USERS. One of the indexes in GROUP_USERS, the primary key, should be on either the pair (USER_ID, GROUP_ID) or the pair (GROUP_ID, USER_ID). The second index should be on the column that was left in second place in the first one.

Now, why did I mention two options when defining the primary key on GROUP_USERS? Because there is a slight performance difference between a primary key index and any other index. It's very likely that your most common query on that table will be to find out whether a user is in a certain group, and that query will be fast either way. What you have to consider is which of the following two queries will be more common.

  1. Query which groups a certain user is in.
  2. Query which users are in a certain group.

If 1 is more common than 2, then your primary key should be (USER_ID, GROUP_ID); otherwise it should be (GROUP_ID, USER_ID).
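
To make the two options concrete, here is a minimal sketch of the first one, with the primary key on (user_id, group_id) plus a second index on group_id. It assumes Python's built-in sqlite3 module with in-memory SQLite as a stand-in for PostgreSQL; the index name `group_users_group_id_idx` and the three helper functions are hypothetical, and the foreign keys are left out only to keep the example short.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_users (
    user_id  INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, group_id)  -- leading column user_id: serves query 1
);

-- Second index on the column left in second place: serves query 2.
CREATE INDEX group_users_group_id_idx ON group_users (group_id);

INSERT INTO group_users VALUES (1, 10), (1, 20), (2, 10);
""")


def is_member(user_id, group_id):
    """Membership check: both key columns are given, so either
    column order of the primary key can answer it."""
    row = conn.execute(
        "SELECT 1 FROM group_users WHERE user_id = ? AND group_id = ?",
        (user_id, group_id),
    ).fetchone()
    return row is not None


def groups_of(user_id):
    """Query 1: which groups a certain user is in."""
    rows = conn.execute(
        "SELECT group_id FROM group_users WHERE user_id = ? ORDER BY group_id",
        (user_id,),
    )
    return [r[0] for r in rows]


def users_in(group_id):
    """Query 2: which users are in a certain group."""
    rows = conn.execute(
        "SELECT user_id FROM group_users WHERE group_id = ? ORDER BY user_id",
        (group_id,),
    )
    return [r[0] for r in rows]
```

If query 2 turned out to be the more common one, the same sketch would apply with the two columns swapped in the primary key and the second index placed on user_id instead.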

Fede
I have set up the tables in the way you suggested. I believe that I will most likely check whether a user is in a given group rather than the opposite. Your response makes me feel more confident that I'm doing it the right way now, so I am considering this question answered. Thank you both again.
Emanuel