+1  Q: 

redundant column

Hi there, I have a database that has two tables, these tables look like this

id | code | member_id
1  | 123  | 2
2  | 234  | 1
3  | 345  |  
4  | 456  | 3

id | code_id | other info
1  | 2       | blabla
2  | 1       | blabla
3  | 4       | blabla

the basic idea is that if a code is taken then its member id field is filled in, however this is creating a circle link (members points to codes, codes points to members) is there a different way of doing this? is this actually a bad thing?

To answer your questions there are three different code tables with approx 3.5 million codes each, each table is searched depending on different criteria, if the member_id column is empty then the code is unclaimed, else, the code is claimed, this is done so that when we are searching the database we do not need to include another table to tell if it it claimed.

the members table contains the claimants for every single code, so all 10.5 million members the additional info has things like mobile, flybuys.

the mobile is how we identify the member, but each entry is considered a different member.


You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).

+1  A: 

Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove Code_id from the members table, or member_id from the codes table. (in this case it seems like it would make more sense to drop code_id from members since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call)


It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.

If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)

If a member can have multiple codes then remove the code_id from the member table and the member_id from the code table can create a third table that relates members to codes. Each record in the member table should be a unique record and each record in the code table should be a unique record.


What is the logic behind having the member code in the code table?

It's unnecessary since you can always just do a join if you need both pieces of information.

By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.


Yes this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made both places and if one place is missed, you have a severe data integrity problem.

First question can each code be assigned to more than one member? Or can each member have more than one code? (this includes over time as well as at any one moment if you need historical records of who had what code when))If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?

If you can have mulitple codes and multiple members you need a bridging table that has memberid and code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way it should be the memberid in the code table. Then properly set up the foreign key relationship.

"If the answer to both is no, why do you need two tables?" - In his use case there are some members with no codes, and some codes with no members, so having two tables is valid.
+3  A: 

It's a bad thing because you can end up with anomalies. For example:

id | code | member_id
1  | 123  | 2

id | code_id | other info
2  | 4       | blabla

See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.

Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.

In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference But it could go the other way; you don't give enough information for the reader to be sure (as @OMG Ponies commented).

Bill Karwin

@Bill Karwin correctly identifies this as a probably design flaw which will lead to anomalies.

Assuming code and member are distinct entities, I would create a thrid table...

What is the relationship between a code and member called? An oath? If this is a real life relationship, someone with domain knowledge in the business will be able to give it a name. If not look for further design flaws:

code_id | member_id
1       | 2
2       | 1
4       | 3

The data suggest that a unique constraint is required for (code_id, member_id).

Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.