views:

127

answers:

8

This is in mysql, but its a database design issue. If you have a one to many relationship, like a bank customer to bank-accounts, typically you would have the table that records the bank-account information have a foreign key that keeps track of the relationship between account and customer. Now this follows the 3rd normal form thing and is a widely accepted way of doing it.

Now lets say that you are going to limit a user to only having 3 accounts. The current database implementation will support this and nothing would need to change. But another way to do this would have 3 coloms in the account table that have the id of the 3 respective accounts in them. By the way this violates 1st normal form of db design.

The question is what would be the advantage and disadvantages of having the user account relationship recored in this way over the traditional?

Update

Unfortunately I am not in-charge of the db design. When I saw a similar relationship implemented in our db, I asked my boss, the db designer, why he choose to do it that way. I didn't really get a strait answer, or an answer I understood to have logical reasons backing it up. "This is a very common relationship when you work with databases allot, and that is just how you do it." I asked for more clarification. ... Didn't get me anywhere and made him defensive.

Thank you all so much for these post, I couldn't really find any books that even talked about doing something like this. I found lots of books tell me to do it the right way but not many giving an example this broken and then explaining why it will suck.

+1  A: 

Advantage: greater performance, no joins to relationship-table needed

Disadvantage: violates 1st normal form (but may be violated for grater performance)

it's up to you ;)

Tobias P.
+5  A: 

Well first, you will have lots of blank fields for records for customers that have less than three accounts.

Adding a fourth or more accounts will require adding columns to the table, which again will result in more blank fields for each record.

Secondly, it'll be easier to query the data (for things like the total number of accounts) if they are stored in a separate table.

The reason we use separate tables for 1 to N relationships is that it will save you headaches like these down the line.

Evernoob
+3  A: 

Advantages:

  • Faster than normal form (by how much?)
  • Simpler queries for basic operations (no joins)
  • Slightly easier to impose maximum limits

Disadvantages:

  • Extensibility
  • Added business logic (What if a customer closes their first account? Shift the others?)
  • Wasted space (considerable if the average user doesn't have 3 accounts)
  • Aggregate metrics more difficult to obtain (i.e.: exact total number of accounts)
  • You can't claim your DB is normalized

Both options are valid given the requirements. If possible, benchmark the difference and see if the performance, and calculate the storage difference to see if it's worth it for your deployment.

However, I would likely opt to use a trigger to impose the account limit since that would provide the easiest maintainability, not waste disk space and future developers won't wonder why I can't even get 1NF right.

Ben S
I don't think the queries would be simpler. You will have a bunch of (account1 = blah OR account2 = blah or account3 = blah) all over the place, which is not simple. Plus, you will need to join to the account table anyway, because you are just denormalizing the account ID, not all the account info. But now your join will involve the ugly OR statement above.
Peter Recore
See, I would implement that logic in the host language, not SQL. So I would simply do `SELECT * FROM accounts WHERE id=127632` and use the host language to figure out which column needs work. This is why I have the "Added business logic" disadvantage.
Ben S
Yes, a trigger is the correct way to implement the limit onthe number of accounts and so easy to change if they decide 4 is the max.
HLGEM
-1: Can't easily compute sums of rows because you have to factor in the 3 copies across columns (and associated nulls).
S.Lott
@S.Lott: That's exactly what I said. Under Disadvantages: "Aggregate metrics more difficult to obtain (i.e.: exact total number of accounts)"
Ben S
@Ben S: It's so huge a disadvantage that it trumps the other disadvantages. Indeed, it appears to be a complete stumbling block. It seems to invalidate the approach entirely. I think "disadvantage" is too weak a statement.
S.Lott
+1  A: 

Now lets say that you are going to limit a user to only having 3 accounts.

This involves a magic number that is subject to change.

Don't believe anyone who says there's a "limit". Today's absolute maximum is tomorrow's minimum.

Do not participate in enforcing this silliness in the database. All "limits" are nothing more than "typical values for the current application" and will change.

Use normal 1-to-many with normal foreign keys and ignore the magic number "3".

S.Lott
Enforcing the business rule in the database is preferable to enforcing it in every application that accesses it. Change is normal and expected. If it's in the database you only have to change it once. If it's in the app then you have to change it for every application.
dportas
@David: "If it's in the database you only have to change it once" I prefer to have it change **never** in the database. The magic number "3" is not a sensible business "rule". It's just a random magic number. It can't be significant; it will change randomly through the life of the project. It is the absolute worst kind of business rule in all layers of the architecture, and particularly costly and complex in the database for absolutely no value.
S.Lott
@Lott, The key is to manage change, not to avoid it. If you are trying to prevent change then you are trying to prevent progress. So assuming it is a legitimate requirement that the business needs to implement today (there may be legal or regulatory reasons for it for example) then the ideal place to do it is in the database.
dportas
@David: I'm not preventing anything. I'm suggesting that the application rules ("3") do not belong in the database so that they **can** change every time someone makes up a new random magic number. The database is never the ideal place for artificial, farcical, random constraints based on magic numbers. When someone changes this for 4, a normalized database has **Zero** changes, enhancing and supporting the application's change from "3" to "4". Trying to enforce this in the database prevents application change.
S.Lott
@David: But there are ways to enforce it in the database other than with such a denormalization, such as triggers. In practice, I rarely work with databases where more than one app can update the same table. Usually there's a "main app" and "other read-only apps". Of course that's not inevitable, I don't doubt there are many counter-examples out there. But it means that in practice I routinely enforce business rules in code and don't worry about the theoretical problems. Arguably short-sighted.
Jay
I agree with Jay: the rule can and should be enforced without resorting to having multiple account number columns (using a constraint for example). But the decision on whether it *should* be implemented ought to rest with the business owner, not with the database designer.
dportas
@David: "not with the database designer" and -- therefore -- not in the database. Adding and changing constraints is just friction -- it turns energy into useless heat.
S.Lott
@Lott, so if the business requires a rule to be enforced, you would just say No? Or do you have a more constructive suggestion. Saying No is what I mean by preventing change. You are telling the customer it can't be done.
dportas
@David: "rule"? All rules can be enforced in the application. That's first. A silly random rule like "3" creates problems because it's inappropriate in the database -- it's hard to manage and subject to random change. A Relational Modeling rule (like an FK constraint) is a **structural** part of the database. It's not a business "rule". It's part of the definition of the problem domain. "business rule" is vague. Is it a silly, random number? Then the answer is "do it in your application code." Not "No". But "do it outside the database" where it's easier to change.
S.Lott
@Lott, I refer you to my first answer. If you have to do it in every application then it is generally much harder to change because the surface area is so much bigger. Also you run the risk that some applications may not implement the rule correctly. If you do it in the database you only do it once and it's guaranteed for every application. Plus it only takes a few seconds to change a database constraint whereas releasing a change for every app could take much longer.
dportas
@David: "it only takes a few seconds to change a database constraint". Not the denormalized example. In the denormalized example would take forever to change from "3" to "4" because it requires reworking all kinds of SQL. This example shows that the constraint of "3" is a terribly expensive and complex constraint. Most constraints are like that. The only exception are structural relationships like FK's.
S.Lott
@Lott, I totally agree that the example with three to four columns is bad for the reasons you say. I would not recommend that approach. Use a single column for account and then implement the 3 account rule in the database as a constraint or in procedural code. I'm suggesting that the best place to do that is the database, not the application code.
dportas
@David: "implement the 3 account rule in the database as a constraint" Bad. "procedural code". Correct. It's random with no "meaning". In the database is **always** more costly and more complex than in the application. It's random noise. Your "some applications may not implement the rule correctly" isn't relevant because the rule has no meaning. It's just a random number than changes for random reasons. This kind of dumb, small, random constraint creates useless make-work, friction that stifles business change. This kind of constraint is bad to begin with. Implementing is bad.
S.Lott
@Lott, It is more than a "random" number if it is the policy of your business. I repeat: If your business customer decides this is the rule they need to implement and that their customers MUST be prevented from creating more than 3 accounts then what would be your answer? You would implement it in the applications? But you've already agreed that it is likely to change and therefore you will be forced to make application changes when it does. If you did it in the database then changing the constraint would be a much smaller piece of work. All I'm saying is that you ought to consider that cost.
dportas
@David: "You would implement it in the applications" Correct. "forced to make application changes when it does" Correct. That's my position. After spending 10 years as a DBA, that's my position. I don't *hate* the database. I find that an overconstrained database is expensive and complex. I'm not preventing or refusing change. I'm simply saying this: Don't Do It In The Database Unless It's Part Of The Structure Of The Problem. "Policy" doesn't mean anything. It comes and goes.
S.Lott
@Lott: No problem if that's your experience. But for many installations that would be a prohibitively expensive option. The OP needs to weigh up the potential costs of changing every application versus changing the database once. Changing the database may be as easy as one line of code. Changing the apps is unlikely to be that simple.
dportas
@David: "Changing the database may be as easy as one line of code". Only when it's empty. Changing the app is generally trivial, since it doesn't invalidate numerous rows of data.
S.Lott
@Lott, Depends on the change. Changing from a limit of 3 accounts to a limit of 4 would not cause a violation for any existing rows. Changing from 4 to 3 most likely would - although perhaps doing something with those other rows would be part of the planned change anyway, whether you did it in the app or the database. So, as so often, the answer is "it depends". Don't assume the database change costs more than the application change. In many cases it will be cheaper.
dportas
@David: "Depends on the change". This question provides a specific scenario. "3" a random denormalization that makes SQL queries awful and requires extensive rework to change. There's little "depends" about this specific question. If you want to talk about other scenarios, pose another question.
S.Lott
@Lott, I already said I wasn't advocating the structure with 3 columns. I am talking about the requirement described in this question but I definitely agree it is not a good idea to use three account columnS.
dportas
I'd also note: Some rules are inherent in the nature of the universe, or at least so well-settled that they are unlikely to ever change. Like, in the U.S. we have 2 senators per state, and changing that would require a very unlikely constitutional amendment. I wouldn't be too concerned about a program that embedded a 2-senator per state limit. At the other extreme, though, I once worked on an HR system that allowed for 6 children per employee on the group health insurance. Can you guarantee that you will never have an employee with more than 6 children? Ever?
Jay
@Jay: Good point. The US presidents is a great example of seductively bad database design decisions. It seems like "name", "party", "start" and "number of terms" would cover a lot of bases. Except some US presidents didn't serve consecutive terms, some changed parties, some died, some where assassinated. The "inherent" issues are few. The "random constraints" are many. Constraints are an attractive nuisance. And as this question shows, a source of poor optimization considerations.
S.Lott
+1  A: 

The 3 column approach suffers if there's a business rule change (IE: users can now have FOUR accounts, ah ah ah...). This would require an ALTER table statement vs INSERT, and all supporting logic would have to be revisited to accommodate the new column(s) -- very expensive from a development perspective. Additionally, databases do have a column limit, when 3NF wouldn't be affected by.

OMG Ponies
+4  A: 

The biggest problem is that your queries become more complicated. Say you want to find all accounts with balance over $10,000 with the owner. In a normalized DB, that would be something like:

select firstname, lastname, accountnumber, balance
from account
join customeraccount using (accountnumber)
join customer using (customernumber)
where balance>10000

But with three accountnumber fields, it becomes

select firstname, latname, accountnumber, balance
from account
join customer on customer.accountnumber1=account.accountnumber
  or customer.accountnumber2=account.accountnumber
  or customer.accountnumber3=account.accountnumber
where balance>10000

Every query that joins Account to Customer now gets more complicated like this.

Sooner or later, someone will write a query that fails to check accountnumber3, or he tries to make the three tests by copy-and-paste and after copying accountnumber1 two times he forgets to change one of them. It's an error that's easy to overlook when reading a query. If you mess up one of the three compares, the program will work for all customers who have only two accounts but fail for customers who have three. This is the sort of problem that can easily slip through testing.

You now have to think through exactly how the joins work when the same customer has multiple accounts. If a customer has two qualifying accounts in some query, do you want him to show up once or twice?

You probably need to index on the account number field in customer. Now you need three indexes instead of one. More overhead for the database.

Are you sure that the maximum will never ever change? Because if it ever does, now every one of those queries that check the three slots will have to be changed to check four slots. That could be a ton of work.

What do you gain in exchange for all this pain? Automatic enforcement of the max-3 limit. One less table. You might possibly get better performance on some queries because there is one less table to be joined. Then again you might not get better performance, depending on many details of the inner workings of the database engine and the actual data.

All told, I would say it is almost certainly not worth doing. Stick with the normalized database.

I speak from experience. I did something very similar to this once. We had a database where we had to record three types of "managers" for each book our organization published (#1 in charge of budgeting and administration, #2 in charge of distribution, and #3 in charge of contents (i.e. the editor). As the three were different, I created three separate points. Huge mistake. I would have been much better off to create a book-manager table with a type code and enforce only one of each type with triggers or code. The queries would have been way simpler. (Experience enables you to make good decisions. Experience is gained by making bad decisions.)

Jay
A: 

Possible disadvantages of splitting account numbers over multiple columns:

You would have to repeat query logic and any other code that references account number.

You would have to use some placeholder instead of an account number if there were fewer than three.

It's not automatically a violation of First Normal Form because 1NF is nothing more than the definition of a relation itself. It's just not a very practical design (DRY principle).

dportas
I believe the definition of 1NF is: (a) Has a primary key, and (b) No repeating groups. This violates (b).
Jay
@Jay, 1NF (in the SQL context for example) just means that a notionally relational table accurately represents a relation: it has at least one candidate key, has unique column names and no nulls."Repeating group" is actually a term derived from Codasyl. It refers to a concept which does not exist in SQL or in the relational model - specifically, arrays of values within a table. So repeating group isn't relevant here. There's no fundamental reason why a relation shouldn't have multiple attributes that represent the same sort of thing. It's just not usually a very useful thing to do.
dportas
http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/
dportas
@David: Hmm. I don't want to argue about definitions of words, but: The Ultimate Arbiter of All Knowledge -- Wikipedia -- says that 1NF is "the table is a faithful representation of a relation and that it is free of repeating groups." IBM says, "An entity is in the first normal form if it contains no repeating groups." (http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.ddi.doc/ddi53.htm) According to the book "A Sane Approach to Database Design", page 147, "To be in First Normal Form, a table must (a) Have a primary key; and (b) Have no repeating groups."
Jay
@Jay, To be clear, I totally agree that a relation in 1NF cannot possibly have repeating groups. However, I don't think a repeating group is what's under discussion here. Multiple attributes representing different accounts is not a repeating group.
dportas
A: 

sounds like your all drowning in a glass of water!.. why not just add a counter column in each detail account table and if you try to add a new account row, check the counter?

Frank Computer