views:

1254

answers:

11

In our dev group we have a raging debate regarding the naming convention for Primary and Foreign Keys. There's basically two schools of thought in our group:

1) Primary Table (Employee) Primary Key is called ID

Foreign table (Event) Foreign key is called EmployeeID

2) Primary Table (Employee) Primary Key is called EmployeeID

Foreign table (Event) Foreign key is called EmployeeID

I prefer not to duplicate the name of the table in any of the columns (So I prefer option 1 above). Conceptually, it is consisted with a lot of the recommended practices in other languages, where you don't use the name of the object in its property names. I think that naming the foreign key EmployeeID (or Employee_ID might be better) tells the reader that it is the ID column of the Employee Table.

Some others prefer option 2 where you name the primary key prefixed with the table name so that the column name is the same throughout the database. I see that point, but you now can not visually distinguish a primary key from a foreign key. Also, I think it's redundant to have the table name in the column name, because if you think of the table as an entity and a column as a property or attribute of that entity, you think of it as the ID attribute of the Employee, not the EmployeeID attribute of an employee. I don't go an ask my coworker what his PersonAge or PersonGender is. I ask him what his Age is.

So like I said, it's a raging debate and we go on and on and on about it. I'm interested to get some new perspective.

+8  A: 

It doesn't really matter. I've never run into a system where there is a real difference between choice 1 and choice 2.

Jeff Atwood had a great article a while back on this topic. Basically people debate and argue the most furiously those topics which they cannot be proven wrong on. Or from a different angle, those topics which can only be won through endurance filibuster last-man-standing arguments.

Pick one and tell them to focus on issues that actually impact your code.

EDIT: If you want to have fun, have them specify at length why their method is superior for recursive table references.

Russell Steen
+1, for common sense... There are more important things to argue about.. So, do it my way (choice 2)
Charles Bretana
And, for self-referencing DRI, when there are more than one FK that self-references the same PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, etc. etc...
Charles Bretana
+1  A: 

The convention we use where I work is pretty close to A, with the exception that we name tables in the plural form (ie, "employees") and use underscores between the table and column name. The benefit of it is that to refer to a column, it's either "employees _ id" or "employees.id", depending on how you want to access it. If you need to specify what table the column is coming from, "employees.employees _ id" is definitely redundant.

Jarett
I've not decided if I like pluralised table names. By using the singular the queries seem to read better ("employee.name" as opposed to "employees.name"). Even in joins it seems to read better as you are joining single records to another table. But pluralised table names seems more accurate when thinking about the table, rather than the query. I'll be sticking with singular as that is what we use, but i think it's also the right way to go (though again, many disagree)
Dems
Yeah. It's more of a personal preference and/or whatever you're used to seeing, I guess.
Jarett
+1  A: 

I use convention #2. I'm working with a legacy data model now where I don't know what stands for in a given table. Where's the harm in being verbose?

OMG Ponies
+1  A: 

Neither convention works in all cases, so why have one at all? Use Common sense...

e.g., for self-referencing table, when there are more than one FK column that self-references the same table's PK, you HAVE to violate both "standards", since the two FK columns can't be named the same... e.g., EmployeeTable with EmployeeId PK, SupervisorId FK, MentorId Fk, PartnerId FK, ...

Charles Bretana
+1 for actual technical objective answer
DVK
A good, applicable answer, but the arguments to Dems' answer miss the point.
JYelton
+7  A: 

I think it depends on your how you application is put together. If you use ORM or design your tables to represent objects then option 1 may be for you.

I like to code the database as its own layer. I control everything and the app just calls stored procedures. It is nice to have result sets with complete column names, especially when there are many tables joined and many columns returned. With this stype of application, I like option 2. I really like to see column names match on joins. I've worked on old systems where they didn't match and it was a nightmare,

KM
+1 for having to figure out joins with non matching column names
Raj More
on "old systems" the handicap of 8 character long names that hurts a lot more than this. I'm willing to go out on a limb and speculate that having the PK named ID was not the primary cause of the nightmare in the old systems you were dealing with. Also "it sucked in old systems" is used waaaaay too often in software development, especially databases. I routinely see people justifying any given practice A, based on the way it worked in their experience on a DB system released 10+ years ago.
Russell Steen
today's _state of the art_ applications will be _old crap_ in a few years. you might even rewrite the interface, or use the data in another platform, but your data (including your column names) will need to stand the test of time.
KM
So people 20 years ago should have somehow used column names that made sense today, even though they only had 8 characters?Data storage formats have changed drastically over the past 20 years, and will change again in the next 20. There's no way to demonstrate that your preference will stand the test of time better than the other method listed. "column names" may themselves be "old crap" by the time people are having this discussion in 20 years, as our ability to store and manipulate data improves. Tables are a human construct that imperfectly represent data relations...
Russell Steen
"ID" - wow that communicates a lot! I now see the light, I'll do it your way @Russell Steen
KM
Thank you for the well reasoned intellectual response.
Russell Steen
@Russell Steen, in your answer you say "Basically people debate and argue the most furiously those topics which they cannot be proven wrong on". You are one of those people! let it go. In my answer I try to explain why both might be used, follow your own advice and "focus on issues that actually impact your code"
KM
+9  A: 

If the two columns have the same name in both tables (convention #2), you can use the USING syntax in SQL to save some typing and some boilerplate noise:

SELECT name, address, amount
  FROM employees JOIN payroll USING (employee_id)

Another argument in favor of convention #2 is that it's the way the relational model was designed.

The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

Steven Huwig
Interesting, I did not know about the using column.
Jeremy
SQL syntax and semantics actually give a pretty good clue as to how it should be used. e.g. USING syntax means columns with the same domain should have the same name, NULL = NULL -> NULL means NULL is "unknown" rather than "not applicable", and ON UPDATE CASCADE means that keys need only be unique, not immutable.
Steven Huwig
I think this answer convinced me to use convention #2 in future projects.
JYelton
Glad to help! :)
Steven Huwig
+2  A: 

I agree that there is little to choose between them. To me a much more significant thing about either standard is the "standard" part.

If people start 'doing their own thing' they should be strung up by their nethers. IMHO :)

Dems
+1 for recognizing that consistency is more important than being "right" (in this case)
Russell Steen
-1 for attempting to apply a "foolish consistency". Old chinese proverb says "A foolish consistency is a hobgoblin for simple minds."
Charles Bretana
@charles: in a world where different people maintain each others code, often when the writer has left and the documentation is obsolete or non-existant, this is not a foolish consistency. I am so glad I don't work with You...
Dems
@Dems, no offense intended, but this is foolish, for two reasons. 1) There are common, clearly understood scenarios where ANY standard would HAVE to be violated. (see my answer for examples and 2) because on this issue, at least, a standard would add very little value, except to make folks who like standards feel more comfortable...
Charles Bretana
@charles: You seem to be suggesting "if it ain't perfect, don't try it". Unfortunately, common sense to one person is not always common sense to another. In order to engender familiarity and consistency, standards -are- of benefit. I have recently worked on a piece of code where multiple people have worked on it, each with their own naming conventions for tables and fields. It's a mess, and a nightmare to maintain. Standards are a common "common sense" rather than everyone using their own personal "common sense".
Dems
@Dems, The pendelum doesn't have to swing so far to one side that it crashes into and knocks itself over. I suggest no such thing. A "Standard" that must be broken in common scenarios is no standard at all, and only adds confusion and extra work. Not having a standard does not imply "a nightmare to maintain". Do you have a standard for what devs wears to the office? for left or right handed mice? Or do you require everyone use a trackball? Not everything needs to be standardized. My suggesting this issue does not need a standard does not mean I advocate no standards at all...
Charles Bretana
+1 for Dems for making sense, and -1 for Charles Bretana for making mountains of molehills.
JYelton
@JYelton, Applying standards when there is no reason to do so other than making youself feel more comfortable or more organized is what is "making a mountain out of a molehill"... advocating that unnecessary standards be left alone is indeed leaving the molehill untouched and undisturbed. It never fails to amaze me how arguments can be presented that are exactly opposite the substance of what they advocate.
Charles Bretana
+1  A: 

How about naming the foreign key

role_id

where role is the role the referenced entity has relativ to the table at hand. This solves the issue of recursive reference and multiple fks to the same table.

In many cases will be identical to the referenced table name. In this cases it becomes identically to one of your proposals.

In any case havin long arguments is a bad idea

Jens Schauder
A: 

"Where in "employee INNER JOIN order ON order.employee_id = employee.id" is there a need for additional qualification?".

There is no need for additional qualification because the qualification I talked of is already there.

"the reason that a business user refers to Order ID or Employee ID is to provide context, but at a dabase level you already have context because you are refereing to the table".

Pray, tell me, if the column is named 'ID', then how is that "refereing [sic] to the table" done exactly, unless by qualifying this reference to the ID column exactly in the way I talked of ?

A: 

If you are looking at application code, not just database queries, some things seem clear to me:

  1. Table definitions usually directly map to a class that describes one object, so they should be singular. To describe a collection of an object, I usually append "Array" or "List" or "Collection" to the singular name, as it more clearly than use of plurals indicates not only that it is a collection, but what kind of a collection it is. In that view, I see a table name as not the name of the collection, but the name of the type of object of which it is a collection. A DBA who doesn't write application code might miss this point.

  2. The data I deal with often uses "ID" for non-key identification purposes. To eliminate confusion between key "ID"s and non-key "ID"s, for the primary key name, we use "Key" (that's what it is, isn't it?) prefixed with the table name or an abbreviation of the table name. This prefixing (and I reserve this only for the primary key) makes the key name unique, which is especially important because we use variable names that are the same as the database column names, and most classes have a parent, identified by the name of the parent key. This also is needed to make sure that it is not a reserved keyword, which "Key" alone is. To facilitate keeping key variable names consistent, and to provide for programs that do natural joins, foreign keys have the same name as is used in the table in which they are the primary key. I have more than once encountered programs which work much better this way using natural joins. On this last point, I admit a problem with self-referencing tables, which I have used. In this case, I would make an exception to the foreign key naming rule. For example, I would use ManagerKey as a foreign key in the Employee table to point to another record in that table.

Bruce Patin
A: 

I like convention #2 - in researching this topic, and finding this question before posting my own, I ran into the issue where:

I am selecting * from a table with a large number of columns and joining it to a second table that similarly has a large number of columns. Both tables have an "id" column as the primary key, and that means I have to specifically pick out every column (as far as I know) in order to make those two values unique in the result, i.e.:

SELECT table1.id AS parent_id, table2.id AS child_id

Though using convention #2 means I will still have some columns in the result with the same name, I can now specify which id I need (parent or child) and, as Steven Huwig suggested, the USING statement simplifies things further.

JYelton
`SELECT *` is a no-no for (most) production queries, anyway, so that's not much of a reason to pick a naming standard.
P Daddy
Not disagreeing: could you provide a link to a reason why this is so? I don't like the idea of having to maintain the names of 80 columns in my query.
JYelton
Can't find a link at the moment (hard to google for "*"), but I'll outline the basic points: (1) changes to the table(s) can negatively impact your application, (2) it can be bad for performance, and (3) specifying explicitly what data you actually need can make your code easier to understand. These points could expansion, and there are exceptions (as I alluded to) but that's not appropriate here. If you post this as a new question, I (and others) would be glad to elaborate further.
P Daddy
I may do that. I realize the performance benefit, but have to consider the time investment when editing code. I am always looking for ways to improve the interaction between the app and the database. Thanks.
JYelton