How should columns in a database table that have a PK/FK relationship be named? Should their names contain a reference to the table they are in? Consider the examples below. Each employee has an ID which is a FK in the salaries table.

/* Employees Option 1*/
CREATE TABLE dbo.Employees 
( 
    EmployeeID INT PRIMARY KEY, 
    EmployeeFirstName VARCHAR(32), 
    EmployeeLastName VARCHAR(32), 
    EmployeeEmail VARCHAR(255) -- , ... 
)


/* Employees Option 2*/
CREATE TABLE dbo.Employees 
( 
    EmployeeID INT PRIMARY KEY, 
    FirstName VARCHAR(32), 
    LastName VARCHAR(32), 
    Email VARCHAR(255) -- , ... 
)

/* Employees Option 3*/
CREATE TABLE dbo.Employees 
( 
    ID INT PRIMARY KEY, 
    FirstName VARCHAR(32), 
    LastName VARCHAR(32), 
    Email VARCHAR(255) -- , ... 
)

/* Salaries Option 1*/
CREATE TABLE dbo.Salaries 
( 
    EmployeeID INT, 
    Salary INT
)

/* Salaries Option 2*/
CREATE TABLE dbo.Salaries 
( 
    Employee INT, 
    Salary INT
)

I have more experience with object-oriented programming than database design. In OOP, when naming the properties of a class, you would not want to repeat the name of the class because it would be redundant (Employee.ID, not Employee.EmployeeID). So, offhand, I think Employees Option 3 and Salaries Option 1 above would be best, as this is how I would name the properties of classes in OOP.
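
For concreteness, here is how the pairing proposed above (Employees Option 3 with Salaries Option 1) reads in a join. This is a minimal sketch for an empty scratch database. The `dbo` prefix is dropped so the script also runs outside SQL Server, and the `REFERENCES` clause is an addition that the original options did not declare.

```sql
-- Employees Option 3: the key column is just "ID"
CREATE TABLE Employees
(
    ID        INT PRIMARY KEY,
    FirstName VARCHAR(32),
    LastName  VARCHAR(32),
    Email     VARCHAR(255)
);

-- Salaries Option 1: the foreign key carries the referenced entity's name
CREATE TABLE Salaries
(
    EmployeeID INT REFERENCES Employees (ID),
    Salary     INT
);

-- The PK and FK have different names, so the join condition pairs ID with EmployeeID
SELECT e.FirstName, e.LastName, s.Salary
FROM Employees AS e
JOIN Salaries AS s ON s.EmployeeID = e.ID;
```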

Am I right? Is there something else I should be considering that applies to database design that does not apply to OOP?

+1  A: 

I actually like Option 2 the best. I often have several tables with an ID column, and with Option 2 I can write E.EmployeeID, P.ProjectID, T.TaskID instead of E.ID, P.ID, T.ID, which is more readable IMO. Usually the other columns of the separate tables are different enough that I only do this for the ID column.
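
A minimal sketch of the three-table case described above, for an empty scratch database. The Projects and Tasks tables and their columns are hypothetical, invented only to give E, P and T something to join.

```sql
-- Option 2 style: every key column names its entity (hypothetical schema)
CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, LastName VARCHAR(32));
CREATE TABLE Projects  (ProjectID  INT PRIMARY KEY, Name VARCHAR(64));
CREATE TABLE Tasks
(
    TaskID     INT PRIMARY KEY,
    ProjectID  INT REFERENCES Projects (ProjectID),
    EmployeeID INT REFERENCES Employees (EmployeeID),
    Title      VARCHAR(64)
);

-- Each qualified key says what it identifies: E.EmployeeID, P.ProjectID, T.TaskID.
-- With Option 3 naming the same select list would read E.ID, P.ID, T.ID.
SELECT E.EmployeeID, P.ProjectID, T.TaskID, T.Title
FROM Tasks AS T
JOIN Employees AS E ON E.EmployeeID = T.EmployeeID
JOIN Projects  AS P ON P.ProjectID  = T.ProjectID;
```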

Matthew Jones
See, I was thinking I'd like the primary key column of a table to always be ID, and any column in another table that is a foreign key to that table to be named after it (e.g. EmployeeID). That way, if I see a column named ID, I know it is the ID of that table, and if I see a column named EmployeeID, I know it refers to the ID column of the Employee table. Otherwise you may end up with names such as Employee.EmployeeID and Salaries.EmployeeID, where the direction of the relationship is not immediately obvious.
Eric Anastas
@Eric, to me `Employees.EmployeeID` is immediately obvious as the primary key. You just have to be careful with your table naming.
Matthew Jones
True, it just seems redundant to me. It also seems like most examples I've been looking at follow this kind of naming, and I'd prefer to conform to what more experienced people do. That said, I think two options could work well: Employee.EmployeeID with Salaries.Employee, or Employee.ID with Salaries.EmployeeID. The first is more common, yes?
Eric Anastas
@Eric If you have many columns named "ID" that aren't actually all FKs pointing to the same PK, you have ambiguous column names. You should not need to have a table prefix or alias to differentiate the column names because errant JOINs are harder to detect and relationships are non-obvious. Further, if column names are unique in the schema, you can omit the table prefix entirely, which increases readability between different queries (where the alias may have changed otherwise).
banzaimonkey
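
To make the errant-JOIN point in the comment above concrete, here is a minimal sketch for an empty scratch database. It assumes a hypothetical Projects table whose owner is recorded in an EmployeeID column.

```sql
-- Option 3 naming: both keys are called "ID" (hypothetical tables)
CREATE TABLE Employees (ID INT PRIMARY KEY, LastName VARCHAR(32));
CREATE TABLE Projects
(
    ID         INT PRIMARY KEY,
    Name       VARCHAR(64),
    EmployeeID INT REFERENCES Employees (ID)   -- the project's owner
);

-- Intended: ON p.EmployeeID = e.ID. The mistake below still compiles and runs,
-- pairing each employee with whichever project happens to share its number.
SELECT e.LastName, p.Name
FROM Employees AS e
JOIN Projects AS p ON p.ID = e.ID;

-- Had the keys been named EmployeeID and ProjectID, the same slip would read
-- ON p.ProjectID = e.EmployeeID, which is visibly wrong at a glance.
```
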
+3  A: 

ISO/IEC 11179 has some good things to say about naming. I recommend it.

Data elements should always be named for what they are, not by their place in a structure. Also they should be unique within the namespace, schema or other context in which they appear. The names should contain only commonly understood abbreviations.

On that basis EmployeeID is a reasonable name for an employee identifier. ID is a bad name because it tells you nothing useful.

Also, it is a very widely observed convention that foreign key attributes should be named the same as the key attributes they reference, because they are usually the same data element, just in a different table. The only time I would usually break that rule is when a single table contains two foreign keys referencing the same column in another table. In that case the names obviously need to be different to avoid a naming conflict.
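
A minimal sketch of both the rule and its exception, for an empty scratch database. The PerformanceReviews table is hypothetical, and the role-prefixed names ReviewerEmployeeID and RevieweeEmployeeID are one common way to resolve the conflict, not something the answer prescribes.

```sql
-- Key named for what it is
CREATE TABLE Employees
(
    EmployeeID INT PRIMARY KEY,
    LastName   VARCHAR(32)
);

-- The usual case: the foreign key has the same name as the key it references
CREATE TABLE Salaries
(
    EmployeeID INT REFERENCES Employees (EmployeeID),
    Salary     INT
);

-- The exception: two foreign keys to the same column need distinct names.
-- A role prefix keeps the referenced element (EmployeeID) recognizable.
CREATE TABLE PerformanceReviews
(
    ReviewID           INT PRIMARY KEY,
    ReviewerEmployeeID INT NOT NULL REFERENCES Employees (EmployeeID),
    RevieweeEmployeeID INT NOT NULL REFERENCES Employees (EmployeeID),
    ReviewDate         DATE NOT NULL
);
```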

dportas
A: 

I used to use Id universally (Option 3), but I have come around to using the full name (EmployeeId), for the express purpose of giving FKs the same names as the PKs they reference.

Salaries should definitely be Option 1 (EmployeeId). Salaries.Employee sucks, especially if you're using some sort of ORM that wants a property named Employee to hold the referenced Employee entity.

Shlomo
+1  A: 

@dportas covered the major points very well.

Here's an example of how these tables may be defined.

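create table dbo.Employee
(
    EmployeeId int not null primary key,
    FirstName nvarchar(100) not null,
    LastName nvarchar(100) not null,
    Email nvarchar(255) not null
)

-- Minimal definition so the foreign key in dbo.Salary has a target
-- (the Name column is assumed for illustration)
create table dbo.EmploymentClassification
(
    EmploymentClassificationId int not null primary key,
    Name nvarchar(100) not null
)

create table dbo.Salary
(
    EmployeeId int not null
        foreign key references dbo.Employee (EmployeeId),
    BeginDate datetime not null,
    EmploymentClassificationId int not null
        foreign key references dbo.EmploymentClassification (EmploymentClassificationId),
    PerWeekAmount money not null,
    -- one salary row per employee per start date
    primary key (EmployeeId, BeginDate)
)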

Is there something else I should be considering that applies to database design that does not apply to OOP?

Is this an invitation to rant about database design? Well...OK.

Database design is very different from OOP. A DB is all about accurately representing the state of the world, while the goal of OOP is getting stuff done. An object represents an entity, while a row in a table states a (hopefully true) proposition.

My advice for DB design is to literally construct a proposition for each table that exactly states what a row in that table represents. Then design the system so that those propositions will always be true (and any missing rows from a table represent false statements). And since tables are fundamentally different from objects, don't be afraid to represent a single entity across several tables if it helps to accurately represent the facts being modeled.
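
A minimal sketch of writing the proposition down next to each table, for an empty scratch database. The predicate wording is my own illustration, the tables are simplified from the example in this answer, and DECIMAL(19, 4) stands in for money so the script stays portable.

```sql
-- Predicate: "Employee <EmployeeId> is named <FirstName> <LastName>
--             and can be reached at <Email>."
CREATE TABLE Employee
(
    EmployeeId INT NOT NULL PRIMARY KEY,
    FirstName  NVARCHAR(100) NOT NULL,
    LastName   NVARCHAR(100) NOT NULL,
    Email      NVARCHAR(255) NOT NULL
);

-- Predicate: "From <BeginDate>, employee <EmployeeId> is paid <PerWeekAmount> per week."
-- An (EmployeeId, BeginDate) pair with no row means that statement is false.
CREATE TABLE Salary
(
    EmployeeId    INT NOT NULL REFERENCES Employee (EmployeeId),
    BeginDate     DATE NOT NULL,
    PerWeekAmount DECIMAL(19, 4) NOT NULL,
    PRIMARY KEY (EmployeeId, BeginDate)
);
```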

It is often best to focus on what kind of reports should be generated from the data rather than how an application is going to create or use the data.

And in a base relation, NULL means "unknown". It doesn't mean "not applicable". If a column exists that must be left empty whenever its value doesn't apply, then something is wrong with the design.
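
A minimal sketch of the "not applicable" problem and one way around it, for an empty scratch database. The termination example and the table names EmployeeWithNullable and EmployeeTermination are hypothetical; the split follows the advice above about spreading one entity across several tables.

```sql
-- Problem: for a current employee, NULL here would mean "not applicable",
-- not "unknown".
CREATE TABLE EmployeeWithNullable
(
    EmployeeId      INT NOT NULL PRIMARY KEY,
    LastName        NVARCHAR(100) NOT NULL,
    TerminationDate DATE NULL
);

-- Alternative: move the optional fact into its own table.
CREATE TABLE Employee
(
    EmployeeId INT NOT NULL PRIMARY KEY,
    LastName   NVARCHAR(100) NOT NULL
);

-- Predicate: "Employee <EmployeeId> left on <TerminationDate>."
-- A current employee simply has no row here, so no NULL is needed.
CREATE TABLE EmployeeTermination
(
    EmployeeId      INT NOT NULL PRIMARY KEY REFERENCES Employee (EmployeeId),
    TerminationDate DATE NOT NULL
);

-- Current employees are those with no termination row
SELECT e.EmployeeId, e.LastName
FROM Employee AS e
WHERE NOT EXISTS (SELECT 1 FROM EmployeeTermination AS t WHERE t.EmployeeId = e.EmployeeId);
```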

Jeffrey L Whitledge
Absolutely. Designing a relational database using OOP principles is counterproductive and generally a bad idea. Databases are a different animal; learn what works better for them. When designing databases, learn set theory and normalization.
HLGEM
+1  A: 

ID is a terrible name, especially when you are doing reporting and joining multiple tables: you will have to alias all those ID fields anyway, since a report can't have multiple fields with the same name. ID gets much more confusing once you start writing complex queries.
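
A minimal sketch of the aliasing chore described above, for an empty scratch database. The Projects and Assignments tables are hypothetical. In SQL Server, a view or derived table containing two columns with the same name is rejected outright, so the aliases are not optional there.

```sql
-- Option 3 naming: every table's key is just "ID" (hypothetical tables)
CREATE TABLE Employees (ID INT PRIMARY KEY, LastName VARCHAR(32));
CREATE TABLE Projects  (ID INT PRIMARY KEY, Name VARCHAR(64));
CREATE TABLE Assignments
(
    EmployeeID INT REFERENCES Employees (ID),
    ProjectID  INT REFERENCES Projects (ID),
    PRIMARY KEY (EmployeeID, ProjectID)
);

-- Without the AS aliases the result would have two columns both named "ID",
-- so each one has to be renamed before the report can use it.
SELECT e.ID AS EmployeeID, e.LastName, p.ID AS ProjectID, p.Name AS ProjectName
FROM Assignments AS a
JOIN Employees AS e ON e.ID = a.EmployeeID
JOIN Projects  AS p ON p.ID = a.ProjectID;
```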

HLGEM