Here's a fictional scenario with some populated data. For tax purposes, my fictional company must retain records of historical data. For this reason, I've included a version column to the table.
TABLE EMPLOYEE: (with personal commentary)
|ID | VERSION | NAME | Position | PAY |
+---+---------+------------+----------+-----+
| 1 | 1 | John Doe | Owner | 100 | Started company
| 1 | 2 | John Doe | Owner | 80 | Pay cut to hire a coder
| 2 | 1 | Mark May | Coder | 20 | Hire said coder
| 2 | 2 | Mark May | Coder | 30 | Productive coder gets raise
| 3 | 1 | Jane Field | Admn Asst| 15 | Need office staff
| 2 | 3 | Mark May | Coder | 35 | Productive coder gets raise
| 1 | 3 | John Doe | Owner | 120 | Sales = profit for owner!
| 3 | 2 | Jane Field | Admn Asst| 20 | Raise for office staff
| 4 | 1 | Cody Munn | Coder | 20 | Hire another coder
| 4 | 2 | Cody Munn | Coder | 25 | Give that coder raise
| 3 | 3 | Jane Munn | Admn Asst| 20 | Jane marries Cody <3
| 2 | 4 | Mark May | Dev Lead | 40 | Promote mark to Dev Lead
| 4 | 3 | Cody Munn | Coder | 30 | Give Cody a raise
| 2 | 5 | Mark May | Retired | 0 | Mark retires
| 5 | 1 | Joey Trib | Dev Lead | 40 | Bring outside help for Dev Lead
| 6 | 1 | Hire Meplz | Coder | 10 | Hire a cheap coder
| 3 | 4 | Jane Munn | Retired | 0 | Jane quits
| 7 | 1 | Work Fofre | Admn Asst| 10 | Hire Janes replacement
| 8 | 1 | Fran Hesky | Coder | 10 | Hire another coder
| 9 | 1 | Deby Olav | Coder | 25 | Hire another coder
| 4 | 4 | Cody Munn | VP Ops | 80 | Promote Cody
| 9 | 2 | Deby Olav | VP Ops | 80 | Cody fails at VP Ops, promote Deby
| 4 | 5 | Cody Munn | Retired | 0 | Cody retires in shame
| 5 | 2 | Joey Trib | Dev Lead | 50 | Give Joey a raise
+---+---------+------------+----------+-----+
Now, if I wanted to do something like "Get a list of the current coders" I couldn't just do SELECT * FROM EMPLOYEE WHERE Position = 'Coder'
because that would return lots of historical data... which is bad.
I'm looking for good ideas to handle this scenario. I see a few options that jump out at me, but I'm sure someone's going to say "Wow, that's a rookie mistake, glow... try this on for size:" which is what this place is all about, right? :-)
Idea number 1: Keep a version table with the current version like this
TABLE EMPLOYEE_VERSION:
|ID |VERSION|
+---+-------+
| 1 | 3 |
| 2 | 5 |
| 3 | 4 |
| 4 | 6 |
| 5 | 2 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 2 |
+---+-------+
Although I'm not sure how I'd do that with a single query, I'm sure it could be done, and I bet I could figure it out with a rather small amount of effort.
Of course, I would have to update this table every time I insert into the EMPLOYEE table to increment the version for the given ID (or insert into the version table when a new id is made).
The overhead of that seems undesireable.
Idea number 2: Keep an archive table and a main table. Before updating the main table, insert the row I'm about to overwrite into archive table, and use the main table as I normally would as if I wasn't concerned about versioning.
Idea number 3: Find a query that adds something along the lines of SELECT * FROM EMPLOYEE WHERE Position = 'Coder' and version=MaxVersionForId(EMPLOYEE.ID)
... Not entirely sure how I'd do this. This seems the best idea to me, but I'm really not sure at this point.
Idea number 4: Make a column for "current" and add "WHERE current = true AND ..."
It occurs to me that surely people have done this before, run into these same problems, and have insight on it to share, and so I come to collect that! :) I've tried to find examples of the problem on here already, but they seems specialized to a particular scenario.
Thanks!
EDIT 1:
Firstly, I appreciate all answers, and you've all said the same thing - DATE
is better than VERSION NUMBER
. One reason I was going with VERSION NUMBER
was to simplify the process of updating in the server to prevent the following scenario
Person A loads employee record 3 in his session, and it has version 4. Person B loads employee record 3 in his session, and it has version 4. Person A makes changes and commits. This works because the most recent version in the database is 4. It is now 5. Person B makes changes and commits. This fails because the most recent version is 5, while his is 4.
How would the EFFECTIVE DATE
pattern address this issue?
EDIT 2:
I think I could do it by doing something like this: Person A loads employee record 3 in his session, and it's effective date is 1-1-2010, 1:00 pm, with no experation. Person B loads employee record 3 in his session, and it's effective date is 1-1-2010, 1:00 pm, with no experation. Person A makes changes and commits. The old copy goes to the archive table (basically idea 2) with an experation date of 9/22/2010 1:00 pm. The updated version of the main table has an effective date of 9/22/2010 1:00 pm. Person B makes changes and commits. The commit fails because the effective dates (in the database and session) don't match.