views:

133

answers:

6

I'm trying to figure out what to call the column in my database table that holds an INT to specific "record version". I'm currently using "RecordOrder", but I don't like that, because people think higher=newer, but the way I'm using it, lower=newer (with "1" being the current record, "2" being the second most current, "3" older still, and so on). I've considered "RecordVersion", but I'm afraid that would have the same problem. Any other suggestions? "RecordAge"?

I'm doing this because when I insert into the table, instead of having to find out what version is next, then run the risk of having that number stolen from me before I write, I just insert insert with a "RecordOrder" of 0. There's a trigger on the table AFTER INSERT that increments all the "RecordOrder" numbers for that key by 1, so the record I just inserted becomes "1", and all others are increased by 1. That way, you can get a person's current record by selection RecordOrder=1, instead of getting the MAX(RecordOrder) and then selecting that.

PS - I'm also open to criticism about why this is a terrible idea and I should be incrementing this index instead. This just seemed to make lookups much easier, but if it's a bad idea, please enlighten me!

Some details about the data, as an example:

I have the following database table:

CREATE TABLE AmountDue (
    CustomerNumber INT,
    AmountDue      DECIMAL(14,2),
    RecordOrder    SMALLINT,
    RecordCreated  DATETIME
)

A subset of my data looks like this:

CustomerNumber    Amountdue      RecordOrder                 RecordCreated
           100            0                1       2009-12-19 05:10:10.123
           100        10.05                2       2009-12-15 06:12:10.123
           100       100.00                3       2009-12-14 14:19:10.123
           101         5.00                1       2009-11-14 05:16:10.123

In this example, there are three rows for customer 100 - they owed $100, then $10.05, and now they owe nothing. Let me know if I need to clarify it some more.

UPDATE:

The "RecordOrder" and "RecordCreated" columns are not available to the user - they're only there for internal use, and to help figure out which is the current customer record. Also, I could use it to return an appropriately-ordered customer history, though I could just as easily do that with the date. I can accomplish the same thing as an incrementing "Record Version" with just the RecordCreated date, I suppose, but that removes the convenience of knowing that RecordOrder=1 is the current record, and I'm back to doing a sub-query with MAX or MIN on the DateTime to determine the most recent record.

+3  A: 

I think that "current version = 1" is a bad idea because when you add a new current record, you'll have to update ALL the previous versions. And any other tables or applications that refer to the old version numbers will now be wrong. I got to write a service that interfaces with a mainframe program that works like that, and it was a huge headache that wasted dozens of developer hours.

I usually do versioning with a version_id field, that just gets incremented every time. Then when I want to find the newest record, I order by version_id desc in the query and select the first row only.

Edit: I didn't see the datawarehousing tag that iandisme just pointed out. If selecting all the versions wouldn't work, I've seen some systems that keep a separate table that just stores the most up-to-date version for each record in another table. So when a new version is added to the Record table, the appropriate RecordVersion record gets updated to store that new version. This has always been overkill for the stuff I work on, but I don't work on whole warehouses of data, so I don't know if that would be any better or worse.

Kaleb Brasee
+1 cause that's the way I usually do it. Though, if searches on this table are *much* more common than inserts, rwmnau's method might be a little more efficient than running the MAX aggregation on every search. Since he tagged datawarehousing, that's probably true.
iandisme
I can't do an "order by version_id DESC" because that only works when I'm looking up the current record for a single customer. I often pull large groups of customers from this table, so a nested query with MAX(version) is my only choice if I increment the number - that's why I chose to do it "backwards" so to say. I've had a feeling this is unfriendly though, hence the question :)
rwmnau
A: 

What you're talking about is an age rather than a version so you could call it RecordAge. But I would get rid of it altogether since all you seem to want to do is get the most recent order for a specific customer.

That can be achieved by using the customer number and the date/time field. If you make the combination of those two a unique constraint, and put retry logic in your client code on the rare chance you get a race condition, you should have no problems.

The idea of firing a trigger to modify lots of records when inserting one is a bad idea since it becomes more and more expensive as records get added.

Look very critically at any design that introduces an arbitrary column like your version number. It's actually a derived attribute (depends on other attributes, in this case, customer number and date) and, while it's okay to do this for performance reasons, it should only be done to alleviate a specific problem and only if you understand the consequences.

I can't see the performance improvement in using a small integer over the date as offsetting the cost of the trigger myself but, as with all database decisions, measure, don't guess.

paxdiablo
+1  A: 

Any reason why you can't use the RecordCreated date to accomplish essentially the same thing?

Doing something like:

select top 1 columna, columnb, etc. from table order by RecordCreated desc

would give you the newest record and you wouldn't have to worry about record modifications.

You could potentially cause all sorts of issues with the record renumbering approach (contention on indexes, lock escalation, etc.).

Lock escalation: http://msdn.microsoft.com/en-us/library/aa213033(SQL.80).aspx

Index contention: http://blogs.digineer.com/blogs/jasons/archive/2009/02/25/monitoring-index-contention-with-dmfs.aspx

As others have mentioned in regards to data warehousing, you could always put a view or snapshot or something similar on top of the dataset to give you the data in the format you desire (newest record only, etc.). Obviously I'm not familiar which your constraints or requirements enough to know whether view/snapshot/etc. makes sense in your situation.

Brian Hasden
A: 

Ok the problem I see with this is that you are going to waste an awful lot of database processing time renumbering. Also, I don't know if you are exposing the recordOrder to the user, but if you are, they will want to be able to ask questions about it using that number and having it continually changing will be both confusing and annoying. And of course the problem you already pointed out where the order is not the one a future developer might expect.

Why not just use an identity field and then the numbering will be automatic (then ther wil lnever be an issue with stealing a number before you insert). Does it really matter if the numbering is repeated for each customer as long as you can order by the id desc and the customer number to see all the records for one person? Or you could use the record date field possibly to order the records.

HLGEM
+2  A: 

Instead of using Integers to state which version of a row is the latest, I think you should use a TimeStamp field with a default value of the current time.

This way, no matter who adds a row at what time, there will be no ambiguity as to the latest version of the row.

I think renumbering all associated rows to id+1 is not a good idea for multiple reasons:

  1. You are changing something that there is no need to change
    • Locking and Blocking will increase
    • Using more processing power and memory than required
Raj More
great idea ! +1You double the signification, showing which record is the last one and when the edit happened.
iDevlop
A: 

I like Raj More's idea of using the time stamp. But I realise that any query searching for the latest records will be difficult and will induce a heavy processing.
So I would suggest this idea: use the Time Stamp (it is there anyway) for the Record order, but keep one byte field to easily identify the Latest Record.
In that case you would have the current record having its LatestRecord value = 1, while every other = NUll. This way you might be able to create an index ignoring the nulls ?
That would greatly simplify all your queries.

iDevlop