views:

602

answers:

7

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!

This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?

+1  A: 

That depends on your database vendor.

MySQL I believe absolutely orders auto increment keys. SQL Server I don't know for sure that it does or not but I believe that it does.

Where you'll run into problems is with databases that don't support this functionality, most notably Oracle that uses sequences, which are roughly but not absolutely ordered.

An alternative might be to go for created time and then ID.

cletus
I like the idea of redundancy, sorting by one and then using ID for a secondary sort. . . thanks!
BushyMark
Oracle sequences are ordered for non-RAC installations, which are the great majority of them.
David Aldridge
+1  A: 

The generated identification numbers will be unique. Regardless of whether you use Sequences, like in PostgreSQL and Oracle or if you use another mechanism like auto-increment of MySQL.

However, Sequences are most often acquired in bulks of, for example 20 numbers. So with PostgreSQL you can not determine which field was inserted first. There might even be gaps in the id's of inserted records.

Therefore you shouldn't use a generated id field for a task like that in order to not rely on database implementation details.

Generating a created or updated field during command execution is much better for sorting by creation-, or update-time later on. For example:

INSERT INTO A (data, created) VALUES (smething, DATE())
UPDATE A SET data=something, updated=DATE()
Philipp
A: 

I believe the answer to your question is yes...if I read between the lines, I think you are concerned that the system may re-use ID's numbers that are 'missing' in the sequence, and therefore if you had used 1,2,3,5,6,7 as ID numbers, in all the implementations I know of, the next ID number will always be 8 (or possibly higher), but I don't know of any DB that would try and figure out that record Id #4 is missing, so attempt to re-use that ID number.

Though I am most familiar with SQL Server, I don't know why any vendor who try and fill the gaps in a sequence - think of the overhead of keeping that list of unused ID's, as opposed to just always keeping track of the last I number used, and adding 1.

I'd say you could safely rely on the next ID assigned number always being higher than the last - not just unique.

EJB
A: 

Yes the id will be unique and no, you can not and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.

For setting the creation time, you can just use a default value like this

CREATE TABLE foo (
  id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL;
  created TIMESTAMP NOT NULL DEFAULT NOW();
  updated TIMESTAMP;
  PRIMARY KEY(id);
) engine=InnoDB; ## whatever :P

Now, that takes care of creation time. with update time I would suggest an AFTER UPDATE trigger like this one (of course you can do it in a separate query, but the trigger, in my opinion, is a better solution - more transparent):

DELIMITER $$
CREATE TRIGGER foo_a_upd AFTER UPDATE ON foo
FOR EACH ROW BEGIN
  SET NEW.updated = NOW();
END;
$$
DELIMITER ;

And that should do it.

EDIT: Woe is me. Foolishly I've not specified, that this is for mysql, there might be some differences in the function names (namely, 'NOW') and other subtle itty-bitty.

shylent
You're ignoring what the OP said: he already has a created_at field and that it is the same when several items are inserted at the same time, hence his question.
cletus
It would, indeed, seem, that my train of thought, wildly swerving as it is, has wandered off to a side track. Sorry.You are, of course, right. Sorting first by created_at and then by id is, probably the easiest solution (even though it, probably, doesn't strictly guarantee the correct order, it only guarantees, that the rows with the same created_at values will be returned in the same order each time).
shylent
A: 

One caveat to EJB's answer:

SQL does not give any guarantee of ordering if you don't specify an order by column. E.g. if you delete some early rows, then insert 'em, the new ones may end up living in the same place in the db the old ones did (albeit with new IDs), and that's what it may use as its default sort.

FWIW, I typically use order by ID as an effective version of order by created_at. It's cheaper in that it doesn't require adding an index to a datetime field (which is bigger and therefore slower than a simple integer primary key index), guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in some slightly different order.

Sai Emrys
A: 

This is probably DB engine depended. I would check how your DB implements sequences and if there are no documented problems then I would decide to rely on ID.

E.g. Postgresql sequence is OK unless you play with the sequence cache parameters.

There is a possibility that other programmer will manually create or copy records from different DB with wrong ID column. However I would simplify the problem. Do not bother with low probability cases where someone will manually destroy data integrity. You cannot protect against everything.

My advice is to rely on sequence generated IDs and move your project forward.

Greg Dan
A: 

In theory yes the highest id number is the last created. Remember though that databases do have the ability to temporaily turn off the insert of the autogenerated value , insert some records manaully and then turn it back on. These inserts are no typically used on a production system but can happen occasionally when moving a large chunk of data from another system.

HLGEM