I am currently working on a simple revision system that enables me to store multiple versions of a single file, which works fine so far.

The table structure is as follows (irrelevant columns omitted for brevity):

file_id     file_revision     file_parent      file_name
--------------------------------------------------------
1           1                 0                foo.jpg
2           2                 1                foorevised.jpg                 
3           3                 1                anotherrevision.jpg

Where:

  • file_id is the auto-incrementing primary key.
  • file_revision stores the revision number, defaulting to 1 for the first revision.
  • file_parent is the file_id of the top-level parent of the revision, defaulting to 0 for the original.
  • file_name is the file name.

The problem:

  • Preferably using a single query, I want to retrieve all files...
  • ...but only the latest revision of each file...
  • ...and when only one revision is stored (the original), that one should be retrieved.

Any pointers are greatly appreciated. Thanks in advance.

+2  A: 

The most efficient approach for retrieval is to add a column such as is_latest, which you populate in advance, then run select * from table where file_id=1 and is_latest=true when you want to grab the latest version of file 1. Obviously, this makes updating the table more complicated.
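
As a rough sketch of what "populate in advance" could look like, here is one way to keep is_latest up to date whenever a revision is added. It uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere; the table name files and the helper names (open_db, add_file, add_revision, latest_files) are made up for illustration and are not from the question.

```python
import sqlite3

def open_db():
    # In-memory SQLite database standing in for the real MySQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE files (
            file_id       INTEGER PRIMARY KEY AUTOINCREMENT,
            file_revision INTEGER NOT NULL DEFAULT 1,
            file_parent   INTEGER NOT NULL DEFAULT 0,
            file_name     TEXT    NOT NULL,
            is_latest     INTEGER NOT NULL DEFAULT 1  -- assumed extra column
        )""")
    return conn

def add_file(conn, name):
    # A brand-new file is revision 1, has no parent, and is the latest.
    with conn:
        cur = conn.execute("INSERT INTO files (file_name) VALUES (?)", (name,))
    return cur.lastrowid

def add_revision(conn, root_id, name):
    # root_id is the file_id of the original (top-level) file.
    with conn:
        (next_rev,) = conn.execute(
            "SELECT MAX(file_revision) + 1 FROM files "
            "WHERE file_id = ? OR file_parent = ?",
            (root_id, root_id)).fetchone()
        # Clear the flag on every existing revision of this file...
        conn.execute(
            "UPDATE files SET is_latest = 0 "
            "WHERE file_id = ? OR file_parent = ?",
            (root_id, root_id))
        # ...then insert the new revision as the only flagged row.
        cur = conn.execute(
            "INSERT INTO files (file_revision, file_parent, file_name, is_latest) "
            "VALUES (?, ?, ?, 1)",
            (next_rev, root_id, name))
    return cur.lastrowid

def latest_files(conn):
    # The cheap read that the extra column buys you.
    return conn.execute(
        "SELECT file_id, file_revision, file_parent, file_name "
        "FROM files WHERE is_latest = 1 ORDER BY file_id").fetchall()
```

The extra complexity sits entirely in add_revision: the UPDATE and the INSERT have to happen in the same transaction, otherwise a file can briefly have zero or two "latest" rows.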

Another way to do it would be to store the latest versions of the files in one table and the historical versions in another. If you predominantly want to select all files that are the latest version, select * from table where is_latest=true could well amount to a full table scan even if is_latest is indexed. If the latest rows were all in one table, the database could read them all out in sequential IO and would not have to either 1) do a lot of seeks through the table to find just the records it needs or 2) scan the whole table, discarding large amounts of data for the old records along the way.
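
A minimal sketch of the two-table layout, again using Python's sqlite3 so it can be run as-is. The table names files_current and files_history, the file_root column, and the helper names are all assumptions for the sake of the example; the real schema would carry whatever other columns the original table has.

```python
import sqlite3

def open_split_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per file: always the newest revision.
        CREATE TABLE files_current (
            file_root     INTEGER PRIMARY KEY,
            file_revision INTEGER NOT NULL,
            file_name     TEXT    NOT NULL
        );
        -- Every superseded revision ends up here.
        CREATE TABLE files_history (
            file_root     INTEGER NOT NULL,
            file_revision INTEGER NOT NULL,
            file_name     TEXT    NOT NULL,
            PRIMARY KEY (file_root, file_revision)
        );
    """)
    return conn

def save_revision(conn, file_root, file_name):
    # Moves the current row (if any) into history, then stores the new one.
    with conn:
        row = conn.execute(
            "SELECT file_revision FROM files_current WHERE file_root = ?",
            (file_root,)).fetchone()
        next_revision = 1 if row is None else row[0] + 1
        conn.execute(
            "INSERT INTO files_history (file_root, file_revision, file_name) "
            "SELECT file_root, file_revision, file_name "
            "FROM files_current WHERE file_root = ?",
            (file_root,))
        conn.execute("DELETE FROM files_current WHERE file_root = ?",
                     (file_root,))
        conn.execute(
            "INSERT INTO files_current (file_root, file_revision, file_name) "
            "VALUES (?, ?, ?)",
            (file_root, next_revision, file_name))
    return next_revision

def current_files(conn):
    # "All files, latest revision only" becomes a plain read of one table.
    return conn.execute(
        "SELECT file_root, file_revision, file_name "
        "FROM files_current ORDER BY file_root").fetchall()
```

The trade-off is the same as with the flag column: reads get simpler, and every write has to touch both tables inside one transaction.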

Assuming you don't want to change the existing table design, what you want to do is called selecting the groupwise maximum; see this article for several different ways to do it in MySQL.
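
For the table in the question, one groupwise-maximum variant (a join against a grouped subquery) might look roughly like this. It is only a sketch: it assumes the table is called files, that file_parent always points at the original file's file_id, and that the highest file_revision is the latest one. The bar.jpg row is an extra sample row added to cover the "only one revision" case. It is wrapped in Python's sqlite3 so it can be run directly; the SQL itself sticks to constructs that should also work in MySQL.

```python
import sqlite3

# An original (file_parent = 0) is its own group; a revision belongs to the
# group of its file_parent. The CASE expression computes that group key.
LATEST_SQL = """
SELECT f.file_id, f.file_revision, f.file_parent, f.file_name
FROM files AS f
JOIN (
    SELECT CASE WHEN file_parent = 0 THEN file_id ELSE file_parent END AS root_id,
           MAX(file_revision) AS max_revision
    FROM files
    GROUP BY CASE WHEN file_parent = 0 THEN file_id ELSE file_parent END
) AS latest
  ON latest.root_id = CASE WHEN f.file_parent = 0 THEN f.file_id ELSE f.file_parent END
 AND latest.max_revision = f.file_revision
ORDER BY f.file_id
"""

def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE files (
            file_id       INTEGER PRIMARY KEY,
            file_revision INTEGER NOT NULL,
            file_parent   INTEGER NOT NULL,
            file_name     TEXT    NOT NULL
        )""")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            (1, 1, 0, "foo.jpg"),
            (2, 2, 1, "foorevised.jpg"),
            (3, 3, 1, "anotherrevision.jpg"),
            (4, 1, 0, "bar.jpg"),  # extra sample row: a file with one revision
        ])
    return conn

def latest_revisions(conn):
    return conn.execute(LATEST_SQL).fetchall()
```

With the sample rows above, latest_revisions(make_demo_db()) returns the anotherrevision.jpg row for the first file and the bar.jpg row for the second, so a file with only its original revision is still included.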

ʞɔıu
A: 

I agree with nick about adding another column to the table. We had a similar problem on a project in the past, and I was really glad that we decided to just add the extra column.

It makes the programming easier, and it makes the query faster by removing the subqueries.

corymathews