tags:

views:

1423

answers:

8

I have a simple table comments (id int, revision int, comment varchar(140)) with some content like this:

1|1|hallo1|
1|2|hallo2|
1|3|hallo3|
2|1|hallo1|
2|2|hallo2|

I'm searching for a sql statement which will return each comment with the highest revision:

1|3|hallo3|
2|2|hallo2|

I've got a solution

select id, revision, comment 
  from comments 
  where revision = (
      select max(revision) 
        from comments as f 
        where f.id = comments.id
  );

but it is very slow on large data sets. Any better solution?

+4  A: 
  1. Make sure you have your indexes set up appropriately. Indexing on id, revision would be good.

  2. Here's a different take on your query. Haven't checked its execution plan, but if you set up the index well it should help:

    SELECT c.* 
      FROM comments c
      INNER JOIN (
            SELECT id,max(revision) AS maxrev 
              FROM comments 
              GROUP BY id
      ) b
        ON c.id=b.id AND c.revision=b.maxrev
    

Editted to add:

  1. If you're on SQL Server, you might want to check out Indexed Views as well:
    http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx

Editted again to add info:

Subquery:
25157 records
2 seconds
Execution plan includes an Index Seek (82%) base and a Segment (17%)

Left Outer Join:
25160 records
3 seconds
Execution plan includes two Index Scans @ 22% each with a Right Outer Merge at 45% and a Filter at 11%

I'd still go with the sub query.

nathaniel
+4  A: 

Here's one way that with appropriate indexing will not be heinously slow and it doesn't use a subselect:

SELECT comments.ID, comments.revision, comments.comment FROM comments 
LEFT OUTER JOIN comments AS maxcomments 
ON maxcomments.ID= comments.ID
AND maxcomments.revision > comments.revision
WHERE maxcomments.revision IS NULL

Adapted from queries here: http://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/

(From google search: max group by sql)

devinmoore
+3  A: 

Tested with one of our tables that has nearly 1 million rows total. Indexes exist on both fields FIELD2 AND FIELD3. Query returned 83953 rows in under 3 seconds on our dev box.

select FIELD1, FIELD2, FIELD3 from OURTABLE (nolock) T1 WHERE FIELD3 = ( SELECT MAX(FIELD3) FROM OURTABLE T2 (nolock) WHERE T1.FIELD2=T2.FIELD2 ) ORDER BY FIELD2 DESC

Nikki9696
A: 

Idea from left field, but what about adding an extra field to the table:

CurrentRevision bit not null

Then when you make a change, set the flag on the new revision and remove it on all previous ones.

Your query would then simply become:

select  Id,
        Comment
from    Comments
where   CurrentRevision = 1

This would be much easier on the database and therefore much faster.

Garry Shutler
A: 

One quite clean way to do "latest x by id" type queries is this. It should also be quite easy to index properly.

SELECT id, revision, comment 
FROM comments
WHERE (id, revision) IN (
  SELECT id, MAX(revision)
  FROM comments
  -- WHERE clause comes here if needed
  GROUP BY id
)
Rowan
A: 

For big tables I find that this solution can has a better performance:

    SELECT c1.id, 
           c1.revision, 
           c1.comment 
      FROM comments c1 
INNER JOIN ( SELECT id, 
                max(revision) AS max_revision
               FROM comments 
           GROUP BY id ) c2
        ON c1.id = c2.id
       AND c1.revision = c2.max_revision
borjab
A: 

Analytics would be my recommendation.

select id, max_revision, comment
from (select c.id, c.comment, c.revision, max(c.revision)over(partition by c.id) as max_revision
      from comments c)
where revision = max_revision;
A: 

GOOD SOLUTION , THANKS ROWAN