views:

572

answers:

4

I have a query that is currently using a correlated subquery to return the results, but I am thinking the problem could be solved more eloquently perhaps using ROW_NUMBER().

The problem is around the profile of a value v, through a number of years for an Item. Each item has a number of versions, each with its own profile whick starts when the version is introduced and the data currently looks like this:

    
ItemId    ItemVersionId    Year    Value
===========================================
1         1                01      0.1
1         1                02      0.1
1         1                03      0.2
1         1                04      0.2
1         1                05      0.2
1         1                06      0.3
1         1                07      0.3
1         1                08      0.4
1         2                04      0.3
1         2                05      0.3
1         2                06      0.3
1         2                07      0.4
1         2                08      0.5
1         3                07      0.6
1         3                08      0.7
2         1                01      0.1
2         1                01      0.1
2         1                01      0.2
etc

I want to return the full profile for an Item using the most recent version where applicable. For the above example for item 1:

ItemId    ItemVersionId    Year    Value
===========================================
1         1                01      0.1
1         1                02      0.1
1         1                03      0.2
1         2                04      0.3
1         2                05      0.3
1         2                06      0.3
1         3                07      0.6
1         3                08      0.7

I am currently using

SELECT ItemId, ItemVersionId, Year, Value
FROM table t
WHERE
    ItemId = 1
    AND ItemVersionId = (SELECT MAX(ItemVersionId) FROM table WHERE ItemId = t.ItemId AND Year = t.Year)

Whilst this returns the correct I suspect there is a more efficient way to do it, especially when the table gets large.

I am using SQL Server 2005.

Thanks in advance

A: 

I think it's okay how you do it. You could check if there is a composite index on ItemId and Year.

You could inspect the query plan to see the impact of that query.

If there is an "Item" table in your database you could try another approach. Insert a column ItemVersionId in that table and make sure you update that value when new versions are saved. Then in your query join the Item table using ItemId and ItemVersionId instead of using that subquery.

splattne
I wholeheartedly suggest that you DO NOT add an ItemVersionID to your Item table. That completely violates standard normalization and will most likely cause a lot of headaches for you in the future.
Tom H.
I agree, but only partially. In some cases it is not acceptable having big performance losses or locks on the db because of religiously followed normailzation "standards". http://www.codinghorror.com/blog/archives/001152.html
splattne
+2  A: 

I would do it with a CTE:

WITH Result AS
(
  SELECT Row_Number() OVER (PARTITION BY ItemId, Year
ORDER BY ItemversionId DESC) AS RowNumber
      ,ItemId
      ,ItemversionId
      ,Year
      ,Value
  FROM table
)
SELECT ItemId
  ,ItemversionId
  ,Year
  ,Value
FROM Result
WHERE RowNumber = 1
ORDER BY ItemId, Year
Bliek
After checking execution plans between this and the original query, this code indeed out-perform. Estimated subtree cost of the first is .026 and this query is .014.
jons911
Thanks for this, this is the type of thing I was looking for.I will do some performance testing and post back the results.
jheppinstall
A: 

Does this work?

SELECT ItemId, ItemVersionId, Year, MAX(Value) FROM table GROUP BY ItemId, ItemVersionId, Year

StingyJack
It's not about the MAX(Value), but about the Value connected to MAX(VersionId).
Tomalak
A: 

This should work, although you will have to test performance with your own data:

SELECT
    T1.ItemID,
    T1.ItemVersionID,
    T1.Year,
    T1.Value
FROM
    MyTable T1
INNER JOIN (SELECT Year, MAX(ItemVersionID) AS MaxItemVersionID FROM MyTable T2 WHERE T2.ItemID = 1 GROUP BY Year) SQ ON
    SQ.Year = T1.Year AND
    SQ.MaxItemVersionID = T1.ItemVersionID
WHERE
    T1.ItemID = 1

Also, you can alter the subquery to also group by and return an ItemID so that you can return data for more than one item at a time if you need to for some other part of your application. Just be sure to then add the ItemID to the join criteria.

Tom H.