views:

2501

answers:

8

I have a table containing prices for a lot of different "things" in a MS SQL 2005 table. There are hundreds of records per thing per day and the different things gets price updates at different times.

ID uniqueidentifier not null,
ThingID int NOT NULL,
PriceDateTime datetime NOT NULL,
Price decimal(18,4) NOT NULL

I need to get today's latest prices for a group of things. The below query works but I'm getting hundreds of rows back and I have to loop trough them and only extract the latest one per ThingID. How can I (e.g. via a GROUP BY) say that I want the latest one per ThingID? Or will I have to use subqueries?

SELECT * 
FROM Thing
WHERE ThingID IN (1,2,3,4,5,6)
  AND PriceDate > cast( convert(varchar(20), getdate(), 106) as DateTime)

UPDATE: In an attempt to hide complexity I put the ID column in a an int. In real life it is GUID (and not the sequential kind). I have updated the table def above to use uniqueidentifier.

+10  A: 

I think the only solution with your table structure is to work with a subquery:

SELECT *
   FROM Thing
   WHERE ID IN (SELECT max(ID) FROM Thing 
                   WHERE ThingID IN (1,2,3,4)
                   GROUP BY ThingID)

(Given the highest ID also means the newest price)

However I suggest you add a "IsCurrent" column that is 0 if it's not the latest price or 1 if it is the latest. This will add the possible risk of inconsistent data, but it will speed up the whole process a lot when the table gets bigger (if it is in an index). Then all you need to do is to...

SELECT *
   FROM Thing
   WHERE ThingID IN (1,2,3,4)
     AND IsCurrent = 1

UPDATE

Okay, Markus updated the question to show that ID is a uniqueid, not an int. That makes writing the query even more complex.

SELECT T.* 
   FROM Thing T
   JOIN (SELECT ThingID, max(PriceDateTime)
            WHERE ThingID IN (1,2,3,4)
            GROUP BY ThingID) X ON X.ThingID = T.ThingID 
                                AND X.PriceDateTime = T.PriceDateTime
   WHERE ThingID IN (1,2,3,4)

I'd really suggest using either a "IsCurrent" column or go with the other suggestion found in the answers and use "current price" table and a separate "price history" table (which would ultimately be the fastest, because it keeps the price table itself small).

(I know that the ThingID at the bottom is redundant. Just try if it is faster with or without that "WHERE". Not sure which version will be faster after the optimizer did its work.)

BlaM
The 'join' query is much, much faster than the one with 'select' embedded in where ('join' does one extra select, 'where' does one for every entry!). Could you edit your anwser to indicate this?
skolima
@skolima: How would you suggest that I replace the subquery with a join. I don't think that that's possible, because I need a aggregate function "in the middle".
BlaM
Well I don't know a way to get totally rid of subquery. However, in your "update complex response", the subquery gets executed once. In your first query, the subquery is called as many times as there are items in `Thing`. At least on MySQL.
skolima
Please see http://beta.stackoverflow.com/questions/53670/how-to-efficiently-sql-select-neweset-entries-from-a-mysql-database (duplicate of this question).
skolima
+2  A: 

I would try something like the following subquery and forget about changing your data structures.

SELECT
 *
FROM
 Thing
WHERE 
 (ThingID, PriceDateTime) IN 
 (SELECT 
   ThingID, 
   max(PriceDateTime ) 
  FROM 
   Thing 
  WHERE 
   ThingID IN (1,2,3,4)
  GROUP BY 
   ThingID
 )

Edit the above is ANSI SQL and i'm now guessing having more than one column in a subquery doesnt work for T SQL. Marius, I can't test the following but try;

SELECT
 p.*
FROM
 Thing p,
 (SELECT ThingID, max(PriceDateTime ) FROM Thing WHERE ThingID IN (1,2,3,4) GROUP BY ThingID) m
WHERE 
 p.ThingId = m.ThingId
 and p.PriceDateTime = m.PriceDateTime

another option might be to change the date to a string and concatenate with the id so you have only one column. This would be slightly nasty though.

Mark Nold
Mark, I tried this but SQL Sercver complains abt the where clause. I didn't know you could do "WHERE (ThingID, PriceDateTime) IN ..."?
Marius
+1  A: 

It depends on the nature of how your data will be used, but if the old price data will not be used nearly as regularly as the current price data, there may be an argument here for a price history table. This way, non-current data may be archived off to the price history table (probably by triggers) as the new prices come in.

As I say, depending on your access model, this could be an option.

Galwegian
+2  A: 

If the subquery route was too slow I would look at treating your price updates as an audit log and maintaining a ThingPrice table - perhaps as a trigger on the price updates table:

ThingID int not null,
UpdateID int not null,
PriceDateTime datetime not null,
Price decimal(18,4) not null

The primary key would just be ThingID and "UpdateID" is the "ID" in your original table.

Nick Pierpoint
+1  A: 

I'm converting the uniqueidentifier to a binary so that I can get a MAX of it. This should make sure that you won't get duplicates from multiple records with identical ThingIDs and PriceDateTimes:

SELECT * FROM Thing WHERE CONVERT(BINARY(16),Thing.ID) IN
(
 SELECT MAX(CONVERT(BINARY(16),Thing.ID))
  FROM Thing
  INNER JOIN
   (SELECT ThingID, MAX(PriceDateTime) LatestPriceDateTime FROM Thing
 WHERE PriceDateTime >= CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
 GROUP BY ThingID) LatestPrices
  ON Thing.ThingID = LatestPrices.ThingID
   AND Thing.PriceDateTime = LatestPrices.LatestPriceDateTime
 GROUP BY Thing.ThingID, Thing.PriceDateTime
) AND Thing.ThingID IN (1,2,3,4,5,6)
BirgerH
+1  A: 

Since ID is not sequential, I assume you have a unique index on ThingID and PriceDateTime so only one price can be the most recent for a given item.

This query will get all of the items in the list IF they were priced today. If you remove the where clause for PriceDate you will get the latest price regardless of date.

SELECT * 
FROM Thing thi
WHERE thi.ThingID IN (1,2,3,4,5,6)
  AND thi.PriceDateTime =
     (SELECT MAX(maxThi.PriceDateTime)
      FROM Thing maxThi
      WHERE maxThi.PriceDateTime >= CAST( CONVERT(varchar(20), GETDATE(), 106) AS DateTime)
        AND maxThi.ThingID = thi.ThingID)

Note that I changed ">" to ">=" since you could have a price right at the start of a day

Peter
+2  A: 

Since you are using SQL Server 2005, you can use the new (CROSS|OUTTER) APPLY clause. The APPLY clause let's you join a table with a table valued function.

To solve the problem, first define a table valued function to retrieve the top n rows from Thing for a specific id, date ordered:

CREATE FUNCTION dbo.fn_GetTopThings(@ThingID AS GUID, @n AS INT)
  RETURNS TABLE
AS
RETURN
  SELECT TOP(@n) *
  FROM Things
  WHERE ThingID= @ThingID
  ORDER BY PriceDateTime DESC
GO

and then use the function to retrieve the top 1 records in a query:

SELECT *
   FROM Thing t
CROSS APPLY dbo.fn_GetTopThings(t.ThingID, 1)
WHERE t.ThingID IN (1,2,3,4,5,6)

The magic here is done by the APPLY clause which applies the function to every row in the left result set then joins with the result set returned by the function then retuns the final result set. (Note: to do a left join like apply, use OUTTER APPLY which returns all rows from the left side, while CROSS APPLY returns only the rows that have a match in the right side)

BlaM: Because I can't post comments yet( due to low rept points) not even to my own answers ^^, I'll answer in the body of the message: -the APPLY clause even, if it uses table valued functions it is optimized internally by SQL Server in such a way that it doesn't call the function for every row in the left result set, but instead takes the inner sql from the function and converts it into a join clause with the rest of the query, so the performance is equivalent or even better (if the plan is chosen right by sql server and further optimizations can be done) than the performance of a query using subqueries), and in my personal experience APPLY has no performance issues when the database is properly indexed and statistics are up to date (just like a normal query with subqueries behaves in such conditions)

Pop Catalin
I have mainly worked with SQL Server 2000, so this is a new concept to me. Technically, will that create a temporary table with all rows and then run the "function" over the temporary table? How does that compare in speed to my "JOINed" solution?
BlaM
Thanks for your explanation. I think I like this technique ;)
BlaM
A: 

Try this (provided you only need the latest price, not the identifier or datetime of that price)

SELECT ThingID, (SELECT TOP 1 Price FROM Thing WHERE ThingID = T.ThingID ORDER BY PriceDateTime DESC) Price
FROM Thing T
WHERE ThingID IN (1,2,3,4) AND DATEDIFF(D, PriceDateTime, GETDATE()) = 0
GROUP BY ThingID
MobyDX