I'm trying to generate a sales report that lists each product and its total sales in a given month. It's a little tricky because the prices of products can change throughout the month. For example:

  • Between Jan-01 and Jan-15, my company sells 50 Widgets at a cost of $10 each
  • Between Jan-16 and Jan-31, my company sells 50 more Widgets at a cost of $15 each
  • The total sales of Widgets for January = (50 * 10) + (50 * 15) = $1250

This setup is represented in the database as follows:

Sales table
  Sale_ID    Product_ID   Sale_Date
  1          1            2009-01-01
  2          1            2009-01-01
  3          1            2009-01-02
             ...
  50         1            2009-01-15
  51         1            2009-01-16
  52         1            2009-01-17
             ...
  100        1            2009-01-31

Prices table
  Product_ID    Sale_Date    Price
  1             2009-01-01   10.00
  1             2009-01-16   15.00

When a price is defined in the Prices table, it applies to all products sold with the given Product_ID from the given Sale_Date going forward.

Basically, I'm looking for a query which returns data as follows:

Desired output
  Sale_ID    Product_ID   Sale_Date     Price
  1          1            2009-01-01    10.00
  2          1            2009-01-01    10.00
  3          1            2009-01-02    10.00
             ...
  50         1            2009-01-15    10.00
  51         1            2009-01-16    15.00
  52         1            2009-01-17    15.00
             ...
  100        1            2009-01-31    15.00

I have the following query:

SELECT
    Sale_ID,
    Product_ID,
    Sale_Date,
    (
        SELECT TOP 1 Price
        FROM Prices
        WHERE
            Prices.Product_ID = Sales.Product_ID
            AND Prices.Sale_Date <= Sales.Sale_Date
        ORDER BY Prices.Sale_Date DESC
    ) as Price
FROM Sales

This works, but is there a more efficient query than a nested sub-select?

And before you point out that it would just be easier to include "price" in the Sales table, I should mention that the schema is maintained by another vendor and I'm unable to change it. And in case it matters, I'm using SQL Server 2000.

A: 

The combination of Product_ID and Sale_Date is your foreign key. Try a select-join on Product_ID, Sale_Date.

Eduard Wirch
It's not that simple -- not every date that a sale can be made has a corresponding row in the Prices table. To find the price for a given date, you need to find the **most recent price whose Sale_Date is at or before** the given date.
j_random_hacker
A: 

Are you actually running into performance problems or are you just anticipating them? I would implement this exactly as you have, were my hands tied from a schema-modification standpoint as yours are.

Sean Bright
+2  A: 

It's best to avoid these types of correlated subqueries. Here's a classic technique for such cases.

SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p1.Price
FROM Sales AS s
-- p1: every price that took effect on or before the sale date
LEFT JOIN Prices AS p1 ON s.Product_ID = p1.Product_ID
    AND s.Sale_Date >= p1.Sale_Date
-- p2: any later price that also took effect on or before the sale date
LEFT JOIN Prices AS p2 ON s.Product_ID = p2.Product_ID
    AND s.Sale_Date >= p2.Sale_Date
    AND p2.Sale_Date > p1.Sale_Date
WHERE p2.Price IS NULL  -- want this one not to be found

Use a left outer join on the pricing table as p2, and look for a NULL record demonstrating that the matched product-price record found in p1 is the most recent on or before the sales date.

(I would have inner-joined the first price match, but if there is none, it's nice to have the product show up anyway so you know there's a problem.)
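
As a rough sketch of how the same join could feed the monthly report from the question: the version below inner-joins the first price match (so sales with no applicable price are silently dropped) and sums the matched price per product. It assumes the column is named Product_ID in both tables, as in the asker's query, that Sale_Date is a datetime, and that Prices has at most one row per (Product_ID, Sale_Date). The aliases Units_Sold and Total_Sales are made up for illustration.

```sql
-- Total January 2009 sales per product, pricing each sale with the
-- most recent price that took effect on or before its Sale_Date.
SELECT
    s.Product_ID,
    COUNT(*)      AS Units_Sold,   -- hypothetical alias
    SUM(p1.Price) AS Total_Sales   -- hypothetical alias
FROM Sales AS s
INNER JOIN Prices AS p1
    ON  p1.Product_ID = s.Product_ID
    AND p1.Sale_Date <= s.Sale_Date
LEFT JOIN Prices AS p2
    ON  p2.Product_ID = s.Product_ID
    AND p2.Sale_Date <= s.Sale_Date
    AND p2.Sale_Date >  p1.Sale_Date
WHERE p2.Product_ID IS NULL        -- no later applicable price exists
  AND s.Sale_Date >= '20090101'
  AND s.Sale_Date <  '20090201'    -- half-open range covers all of January
GROUP BY s.Product_ID
```

For the sample rows in the question this should come out to the $1250 worked out there for product 1, provided the elided rows follow the pattern shown.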

le dorfier
I think you meant to get rid of the top-1 subquery in there. As is, you have 2 FROMs. Also your conditions are the same, so if p1 matches, p2 will match.
SquareCog
Right, thanks, I cut too little and pasted too much. Fixed. And got that other date comparison in the second outer join. Ulp.
le dorfier
Nice query. I would personally make the second join into a "WHERE NOT EXISTS" subquery if I knew that the optimizer would treat it the same way (PostgreSQL does), as that makes your intention a bit clearer, I think.
j_random_hacker
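
The "WHERE NOT EXISTS" variant suggested in the comment above might look roughly like the sketch below. It assumes the same Product_ID column naming as the asker's query. Whether a given optimizer plans it the same way as the double outer join is not something this sketch can promise; compare the execution plans on real data.

```sql
SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p1.Price
FROM Sales AS s
LEFT JOIN Prices AS p1
    ON  p1.Product_ID = s.Product_ID
    AND p1.Sale_Date <= s.Sale_Date
-- keep a candidate price only if no later price also applies to the sale
WHERE NOT EXISTS (
    SELECT *
    FROM Prices AS p2
    WHERE p2.Product_ID = s.Product_ID
      AND p2.Sale_Date <= s.Sale_Date
      AND p2.Sale_Date >  p1.Sale_Date
)
```

A sale with no applicable price still appears with a NULL Price: when p1 is all NULL, the comparison against p1.Sale_Date is never true, so the subquery finds nothing and the row passes the filter.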
+2  A: 

If you start storing start and end dates, or create a view that includes the start and end dates (you can even create an indexed view) then you can heavily simplify your query. (provided you are certain there are no range overlaps)

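SELECT
    s.Sale_ID,
    s.Product_ID,
    s.Sale_Date,
    p.Price
FROM Sales AS s
JOIN Prices AS p
    ON  p.Product_ID = s.Product_ID
    AND s.Sale_Date >= p.StartDate
    AND s.Sale_Date <  p.EndDate
-- careful not to use BETWEEN: it includes both ends, so a sale made on
-- the day a price changes would match two ranges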

Note:

A technique along these lines will allow you to do this with a view. Note that if you need to index the view, it will have to be juggled around quite a bit.

create table t (d datetime)

insert t values(getdate())
insert t values(getdate()+1)
insert t values(getdate()+2)

go
create view myview 
as
select start = isnull(max(t2.d), '1975-1-1'), finish = t1.d  from t t1
left join t t2 on t1.d > t2.d
group by t1.d
go

select * from myview

start                   finish
----------------------- -----------------------
1975-01-01 00:00:00.000 2009-01-27 11:12:57.383
2009-01-27 11:12:57.383 2009-01-28 11:12:57.383
2009-01-28 11:12:57.383 2009-01-29 11:12:57.383
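
Applied to the Prices table from the question, the same self-join idea might look like the sketch below. The view name PriceRanges and the far-future sentinel used as the end date of the current price are assumptions, as is the Product_ID column naming. Here each range starts on the date the price took effect (inclusive) and ends at the next price change (exclusive).

```sql
CREATE VIEW PriceRanges   -- hypothetical view name
AS
SELECT
    p1.Product_ID,
    p1.Price,
    p1.Sale_Date AS StartDate,
    -- the next price change for the same product ends this range;
    -- the sentinel covers the current, still-open price (an assumption)
    ISNULL(MIN(p2.Sale_Date), '99991231') AS EndDate
FROM Prices AS p1
LEFT JOIN Prices AS p2
    ON  p2.Product_ID = p1.Product_ID
    AND p2.Sale_Date >  p1.Sale_Date
GROUP BY p1.Product_ID, p1.Sale_Date, p1.Price
GO

SELECT s.Sale_ID, s.Product_ID, s.Sale_Date, r.Price
FROM Sales AS s
JOIN PriceRanges AS r
    ON  r.Product_ID = s.Product_ID
    AND s.Sale_Date >= r.StartDate
    AND s.Sale_Date <  r.EndDate
```

Because the ranges are derived from the existing rows, nothing redundant is stored and a changed price date is picked up automatically. The view as written uses an outer join and an aggregate, so it would need restructuring before it could be indexed.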
Sam Saffron
The downside is that you are explicitly storing what really is a derived date, creating redundancy. And complexity is added to maintain it. Imagine changing the date value ...
le dorfier
@le dorfier, see the view solution ... it abstracts this stuff away and gives you a clean set of dates to join to ... making it an indexed view is a little bit on the tricky side.
Sam Saffron
+1 for suggesting a view. I'm not sure how it is possible to "index" a view -- is that something SQL Server-specific?
j_random_hacker
Yep, indexed views are SQL Server-specific, but they come with a big pile of restrictions, so it takes a fair amount of work to get them going.
Sam Saffron
I found this technique pretty straightforward and easy to implement. Thanks :)
Juliet
A: 

I agree with Sean. The code you have written is very clean and understandable. If you are having performance issues, then take the extra effort to make the code faster. Otherwise, you are making the code more complex for no reason. Nested sub-selects are extremely useful when used judiciously.

Jersey Dude