views:

133

answers:

2

I'm using SQL Server 2005.

Let's say I have a table for products and another table for prices so that I can track price changes over time. I need a query that fetches distinct products (easy part) plus each product's most recent price and the date it changed.

Products Table:

CREATE TABLE [dbo].[Products](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Name] [varchar](50) NOT NULL,
    [Price] [money] NOT NULL,
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED 
( [ID] ASC )
) ON [PRIMARY]
GO

INSERT INTO Products (Name, Price) VALUES ('Hat', 10);
INSERT INTO Products (Name, Price) VALUES ('Shirt', 15);
INSERT INTO Products (Name, Price) VALUES ('Pants', 20);
INSERT INTO Products (Name, Price) VALUES ('Coat', 25);
INSERT INTO Products (Name, Price) VALUES ('Shoes', 30);

PriceChanges Table:

CREATE TABLE [dbo].[PriceChanges](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [ProductId] [int] NOT NULL,
    [Price] [money] NOT NULL,
    [PriceChanged] [datetime] NOT NULL,
 CONSTRAINT [PK_PriceChanges] PRIMARY KEY CLUSTERED 
( [ID] ASC )
) ON [PRIMARY]
GO

INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (1, 9.65, '1/1/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (1, 10.10, '1/2/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (1, 11.50, '1/3/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (2, 15.50, '1/4/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (2, 15.65, '1/5/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (3, 19.95, '1/6/2010');
INSERT INTO PriceChanges (ProductId, Price, PriceChanged) VALUES (4, 24.95, '1/7/2010');

Here's a query that returns too much:

SELECT     
    p.ID ProductId, 
    p.Name, 
    COALESCE(c.Price, p.Price) Price, 
    c.PriceChanged
FROM dbo.Products AS p LEFT JOIN 
        dbo.PriceChanges AS c ON c.ProductId = p.ID

Returns:

1   Hat     9.65    2010-01-01 00:00:00.000
1   Hat     10.10   2010-01-02 00:00:00.000
1   Hat     11.50   2010-01-03 00:00:00.000
2   Shirt   15.50   2010-01-04 00:00:00.000
2   Shirt   15.65   2010-01-05 00:00:00.000
3   Pants   19.95   2010-01-06 00:00:00.000
4   Coat    24.95   2010-01-07 00:00:00.000
5   Shoes   30.00   NULL

What I need it to return is:

1   Hat     11.50   2010-01-03 00:00:00.000
2   Shirt   15.65   2010-01-05 00:00:00.000
3   Pants   19.95   2010-01-06 00:00:00.000
4   Coat    24.95   2010-01-07 00:00:00.000
5   Shoes   30.00   NULL

This query works, but it's got two nested selects and it will make DBA's around the world weep:

SELECT     
    p.ID ProductId, 
    p.Name, 
    COALESCE((SELECT TOP 1 Price FROM dbo.PriceChanges WHERE ProductId = p.ID ORDER BY PriceChanged DESC), p.Price) Price, 
    (SELECT TOP 1 PriceChanged FROM dbo.PriceChanges WHERE ProductId = p.ID ORDER BY PriceChanged DESC) PriceChanged
FROM dbo.Products AS p 

What's a better way to do this?

+3  A: 

Use:

   SELECT p.id AS productid,
          p.name,
          COALESCE(x.price, p.price) AS price,
          x.pricechanged
     FROM dbo.PRODUCTS p
LEFT JOIN (SELECT pc.productid,
                  pc.price,
                  pc.pricechanged,
                  ROW_NUMBER() OVER(PARTITION BY pc.productid 
                                        ORDER BY pc.pricechanged DESC) AS rownum
             FROM dbo.PRICECHANGES pc) x ON x.productid = p.id
                                        AND x.rownum = 1
OMG Ponies
Works like a charm.
Byron Sommardahl
+1 it's cool, but the old styled works faster (on this data, of course). ;)
igor
+3  A: 

It executes 2 time faster than [OMG Ponies] one and in 20 times faster than your one (on this data)

select
    p.ID ProductId
    ,p.Name
    ,COALESCE(b.Price, p.Price) Price
    ,b.PriceChanged
from dbo.Products AS p 
LEFT JOIN 
(
    select
      a.ProductId
      ,a.PriceChanged
      ,pc2.Price     
    from
    (
        select 
           pc1.ProductId
           ,MAX(pc1.PriceChanged) as PriceChanged
        from dbo.PriceChanges pc1 
        group by pc1.ProductId 
    ) a         
    inner join dbo.PriceChanges pc2 
    on (a.PriceChanged = pc2.PriceChanged and a.ProductId = pc2.ProductId)
) b
ON b.ProductId = p.ID  
igor
+1: It's faster because using `ROW_NUMBER()` means parsing all records in `dbo.PRICECHANGES`
OMG Ponies