ansaurus

Question

Need help with a complex Join statement in SQL.

Answer 1

+2 A:

SELECT v.date, p.price, v.volume
FROM volume v
LEFT JOIN Price p ON p.itemID=v.itemID
    AND p.[date] = (
                    SELECT MAX([date] )
                    FROM price p2 
                    WHERE p2.[date] <= v.[date] AND p2.itemid= v.itemid
                    GROUP BY p2.[date]
                   )

Joel Coehoorn 2008-12-18 17:29:34

won't work -- will return volume only on the days when there is a new price. desired output is the volume should be returned for the price on that day or the nearest previous day for which there is a new price.

ʞɔıu 2008-12-18 17:35:34

Answer 2

A:

 SELECT  Volume.date, volume.itemid, price.price, volume.volume_amt
 FROM Volume 
 LEFT OUTER JOIN Price
 ON Volume.date = Price.date

Probably. My SQL-fu is weak

Paul 2008-12-18 17:32:07

Nope, that would give him blanks for price in the rows that don't have exact matches.

Dave Costa 2008-12-18 17:35:07

Ah, gotcha. We're doing share trades or something and want last-traded-price. <thinks>

Paul 2008-12-18 17:37:30

Answer 3

+3 A:

Assuming there is only 1 price per date/itemid:

select v.date, v.itemid, p.price
from volume v
join price p on p.itemid = v.item_id
where p.date = (select max(p2.date) from price p2
                where p2.itemid = v.itemid
                and p2.date <= v.date);

Tony Andrews 2008-12-18 17:34:03

Same answer as mine, but I'll admit that you got it right faster even if mine shows that I posted sooner.

Joel Coehoorn 2008-12-18 17:39:21

actually I believe this contains a mistake: the p.date condition belongs in the join condition, not in the where clause.

ʞɔıu 2008-12-18 17:49:22

Does it make any difference?

Tony Andrews 2008-12-18 17:52:18

Not in this case, but it would if you were doing an outer join.

Bill Karwin 2008-12-18 17:55:56

my comment about the p.date comparison belonging in the where or on clauses was not correct; it doesn't make a difference.

ʞɔıu 2008-12-18 18:06:49

Answer 4

A:

This method works in Oracle. Don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.

dev> select * from price;

AS_OF               ID     AMOUNT
----------- ---------- ----------
04-Dec-2008          1          1
11-Dec-2008          1          2
15-Dec-2008          1          3

dev> select * from volume;

DAY                 ID     VOLUME
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          1          2
07-Dec-2008          1          3
08-Dec-2008          1          4
09-Dec-2008          1          5
10-Dec-2008          1          6
11-Dec-2008          1          7
12-Dec-2008          1          8
13-Dec-2008          1          9
14-Dec-2008          1         10
15-Dec-2008          1         11
16-Dec-2008          1         12
17-Dec-2008          1         13
18-Dec-2008          1         14
19-Dec-2008          1         15
20-Dec-2008          1         16
21-Dec-2008          1         17
22-Dec-2008          1         18
23-Dec-2008          1         19

dev> select day, volume, amount from (
  2    select day, volume, (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
  3      from volume v
  4  )
  5  join price on as_of = price_as_of
  6  order by day;

DAY             VOLUME     AMOUNT
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          2          1
07-Dec-2008          3          1
08-Dec-2008          4          1
09-Dec-2008          5          1
10-Dec-2008          6          1
11-Dec-2008          7          2
12-Dec-2008          8          2
13-Dec-2008          9          2
14-Dec-2008         10          2
15-Dec-2008         11          3
16-Dec-2008         12          3
17-Dec-2008         13          3
18-Dec-2008         14          3
19-Dec-2008         15          3
20-Dec-2008         16          3
21-Dec-2008         17          3
22-Dec-2008         18          3
23-Dec-2008         19          3

Dave Costa 2008-12-18 17:34:25

Answer 5

+7 A:

This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.

EXPLAIN SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v JOIN Price p1
  ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
LEFT OUTER JOIN Price p2
  ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
    AND p1.`date` < p2.`date`)
WHERE p2.item_id IS NULL;

This query matches Volume to all rows in Price that are earlier, and then uses another join to make sure we find only the most recent price.

I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.

edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date) and an index on (date, item_id) and the EXPLAIN plan was identical. An index on (item_id, date) may be better in this case. Here's the EXPLAIN output for that:

+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref             | rows | Extra                                |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ALL  | item_id       | NULL    | NULL    | NULL            |    6 |                                      | 
|  1 | SIMPLE      | v     | ref  | item_id       | item_id | 22      | test.p1.item_id |    3 | Using where                          | 
|  1 | SIMPLE      | p2    | ref  | item_id       | item_id | 22      | test.v.item_id  |    1 | Using where; Using index; Not exists | 
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+

But I have a very small data set, and the optimization may depend on larger data sets. You should experiment, analyzing the optimization using a larger data set.

edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date) index.

Bill Karwin 2008-12-18 17:48:54

Looks good so far, What if we assume there could be more than one possible item_id, just stick v.item_id=p1.item_id and v.item_id=p2.item_id in the on clauses, right? Also, if you're really feeling up to it, is it better to add indexes on (item_id, date) or (date, item_id)?

ʞɔıu 2008-12-18 18:14:55

ansaurus

tags:

views:

answers:

Need help with a complex Join statement in SQL.

related questions