I have a table called order which contains columns id, user_id, price and item_id. Item prices aren't fixed and I would like to select each item's most expensive order. I want to select user_id, item_id and price in the same query. I tried the following query but it doesn't return the correct result set.

SELECT user_id, item_id, MAX(price)
FROM order
GROUP BY item_id

Some of the rows returned by this query have the wrong user_id. However, all rows in the result set show each item's correct highest price.

+1  A: 

Your query groups the rows by item_id. If several rows have item_id 1 but different user_id values, MySQL returns just one user_id for that group, chosen arbitrarily (often the first row it reads), not the user_id of the row with the highest price.

Lekensteyn
Yeah, that's right. So how do I achieve what I'm trying to do here? I'm trying to find out who bought the item at the greatest price and what that price was.
Omer Hassan
A: 

You'll either need to group by item_id AND user_id (showing the max price per item per user), or, if you want only the item in the group, you'll need to decide what the user_id column should mean: e.g. show the max price for an item and the LAST user who changed the price, OR show the max price for an item and the user who MADE that max price, etc. Have a look at this post for some patterns for doing this.
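
A minimal sketch of the first option (grouping by both columns), run through Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up for illustration, and the double quotes around "order" are SQLite's identifier quoting (MySQL would use backticks).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "order" (user_id INT, item_id INT, price REAL)')
conn.executemany(
    'INSERT INTO "order" VALUES (?, ?, ?)',
    [(1, 1, 10), (1, 1, 12), (2, 1, 20), (2, 2, 6)],  # hypothetical rows
)

# Grouping by both columns gives one row per (item, user) pair,
# i.e. each user's own highest price for each item.
per_item_per_user = conn.execute(
    'SELECT item_id, user_id, MAX(price) FROM "order" '
    'GROUP BY item_id, user_id ORDER BY item_id, user_id'
).fetchall()
print(per_item_per_user)
```

Item 1 appears twice in the output, once per user, so this answers "max price per item per user" rather than "who paid the most for each item".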

nonnb
Can't I get the item, its maximum price and the user who made that price in one query?
Omer Hassan
+2  A: 

You first need to get the maximum price for each item_id and then join back to order to get the rows where the item was ordered at that maximum price. Something like the following query should work. Note that it returns every row that ties for an item's maximum price.

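-- `order` is a reserved word in MySQL, so it must be quoted with backticks.
-- Columns are qualified with o. because item_id exists in both o and o2.
SELECT o.user_id, o.item_id, o.price
FROM `order` o
JOIN (
        SELECT item_id, MAX(price) max_price
        FROM `order`
        GROUP BY item_id
     ) o2 
  ON o.item_id = o2.item_id AND o.price = o2.max_price;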
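
To see the tie behaviour mentioned above, here is a small sketch that runs the same join-back pattern through Python's built-in sqlite3 module (a stand-in for MySQL, so "order" is quoted with double quotes instead of backticks). The rows are hypothetical; users 2 and 3 both paid the maximum for item 1.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "order" (user_id INT, item_id INT, price REAL)')
conn.executemany(
    'INSERT INTO "order" VALUES (?, ?, ?)',
    [(1, 1, 10), (2, 1, 20), (3, 1, 20), (1, 2, 15), (2, 2, 6)],  # hypothetical rows
)

# Find each item's maximum price, then join back to pick up the matching rows.
join_back_rows = conn.execute(
    '''
    SELECT o.user_id, o.item_id, o.price
    FROM "order" o
    JOIN (
            SELECT item_id, MAX(price) AS max_price
            FROM "order"
            GROUP BY item_id
         ) o2
      ON o.item_id = o2.item_id AND o.price = o2.max_price
    ORDER BY o.item_id, o.user_id
    '''
).fetchall()
print(join_back_rows)
```

Item 1 comes back twice, once for each user who paid the maximum, while item 2 comes back once.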
ar
+2  A: 

You may want to use a derived table, as follows:

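-- MIN() picks one user deterministically when several rows tie for the maximum,
-- and keeps the query valid when the ONLY_FULL_GROUP_BY SQL mode is enabled.
SELECT    o1.item_id, o1.max_price, MIN(o2.user_id) user_of_max_price
FROM      (
             SELECT item_id, MAX(price) max_price
             FROM `order`
             GROUP BY item_id
          ) o1
JOIN      `order` o2 ON (o2.price = o1.max_price AND o2.item_id = o1.item_id)
GROUP BY  o1.item_id, o1.max_price;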

Test case:

CREATE TABLE `order` (user_id int, item_id int, price decimal(5,2));

INSERT INTO `order` VALUES (1, 1, 10);
INSERT INTO `order` VALUES (1, 2, 15);
INSERT INTO `order` VALUES (1, 3, 8);
INSERT INTO `order` VALUES (2, 1, 20);
INSERT INTO `order` VALUES (2, 2, 6);
INSERT INTO `order` VALUES (2, 3, 15);
INSERT INTO `order` VALUES (3, 1, 18);
INSERT INTO `order` VALUES (3, 2, 13);
INSERT INTO `order` VALUES (3, 3, 10);

Result:

+---------+-----------+-------------------+
| item_id | max_price | user_of_max_price |
+---------+-----------+-------------------+
|       1 |     20.00 |                 2 |
|       2 |     15.00 |                 1 |
|       3 |     15.00 |                 2 |
+---------+-----------+-------------------+
3 rows in set (0.00 sec)
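
For anyone who wants to replay the test case without a MySQL server, here is a rough sketch using Python's built-in sqlite3 module. It loads the same nine rows and runs the derived-table query, with two assumptions that are not guaranteed to match MySQL: the table name is quoted with double quotes, and MIN(user_id) is used as an explicit tie-break rule. The extra tied row at the end is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "order" (user_id INT, item_id INT, price REAL)')
conn.executemany(
    'INSERT INTO "order" VALUES (?, ?, ?)',
    [(1, 1, 10), (1, 2, 15), (1, 3, 8),
     (2, 1, 20), (2, 2, 6), (2, 3, 15),
     (3, 1, 18), (3, 2, 13), (3, 3, 10)],
)

DERIVED_TABLE_QUERY = '''
SELECT o1.item_id, o1.max_price, MIN(o2.user_id) AS user_of_max_price
FROM (
        SELECT item_id, MAX(price) AS max_price
        FROM "order"
        GROUP BY item_id
     ) o1
JOIN "order" o2 ON o2.price = o1.max_price AND o2.item_id = o1.item_id
GROUP BY o1.item_id, o1.max_price
ORDER BY o1.item_id
'''

result = conn.execute(DERIVED_TABLE_QUERY).fetchall()
print(result)

# Hypothetical extra row: user 3 ties with user 2 on item 1.
conn.execute('INSERT INTO "order" VALUES (3, 1, 20)')
result_with_tie = conn.execute(DERIVED_TABLE_QUERY).fetchall()
print(result_with_tie)
```

Because of the outer GROUP BY, the tie still yields one row per item; the MIN() rule simply makes it predictable which user is shown.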
Daniel Vassallo
It worked perfectly. Thanks, Daniel!
Omer Hassan
+1  A: 

This is a per-group-maximum question. There are various approaches to this common problem. On MySQL it's typically faster and simpler to use a null-self-join than anything involving subqueries:

-- `order` is a reserved word in MySQL, so it must be quoted with backticks.
SELECT o0.user_id, o0.item_id, o0.price
FROM `order` AS o0
LEFT JOIN `order` AS o1 ON o1.item_id=o0.item_id AND o1.price>o0.price
WHERE o1.user_id IS NULL

i.e. “select each row where there exists no other row for the same item with a higher price”.

(If two rows have the same maximum price you will get both returned. What exactly to do in the case of a tie is a general problem for per-group-maximum solutions.)
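
One possible way to handle ties, sketched here with Python's built-in sqlite3 module as a stand-in for MySQL: widen the join condition so that a row also "loses" to a row with the same price and a lower user_id. The lowest-user_id rule and the sample rows are assumptions for illustration, and the IS NULL check relies on user_id never being NULL in real rows.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "order" (user_id INT, item_id INT, price REAL)')
conn.executemany(
    'INSERT INTO "order" VALUES (?, ?, ?)',
    [(1, 1, 10), (2, 1, 20), (3, 1, 20), (1, 2, 15), (2, 2, 6)],  # hypothetical rows
)

# Plain null-self-join: keep rows for which no higher-priced row exists.
plain = conn.execute(
    '''
    SELECT o0.user_id, o0.item_id, o0.price
    FROM "order" AS o0
    LEFT JOIN "order" AS o1
      ON o1.item_id = o0.item_id AND o1.price > o0.price
    WHERE o1.user_id IS NULL
    ORDER BY o0.item_id, o0.user_id
    '''
).fetchall()

# Tie-break variant: a row is also beaten by an equal price with a lower user_id.
tie_broken = conn.execute(
    '''
    SELECT o0.user_id, o0.item_id, o0.price
    FROM "order" AS o0
    LEFT JOIN "order" AS o1
      ON o1.item_id = o0.item_id
     AND (o1.price > o0.price
          OR (o1.price = o0.price AND o1.user_id < o0.user_id))
    WHERE o1.user_id IS NULL
    ORDER BY o0.item_id, o0.user_id
    '''
).fetchall()
print(plain)
print(tie_broken)
```

The plain version returns both tied rows for item 1; the variant keeps only the one with the lower user_id.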

bobince
@bobince: Doesn't the benchmark in [the link you provided](http://kristiannielsen.livejournal.com/6745.html) show that the derived table (uncorrelated subquery) method is much faster than the null-self-join? ... I also used to think that null-self-join is slightly faster in MySQL, and in fact I'm very surprised with those benchmarks. I have a feeling I'm going to do some tests myself :) ... +1 anyway
Daniel Vassallo
Yeah, results are of course going to vary depending on the size of tables and indexes involved. I've typically found the null-self-join fastest over my particular dataset in the past using MySQL (whose subquery support is known to be relatively young so perhaps not as optimised as it could be). It would be interesting to investigate more with a recent version of MySQL.
bobince