views:

143

answers:

4

Take the case of two tables: tbl_product and tbl_transaction.
tbl_product lists product details including names and ids while tbl_transaction lists transactions involving the products and includes dates, product-ids, customers etc.

I need to display a web-page showing 10 products and for each product, the last 5 transactions. So far, no LEFT JOIN query seems to work and the subquery below would have worked if mysql could allow the tx.product_id=ta.product_id part (fails with Unknown column 'ta.product_id' in 'where clause': [ERROR:1054]).

SELECT
ta.product_id,
ta.product_name,
tb.transaction_date
FROM tbl_product ta
LEFT JOIN (SELECT tx.transaction_date FROM tbl_transaction tx WHERE tx.product_id=ta.product_id LIMIT 5) tb LIMIT 10

Is there a way to achieve the listing I need without using multiple queries in a loop?

Edit:
This is exactly what I need from MySQL:
SELECT ta.product_id, ta.product_name, tb.transaction_date ...
FROM tbl_product ta
LEFT JOIN tbl_transaction tb ON (tb.product_id=ta.product_id LIMIT 5)
LIMIT 10

Of course this is illegal, but I really wish it wasn't!

+1  A: 

You need to pull the join condition out of your subquery. Try:

SELECT
    ta.product_id,
    ta.product_name,
    tb.transaction_date
FROM tbl_product ta
LEFT JOIN (SELECT tx.transaction_date 
           FROM tbl_transaction tx  
           ORDER BY tx.transaction_date DESC
           LIMIT 5) tb 
ON ta.product_id=tb.product_id
LIMIT 10
chryss
This one works but only LIMITs the the first product.
eddy edu
This does not work because your subquery is only returning the latest 5 transactions across ALL transactions. He stated "for each product, the last 5 transactions".
nategood
+1  A: 

It fails because when you put parenthesis around your query and give it the alias "tb" you have created a derived table. Your derived table has no knowledge of the tbl_product table having alias "ta"

Try using ON or WHERE outside of the derived table, then reference that table using the alias" tb" you provided

EDIT:

The use of "LIMIT" limits the results of the query in its entirety. While you have "LIMIT 10" what you actually want is 50 rows (or less if there are fewer than 5 historical), is that right?

10 products, joined to 5 historical records, returning 50 total rows.

In this case, you can have a single query solution be joining the product table to itself in a derived table having "LIMIT 10"

Such as:

SELECT *
FROM tbl_product ta
JOIN (SELECT * FROM tbl_product tz WHERE tz.product_id = ta.product_id LIMIT 10) tc
LEFT JOIN (SELECT tx.transaction_date FROM tbl_transaction tx 
    WHERE tx.product_id=ta.product_id LIMIT 5) tb

You could also us "in" and specify the top 10 such as:

SELECT *
FROM tbl_product ta
LEFT JOIN (SELECT tx.transaction_date FROM tbl_transaction tx 
WHERE tx.product_id=ta.product_id LIMIT 5) tb 
WHERE ta.product_id IN 
  (SELECT z.product_id FROM tbl_product z LIMIT 10)
Matthew PK
After modifying (removing the tbl_product reference in the sub-query and adding it in an ON clause) the query works OK but the LIMIT seems only to apply to the first product i.e. I get a list of 10 product with the first one having 5 transactions but all the others have none.
eddy edu
Again the answer is because if your derived table.The "LIMIT" in the derived table is unrelated to the "LIMIT" in the entire query.How about some sample data?
Matthew PK
Edited my answer to better explain
Matthew PK
A: 

Edit:

If you want to enforce a limit on each product I would separate it into two queries.

First get a list of 10 products and then run another query for each of those returning the last five transactions.

Rosco
This one fails with a cryptic error that is only fixed by having an ON clause. And after the 'fix', the result-set only contains the first product with 5 transactions - other products are ignored.
eddy edu
I considered that but I'm sure there's a one-query solution to this! There just has to be!
eddy edu
+1  A: 

This is where ranking functions would be very useful. Unfortunately, MySQL does not yet support them. Instead, you can try something like the following.

Select ta.product_id, ta.product_name
    , tb.transaction_date
From tbl_product As ta
    Left Join   (
                Select tx1.product_id, tx1.transaction_id, tx1.transaction_date
                    , (Select Count(*)
                        From tbl_transaction As tx2
                        Where tx2.product_id = tx1.product_id
                            And tx2.transaction_id < tx1.transaction_id) As [Rank]
                From tbl_transaction As tx1
                ) as tb
        On tb.product_id = ta.product_id
            And tb.[rank] <= 4
Limit 10
Thomas
Thanks for the response, Thomas.Yes, `tbl_transaction` has an AUTO_INCREMENT column as the primary key. I tried your proposal (although I don't quite understand it fully) and I got a list of the products but **without any** transactions. Why the self-join (on tbl_transaction [tx >-< tx2])? Could you please take me through the logic of the sub-query?
eddy edu
@eddy edu - The gist of it is to impose a ranking on the transactions by product. To that end, I compare the transaction table against itself to find the number of transactions for a given product earlier than a given row. The earliest row for a product should produce a count of zero. Now, if you are getting odd results, I have a couple of questions. First, can transaction date be null? Second, can transaction date be duplicated (two rows for the same product_id and date)?
Thomas
transaction_date cannot be NULL. transaction_date can be duplicated (one product can have multiple transactions for a given date). I get the need to rank transactions by product (so that I can 'group' my transactions by products), I just need to see how the date column comes in.
eddy edu
@eddy edu - Ok. I've updated my post. Instead of looking for Count(Distinct tx2.Id), I changed it to `Count(Distinct tx2.transaction_date)`. In short, I'm using transaction_date to rank the transaction rows. I do that by finding the number of other transaction rows with an earlier date. The earliest date for a given product should produce zero.
Thomas
@eddy edu - It occurs to me that you might run into a problem with gaps in the transaction dates. Am working on a solution now.
Thomas
@eddy edu- Posted another update. (Can't speak to the performance). In short, I'm calculating a rank for each product-transaction_date combination. I then filter for those whose rank is less than or equal to four (i.e., the top 5).
Thomas
I tried that too but I still get the same results - listing each product but no transactions.
eddy edu
@eddy edu - I take it that the inner-most query, which is only against tbl_transaction, produces results, correct? What about just the `tb` derived table query? Does that produce results?
Thomas
After adding a missing column in the field list, the tb derived table query returns a list of three transactions each with a rank of zero.
eddy edu
@eddy edu - Hm. What happens if you change `Z1.transaction_date < Z2.transaction_date` to `Z1.transaction_date > Z2.transaction_date`?
Thomas
Seems to work the same as the other one - returns one transaction per product each ranked zero.
eddy edu
@eddy edu - How many rows does the inner most query return (the one that simply groups by product_Id, transaction_date)?
Thomas
I've stripped a production db so that I have tbl_product with 3 rows and tbl_transaction with 53 rows; 22 for product-1, 9 for product-2 and 22 for product-3 (makes my work easier and faster). The inner most query returns 3 rows.
eddy edu
@eddy edu - Btw, do you care which 5 transaction rows are returned for a given product? I.e., if the first five rows for a given product all have the same transaction date, does it matter for the purposes of your output?
Thomas
@Thomas - The date does not really matter. I just need the last five transactions ordered descending by transaction_id (which seemed too trivial to include in the question because if I can fetch at most five transactions for each product, I can easily order them as I want). Do you have any other suggestions because I've resigned to the fact that the listing I need cannot be accomplished in a one-query SQL so I might need to fetch the products first then throw transaction-getter-sqls or just work with one transaction per product (although I really didn't want either). Thanks for your input.
eddy edu
@eddy edu - Posted another revision. Knowing that you simply want five transactions whether the dates are duplicated or not makes all the difference. Now we can rank on that auto_increment column `transaction_id` you mentioned in an earlier comment. One day soon, MySQL will have ranking functions and when it does, it will be easier to revise a single query for performance than a loop I would think.
Thomas
@Thomas - This actually works now. Thanks a lot! Exactly the way I wanted it to. Now I just need a few more days to figure out why it's working :D I cannot up-vote you enough!
eddy edu