tags:

views:

141

answers:

5

I am quite certain we cannot use the LIMIT clause for what I want to do - so wanted to find if there are any other ways we can accomplish this.

I have a table which captures which user visited which store. Every time a user visits a store, a row is inserted into this table.

Some of the fields are

  • shopping_id (primary key)
  • store_id
  • user_id

Now what I want is - for a given set of stores, find the top 5 users who have visited the store max number of times.

I can do this 1 store at a time as:

select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 60
group by user_id,store_id 
order by visits desc Limit 5

This will give me the 5 users who have visited store_id=60 the max times

What I want to do is provide a list of 10 store_ids and for each store fetch the 5 users who have visited that store max times

select store_id,user_id,count(1) as visits 
from shopping 
where store_id in  (60,61,62,63,64,65,66)
group by user_id,store_id 
order by visits desc Limit 5
This will not work as the Limit at the end will return only 5 rows rather than 5 rows for each store.

Any ideas on how I can achieve this. I can always write a loop and pass 1 store at a time but wanted to know if there is a better way

+1  A: 

UNION may be what you are looking for.

-- fist store
(select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 60
group by user_id,store_id 
order by visits desc Limit 5)
UNION ALL
-- second store
(select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 61
group by user_id,store_id 
order by visits desc Limit 5)
...

http://dev.mysql.com/doc/refman/5.0/en/union.html

Yada
A: 

The simplest way would be to issue 10 separate queries, one for each store. If you use parameterized queries (e.g. using PDO in PHP) this will be pretty fast since the query will be part-compiled.

If this still proves to be too resource-intensive, then another solution would be to cache the results in the store table - i.e. add a field that lists the top 5 users for each store as a simple comma-separated list. It does mean your database would not be 100% normalised but that shouldn't be a problem.

DisgruntledGoat
Hmmm thats an interesting idea - so you are suggesting - everytime a user visits a store - at that time itself do a check and update the top 5 users column. I thought of that approach but then decided against it thinking why to store data permanently that can be retrieved with some simple SQL
Gublooo
@Gubloo: I meant that you would run a script (as a cron job, or for the first user who visits after midnight, say) which would do the updates. What you said may still be too resource-intensive. You're right though - if you can get the data with a simple query then that's certainly better, but it looks like your accepted answer is pretty far from simple ;)
DisgruntledGoat
:) - well although that query looks complex - I'm not sure how resource intensive it is - but atleast its a simple way of getting the results on the fly.
Gublooo
+1  A: 

Major concern over here is number of times you query a database. If you query multiple times from your script. Its simply wastage of resources and must be avoided. That is you must NOT run a loop to run the SQL multiple times by incrementing certain value. In your case 60 to 61 and so on.

Solution 1: Create a view Here is the solution

CREATE VIEW myView AS
select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 60
group by user_id,store_id 
order by visits desc Limit 5
UNION
select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 61
group by user_id,store_id 
order by visits desc Limit 5
UNION
select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 62
group by user_id,store_id
order by visits desc Limit 5 

Now use

SELECT * from MyView

This is limited because you cant make it dynamic. What if you need 60 to 100 instead of 60 to 66.

Solution 2: Use Procedure. I wont go into how to write a procedure cuz its late night and I got to sleep. :) Well, procedure must accept two values 1st inital number (60) and 2nd Count (6) Inside the procedure create a temporary table (cursor) to store data then run a loop from initial number till count times In your case from 60 to 66 Inside the loop write desired script Replacing 60 with a looping variable.

select store_id,user_id,count(1) as visits 
from shopping 
where store_id = 60
group by user_id,store_id 
order by visits desc Limit 5

And append the result in the temporary table (cursor).

Hope this will solve your problem. Sorry I couldn't give you the code. If you still need it plz send me a message. I will give it to you when I wake up next morning.

RAHUL PRASAD
+1  A: 

If you will not save data about when a user visited a store or something like this, you could simply update the table each time a user visits a store instead of appending a new row.

Something like this:

INSERT INTO `user_store` (`user_id`, `store_id`, `visits`) VALUES ('USER', 'SHOP', 1)
ON DUPLICATE KEY UPDATE `visits` = `visits` + 1

But I think this would not work, because neither user_id nor store_id are unique. You have to add a unique primary key like this: user#store or something else.

Another opinion would be to save this data (how often a user was in a store) in a separate table containing of ID, user_id, store_id, visits and increment visits everytime you also add a new row to you existing table.

To get the Top5 you can then use:

SELECT `visits`, `user_id` FROM `user_store_times` WHERE `store_id`=10 ORDER BY `visits` DESC LIMIT 5
Tobias
Thanx for ON DUPLICATE KEY UPDATE `visits` = `visits` + 1thing.
RAHUL PRASAD
+2  A: 

Using two user variable and counting the same consecutive store_id, you can replace <= 5 with whatever limit you want

SELECT a.*
FROM (
 SELECT store_id, user_id, count(1) as visits 
 FROM shopping
 WHERE store_id IN (60,61,62,63,64,65,66)
 GROUP BY store_id, user_id
 ORDER BY store_id, visits desc, user_id
) a,
(SELECT @prev:=-1, @count:=1) b
WHERE
 CASE WHEN @prev<>a.store_id THEN
   CASE WHEN @prev:=a.store_id THEN
    @count:=1
   END
 ELSE
   @count:=@count+1
 END <= 5

Edit as requested some explanation :

The first subquery (a) is the one that group and order the data so you will have data like:

store_id | user_id | visits
---------+---------+-------
 60           1       5
 60           2       3
 60           3       1
 61           2       4
 61           3       2

the second subquery (b) init the user variable @prev with -1 and @count with 1

then we choose all data from the subquery (a) verifying the condition in the case.

  • verify that the previous store_id (@prev) we have seen is different from the current store_id. Since the first @prev is equal to -1 there is nothing that match the current store_id so the condition <> is true we enter then is the second case who just serve to change the value @prev with the current store_id. This is the trick so i can change the two user variable @count and @prev in the same condition.

  • if the previous store_id is equal to @prev just increment the @count variable.

  • we check that the count is within the value we want so the <= 5

So with our test data the:

step | @prev | @count | store_id | user_id | visits
-----+-------+--------+----------+---------+-------
  0      -1      1    
  1      60      1        60          1        5 
  2      60      2        60          2        3
  3      60      3        60          3        1
  4      61      1        61          2        4
  5      61      2        61          3        2   
Patrick
Hey Patrick - I have never seen a query like this :) but it works perfectly. I tried with my test data and it gives the exact results I was looking for. I'll have to now study what this query is doing. Thank you so much
Gublooo
Hey Patrick - I have never seen a query like this :)Please explain the code. So that we can learn from it.I will be thankful.
RAHUL PRASAD
@Gubloo, @RAHUL PRASAD added some explanation
Patrick
Thankyou patrick - that explanation was very clear and helpful. Appreciate it.
Gublooo