views:

107

answers:

4

I have a table that logs page views of each user:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| view_id      | int(11)      | NO   | PRI | NULL    | auto_increment | 
| page_id      | int(11)      | YES  | MUL | NULL    |                | 
| user_id      | int(11)      | YES  | MUL | NULL    |                | 
+--------------+--------------+------+-----+---------+----------------+

For every pair of users, I would like to generate a count of how many pages they have both looked at.

I simply do not know how to do this. : ) I am using mysql, in case it has a non-standard feature that makes this a breeze.

+2  A: 
SELECT DISTINCT page_id
FROM logtable
WHERE user_id = 1 OR user_id = 2
GROUP BY page_id
HAVING COUNT(DISTINCT user_id) = 2

This table returns all pages they both have looked at. If you want the count, then just make this a subquery and count the rows.

SELECT COUNT(*) FROM (the query above) s;

Update, let's do it for all pairs of users then.

SELECT u1.user_id, u2.user_id, COUNT(DISTINCT u1.page_id)
FROM logtable u1, logtable u2
WHERE u1.user_id < u2.user_id
  AND u1.page_id = u2.page_id
GROUP BY u1.user_id, u2.user_id
Tor Valamo
are you sure you are answering the question?
Lukman
The OP wants to see pages viewed by both users.
danben
@Lukman @danben - Edited and tested it now, it should work.
Tor Valamo
This doesn't address the 'for every pair of users' aspect though, right?
Rob Farley
@Rob - correct, but I don't see the use of doing that, ever... The OP sounds like a lookup for a particular situation, like a "how much do you have in common with this person" kind of thing.
Tor Valamo
No, but I can imagine something along the lines of "Is there another user who visits a lot of the same pages as user 123?".
Rob Farley
@Rob, @Tor: like Rob pointed out, I do mean 'for every [unordered] pair of users'. In my code I could issue the query Tor proposes for every pair, but I imagined (possibly mistakenly?) that doing it in one query would be more efficient than issuing (literally) millions of queries.
laramichaels
@Rob - Such a query should take into consideration that the other user may also visit tons of other pages, so a percentage is more appropriate.
Tor Valamo
@laramichaels1978 - I updated it to include a count for all pairs of users (user pairs listed by (low, high) ids).
Tor Valamo
Aha, you've turned it into my answer. Thanks for the compliment, Tor.
Rob Farley
@Rob - I noticed that after I posted... but you're welcome. And mine is nicer. And yours gives an error. Where's waldo? :P
Tor Valamo
Bad of me to leave some ambiguity... serves me right for not running it. ;)
Rob Farley
Ouch... I tweaked it to fix the query, and one of my upvoters removed their vote. *sniff*
Rob Farley
I'll give you one for having the same great mind then.
Tor Valamo
+3  A: 
select u1.user_id, u2.user_id, count(distinct u1.page_id) as NumPages
from logtable u1
  join
  logtable u2
  on u1.page_id = u2.page_id
  and u1.user_id < u2.user_id /* This avoids counting pairs twice */
group by u1.user_id, u2.user_id;

But you should consider filtering this somewhat...

(Edited above to put u1.page_id, it was originally just page_id, which is really bad of me)

Rob Farley
@Rob, thank you! After some brief testing this seems to do exactly what I was looking for. Just a note: mysql complained about page_id being ambiguous, so I replaced [following Tor below] 'count(distinct page_id)' with 'count(distinct u1.page_id)'. Many thanks. : )
laramichaels
@Rob,I misclicked and removed my earlier upvote. For me to be able to add it back you need to make some minor edit (add a word or so?) to your answer. : )
laramichaels
@Rob, again thanks for your answer. : ) I am getting a result set that contains only 300k records. With 10k different users, I was expecting a much results larger table [ (10k**2-10k)/2) ~ 55m records]. Since no pairs are reported with NumPages=0, I am guessing that the query is only listing those pairs which share at least one page -- is this correct? Or am I doing something wrong?
laramichaels
That's correct. The join only considers pairs of users that have a common page.
Rob Farley
@Rob: many thanks for your extraordinary help. : )
laramichaels
A: 
select a.user_id as user1, b.user_id as user2, count(distinct a.page_id) as views
from yourtable a, yourtable b
where a.page_id = b.page_id 
and a.user_id < b.user_id
group by a.user_id, b.user_id

change yourtable to the name of your table ..

Lukman
count(*) doesn't work here, because the users might visit the same page repeatedly. If they've both visited a page three times, it'll count that page too many times. If the table only records the first visit on each page, then this'll be fine.
Rob Farley
@Rob: fixed by using count(distinct a.page_id)
Lukman
A: 

For users_ids 100 and 200.

SELECT
  page_id
FROM table1
WHERE user_id IN (100, 200)
GROUP BY page_id
HAVING MAX(CASE WHEN user_id = 100 THEN 1 ELSE 0 END) = 1
  AND MAX(CASE WHEN user_id = 200 THEN 1 ELSE 0 END) = 1;
lins314159