I'm now using Slope One for recommendations.

How can I exclude already-visited items from the results?

I can't simply filter them out with `not in (visited_id_list)`, because that won't scale for a long-time user with many visited items!

I've come up with a solution that avoids `not in`:

select b.property, count(b.id) total
from propertyviews a
left join propertyviews b on b.cookie = a.cookie
left join propertyviews c on c.cookie = 0 and b.property = c.property
where a.property = 1 and a.cookie != 0 and c.property is null
group by b.property
order by total;
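
To make the behaviour of that query concrete, here is a minimal sketch that runs it against a tiny in-memory table. It assumes SQLite (via Python's `sqlite3`) as a stand-in for MySQL, and the sample rows are made up for illustration; only the table and column names come from the query above. Cookie `0` plays the current user, and the `left join ... is null` pair drops every property that user has already viewed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE propertyviews ("
    "id INTEGER PRIMARY KEY, cookie INTEGER, property INTEGER)"
)

# Hypothetical sample data: cookie 0 is the current user.
conn.executemany(
    "INSERT INTO propertyviews (cookie, property) VALUES (?, ?)",
    [
        (0, 1), (0, 2),          # current user viewed properties 1 and 2
        (7, 1), (7, 2), (7, 3),  # user 7 viewed properties 1, 2 and 3
        (8, 1), (8, 3), (8, 4),  # user 8 viewed properties 1, 3 and 4
    ],
)

# Same query as above: c is NULL exactly when cookie 0 has no view
# of b.property, so only unvisited properties survive the WHERE clause.
rows = conn.execute(
    """
    SELECT b.property, COUNT(b.id) AS total
    FROM propertyviews a
    LEFT JOIN propertyviews b ON b.cookie = a.cookie
    LEFT JOIN propertyviews c ON c.cookie = 0 AND b.property = c.property
    WHERE a.property = 1 AND a.cookie != 0 AND c.property IS NULL
    GROUP BY b.property
    ORDER BY total
    """
).fetchall()

print(rows)  # [(4, 1), (3, 2)]
```

Properties 1 and 2 are gone because cookie 0 has already seen them. Note that `order by total` sorts ascending, so the most co-viewed property comes last.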
A: 

Seriously, if you are using MySQL, look at section 12.2.10.3, "Subqueries with ANY, IN, and SOME", in the manual.

For example:

SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);

This is available in every version of MySQL I looked at, although the section numbers in the manual differ in the older versions.

EDIT in response to the OP's comment:

  1. OK ... how about something like `SELECT id FROM t1 WHERE ... AND id NOT IN (SELECT seen_id FROM user_seen_ids WHERE user = ?)`? This form avoids having to pass thousands of ids in the SQL statement.

  2. If you want to entirely avoid the "test against a list of ids" part of the query, I don't see how it is even possible in theory, let alone how you would implement it.

Stephen C
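
A minimal sketch of the subquery form from point 1, assuming SQLite through Python's `sqlite3` rather than MySQL. The table names `t1` and `user_seen_ids` come from the answer; the sample rows and the helper `unseen_ids` are hypothetical. The only value sent with the statement is the user id, bound to the `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE user_seen_ids (user INTEGER NOT NULL, seen_id INTEGER NOT NULL);
    -- Hypothetical sample data.
    INSERT INTO t1 (id) VALUES (1), (2), (3), (4);
    INSERT INTO user_seen_ids (user, seen_id) VALUES (42, 2), (42, 4), (99, 1);
    """
)


def unseen_ids(conn, user):
    """Return the ids in t1 that the given user has not seen yet."""
    cursor = conn.execute(
        "SELECT id FROM t1 "
        "WHERE id NOT IN (SELECT seen_id FROM user_seen_ids WHERE user = ?) "
        "ORDER BY id",
        (user,),  # only the user id is passed, never the list of seen ids
    )
    return [row[0] for row in cursor]


print(unseen_ids(conn, 42))  # [1, 3]
print(unseen_ids(conn, 99))  # [2, 3, 4]
```

The `NOT NULL` on `seen_id` is deliberate: if the subquery can return a NULL, `NOT IN` never evaluates to true and the outer query returns no rows.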
What I'm worried about is the performance of queries like `select * ... where ... and id not in (100_thousand_ids)`.
I think a way might exist, in theory.
On what basis do you think this? Do you have some hints as to how it might possibly work?
Stephen C
I've come up with a solution myself. I use a cookie to identify each user.
And how do you make a cookie scalable? There are limits on the amount of data you can store in a cookie.
Stephen C
You might want to look into "NOT EXISTS" for MySQL... but profile the heck out of it. It has problems with correlated subqueries, but should use indexes instead of a full table scan. Also, if you can supply a LIMIT sorted by ranking, you might be able to avoid unnecessary comparisons. Keeping your left-hand side at a lower row length than the right is often advantageous. NOT EXISTS, with the proper schema and a small left-hand side, will be quite fast.
Pestilence
@Stephen C, each user gets a unique cookie.
@Pestilence, how? I've only used `exists` this way: `drop table if exists ..`
@Unknown - you miss my point. If you assume 20 cookies per host/domain and a 4 KB size (see RFC 2965), and 7-digit ids, that is a maximum of 10,000 ids you can store in a user's cookies. And of course, the user's browser sends up to 80 KB of headers in every request that your server has to process. I would not call that scalable!!
Stephen C
Oh, you missed my point. `cookie` is not the visited items, but a single unique id for that user.
So how does that avoid the query with the thousands of ids? This is not making any sense to me.
Stephen C
"NOT EXISTS", in some SQL engines, is the same as "NOT IN" but will use indexes for the operation. It's often faster, but MySQL has "limitations."
Pestilence
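
For reference, a rough sketch of what the "NOT EXISTS" variant of the question's query could look like, with a LIMIT on the ranked result as suggested in the comments. It assumes SQLite through Python's `sqlite3` instead of MySQL, so it says nothing about how MySQL would plan or index the query; the sample rows, the index name `idx_cookie_property` and the limit of 10 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE propertyviews (
        id INTEGER PRIMARY KEY, cookie INTEGER, property INTEGER
    );
    -- Hypothetical index so the correlated lookup below can avoid a full scan.
    CREATE INDEX idx_cookie_property ON propertyviews (cookie, property);
    -- Hypothetical sample data: cookie 0 is the current user.
    INSERT INTO propertyviews (cookie, property) VALUES
        (0, 1), (0, 2),
        (7, 1), (7, 2), (7, 3),
        (8, 1), (8, 3), (8, 4);
    """
)

# For each candidate row b, keep it only if NO row exists where the
# current user (cookie 0) viewed the same property.
ranked = conn.execute(
    """
    SELECT b.property, COUNT(b.id) AS total
    FROM propertyviews a
    JOIN propertyviews b ON b.cookie = a.cookie
    WHERE a.property = 1
      AND a.cookie != 0
      AND NOT EXISTS (
          SELECT 1
          FROM propertyviews c
          WHERE c.cookie = 0 AND c.property = b.property
      )
    GROUP BY b.property
    ORDER BY total DESC, b.property
    LIMIT 10
    """
).fetchall()

print(ranked)  # [(3, 2), (4, 1)]
```

Whether this beats the `left join ... is null` form on a real MySQL table depends on the version and the indexes, so compare both with `EXPLAIN` and real data before picking one.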