ansaurus

Question

Answer 1

+2 A:

Hello,

One method you can recommend which achievements to go for is to see how many of your users already have those achievements and recommend those popular ones. When they have achieved those you go down the list and recommend slightly less popular ones. However, this has a naive assumption that everyone wants to go for popular achievements. It might cause popular achievements to be even more popular and less popular ones, well... A consolation is that this doesn't take up much resources and is likely to run very fast. (Just keep a list of achievements + number of times it's achieved)

Another method (which attempts to guess which achievements the user is likely to go after based on what achievements he already had) is to use some machine learning algorithms. I think the k-nearest neighbor algorithm will perform quite well here. Select a threshold and just output everything that is above this threshold. Now, I don't know if this will run faster than what you already have, but you should just run the recommendation engine once every time the user has made a new achievement, store the top (let's say) five, and just output it back to the user whenever a recommendation is needed.

I hope this helps. =)

wai 2009-07-04 08:52:03

The first is a little too simplistic to me as there are many types of users getting different achievements. Any recommendations on a "fast" (non-bruteforce) kNN python library or code? All I found was an unanswered usenet post http://coding.derkeiler.com/Archive/Python/comp.lang.python/2007-08/msg01615.html

Paul Tarjan 2009-07-04 09:00:45

How about improving the first method by having different general groups of users (and keeping a list for each type)? Like in MMORPGs, they have achievements based on whether the person is an explorer, a power-player (power-leveler?), a social person etc... You can guess which general category they fall on based on a few of the achievements they already had. Unfortunately, kNN is not going to be fast, other than the simplest of cases. However, I think it would give great recommendation and you can use it to help create lists for the first method. Sorry, I don't know Python, yet.

wai 2009-07-04 09:15:02

That seems similar to clustering the aliases offline, and then showing them achievements from their nearest cluster and then walk down the clusters in order of proximity. Maybe i'll see if I can code that up and how fast it will be.

Paul Tarjan 2009-07-06 03:56:27

I guess you're worried about the thing slowing your server down while you are generating the recommendations. Just an idea, how about performing this task on another machine (like an old laptop on a backup of live database) then have it upload those data into your server database? You can set a cron for it to run every 24 hours. That way, the performance of your web server is relatively unaffected and your list can be relatively up-to-date. Good luck! =)

wai 2009-07-06 13:22:10

Answer 2

+1 A:

I would suggest that you do the first three steps (achievements, otherAliases, count) as one single SQL statement. As it is now, you are issuing a lot of queries and summarising thousands of rows in Python which is a task you should delegate to the DB. For example the code

for otherAlias in otherAliases : #For every single other user
    for otherAch in otherAlias.achievements.all() : #execute a query
        r[otherAch] = r.get(otherAch, 0) + 1

Does thousands of huge queries.

Instead, you can use SQL to do this by joining Achiever on itself based on Alias id being different and achievement id being the same. You then group by achievement id and run a count.

In the query below, the table "B" is other user's achievements and "Achiever" is our achievements. If any other user shares an achievement, they appear once in "B" for each achievement they share. We then group those by alias_id and count the number of times they appeared so you get a nice id, count table out.

Very very rough code (no SQL available here)

SELECT B.Alias_id, COUNT(B.achievement_id) 
  FROM Achiever, Achiever as B 
  WHERE Achiever.achievement_id == B.achievement_id 
     AND Achiever.Alias_id == <insert current user alias here>;
  GROUP BY B.Alias_id

If that works the way I think it will, you will get a table of other user aliases, along with the number of achievements they share with the current user.

The next thing you do is an SQL statement that uses the one above as an "inner select" - call it users. You join that with your achievements table and your Achiever table for the current user. You might want to ignore all but the top 10 users who are similar to the current user.

I don't have time to write up a good query right now, but look at the JOIN statement for your DB that joins on achievement_id between the nominated 10 users and the current user - setting that id to NULL if it doesn't exist. The filter only to rows where it turned up NULL (unachieved achievements).

Tom Leys 2009-07-06 04:18:10

Yes, this is exactly how I would do it if I was coding in PHP with an SQL connector. Is there a good way to do this in a django object model, or should I just fall back to SQL?

Paul Tarjan 2009-07-06 05:44:06

I would follow the instructions in http://docs.djangoproject.com/en/dev/topics/db/sql/ to execute raw SQL and then encapsulate that in a method in User or Alias - This code is far too performance critical to do in Python (or any web layer) since it suits the DB so well.

Tom Leys 2009-07-06 10:27:00

ansaurus

tags:

views:

answers:

How to recommend the next achievement

related questions