ansaurus

Question

How to store sets, to find similar patterns fast?

Answer 1

A:

CREATE TABLE data (user INTEGER, movie INTEGER, rate INTEGER);

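-- Distance from :user to every other user, over the movies both have rated
SELECT  other.user, AVG(ABS(me.rate - other.rate)) AS distance
FROM    data me, data other
WHERE   me.user = :user
    AND other.user <> me.user
    AND other.movie = me.movie   -- only movies rated by both users
GROUP BY
    other.user
ORDER BY
    distance;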

The complexity will be O(n^1.5) rather than O(n²): there will be n comparisons, each over about sqrt(n) movies (the average number of movies filled out by both users of a pair).

Quassnoi 2009-01-20 18:58:05
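A minimal way to sanity-check the query is to run it against an in-memory SQLite database from Python. The schema is the one from this answer and the query is written with the `me`/`other` aliases throughout; SQLite is an assumption (the answer does not name a database), and the toy ratings are invented for illustration.

```python
import sqlite3

# Schema and self-join from Answer 1, run on an in-memory SQLite database.
SCHEMA = "CREATE TABLE data (user INTEGER, movie INTEGER, rate INTEGER)"

QUERY = """
SELECT  other.user, AVG(ABS(me.rate - other.rate)) AS distance
FROM    data me, data other
WHERE   me.user = :user
    AND other.user <> me.user
    AND other.movie = me.movie
GROUP BY other.user
ORDER BY distance
"""

def nearest_users(rows, user):
    """Return (other_user, mean absolute rating difference) pairs, closest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, {"user": user}).fetchall()
    finally:
        conn.close()

# Invented toy data: (user, movie, rate)
ratings = [
    (1, 10, 5), (1, 11, 3), (1, 12, 1),  # user 1
    (2, 10, 5), (2, 11, 2),              # user 2 shares movies 10 and 11
    (3, 10, 1), (3, 12, 5),              # user 3 shares movies 10 and 12
    (4, 13, 4),                          # user 4 shares nothing with user 1
]

print(nearest_users(ratings, 1))  # → [(2, 0.5), (3, 4.0)]
```

User 4 does not appear in the result at all: the self-join produces no rows for users who share no movie with `:user`.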

This looks a bit too easy. I think the complexity of this query would be about O(n²), unless the database does some magic, which is exactly what I am asking about here. :) Thank you anyway for your suggestion. (Small fix: "AND other.user <> me.user")

MrFox 2009-01-20 19:47:30

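There are three parameters: the number of people (say n), the number of movies (say m) and the probability that a person fills out a particular movie (say p). The user being matched fills out about m * p movies, and each of the other users shares each of those with probability p, so the self-join produces about n * m * p^2 rows. This algorithm therefore runs in O(n * m * p^2) expected time, assuming movies are filled out independently (or more, if some movies are more popular than others).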

j_random_hacker 2009-01-21 13:42:52
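The O(n * m * p^2) estimate can be checked with a small simulation, assuming (as the comment does) that each user fills out each movie independently with probability p. The parameters are arbitrary, and the number of rows the self-join produces for one user is used as a stand-in for the query's work; index lookups and sorting are ignored. With n users there are n - 1 others, so the exact expectation is (n - 1) * m * p^2.

```python
import random

def join_rows(n, m, p, rng):
    """Count self-join rows for user 0 under the independence assumption."""
    # Each user fills out each movie independently with probability p.
    rated = [{movie for movie in range(m) if rng.random() < p} for _ in range(n)]
    me = rated[0]
    # One joined row per (other user, shared movie) pair.
    return sum(len(me & other) for other in rated[1:])

def average_join_rows(n, m, p, trials=20, seed=0):
    """Average join size over several random trials (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(join_rows(n, m, p, rng) for _ in range(trials)) / trials

n, m, p = 100, 400, 0.1
print(average_join_rows(n, m, p), (n - 1) * m * p ** 2)
```

The two printed numbers should be close; raising p makes the join grow quadratically in p, which is the point of the p^2 term.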

Answer 2

+3 A:

It looks like you are looking for the nearest neighbor in movie space, with the L1 metric as your distance function. You can probably use a spatial index of some kind. Maybe you can also use techniques from collaborative filtering.

Yuval F 2009-01-20 19:04:02

You are right. This is a kind of nearest-neighbor search with an L1 distance function. I have already considered a spatial index like a z-order curve or an octree. But this would imply a large (conceptual) table with mostly empty cells, and any spatial index would perform badly on such a sparse table.

MrFox 2009-01-20 19:39:57
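One way to avoid materializing the mostly empty user-by-movie table is to store only the filled-in cells, grouped by movie (an inverted index), so that a search only visits users who share at least one movie with the given user. This is a hypothetical sketch, not something proposed in the thread: `build_index` and `nearest_by_l1` are illustrative names, and the ranking uses the same mean absolute difference as the query in Answer 1.

```python
from collections import defaultdict

def build_index(ratings):
    """Store only filled-in cells: movie -> [(user, rate)] and user -> {movie: rate}.

    Memory grows with the number of ratings, not with users * movies.
    """
    by_movie = defaultdict(list)
    by_user = defaultdict(dict)
    for user, movie, rate in ratings:
        by_movie[movie].append((user, rate))
        by_user[user][movie] = rate
    return by_movie, by_user

def nearest_by_l1(by_movie, by_user, user):
    """Rank other users by mean absolute rating difference over shared movies.

    Only users who rated at least one of `user`'s movies are ever visited.
    """
    totals = defaultdict(int)  # sum of |rate difference| per other user
    counts = defaultdict(int)  # number of movies shared with `user`
    for movie, my_rate in by_user.get(user, {}).items():
        for other, rate in by_movie[movie]:
            if other != user:
                totals[other] += abs(my_rate - rate)
                counts[other] += 1
    distances = [(other, totals[other] / counts[other]) for other in totals]
    return sorted(distances, key=lambda pair: pair[1])

ratings = [(1, 10, 5), (1, 11, 3), (1, 12, 1), (2, 10, 5),
           (2, 11, 2), (3, 10, 1), (3, 12, 5), (4, 13, 4)]
by_movie, by_user = build_index(ratings)
print(nearest_by_l1(by_movie, by_user, 1))  # → [(2, 0.5), (3, 4.0)]
```

The cost of one search is the total length of the posting lists for the user's movies, which is the same row count the self-join produces.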

Answer 3

+3 A:

Sounds a lot like the Netflix Prize challenge, more specifically the first half of the most popular approach. The possible implementations of what you are trying to do are numerous and varied. None of them are exceptionally efficient, and the L1 metric is not a particularly good option for reliable correlations.

Sparr 2009-01-20 20:18:51
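To illustrate the point about L1, the sketch below compares the mean absolute difference with the Pearson correlation, a similarity measure commonly used in user-based collaborative filtering. Pearson is not mentioned in the answer, so treat it as one possible alternative. The ratings are made up: `harsh` has the same taste as `me` but always rates one star lower, while `noisy` is closer on average but follows the ordering of `me` less consistently.

```python
import math

def mean_abs_diff(a, b):
    """Mean L1 difference over movies both users rated (inf if none shared)."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")
    return sum(abs(a[m] - b[m]) for m in shared) / len(shared)

def pearson(a, b):
    """Pearson correlation over movies both users rated; 0.0 if undefined."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        return 0.0
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

me = {1: 5, 2: 4, 3: 3, 4: 2}
harsh = {1: 4, 2: 3, 3: 2, 4: 1}  # same ordering, one star lower
noisy = {1: 4, 2: 4, 3: 4, 4: 2}  # smaller differences, weaker ordering

print(mean_abs_diff(me, harsh), mean_abs_diff(me, noisy))  # → 1.0 0.5
print(pearson(me, harsh), pearson(me, noisy))
```

L1 ranks `noisy` as the closer user (0.5 against 1.0), while the correlation ranks `harsh` higher (1.0 against about 0.77), because a constant offset in someone's rating scale costs nothing under correlation but is penalized on every movie under L1.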
