views:

46

answers:

3

Suppose each Person has a collection of favorite Books.

So I have a table for:

  • Person
  • Book
  • The association between Person and Book (joint table for MxN)

I want to fetch the Persons that are similar to a Person1 based on the favorite Books overlaping. That is: The more books they have in common, the more they are similar.

I don't have to use only SQL to solve this problem. I could use programming also. I'm using SQL Server 2008 and C#.

What solution would you experts use?

+1  A: 

The problem you are describing is usually referred to as "collaborative filtering" and tackled using "recommender systems". Googling for either of those terms should lead you to plenty of useful information.

earl
+2  A: 

This may not be the most efficient, but it's relatively simple:

WITH SimlarBookPrefs(person_id, similar_person_id, booksInCommon) AS
(
 Select p1.person_id, p2.person_id AS simlar_person_id,   
 /* Find the number of books p1 and p2 have in common */
   (SELECT COUNT(*) FROM PersonBook pb1, PersonBook pb2 
     JOIN pb1=book_id=pb2.book_id
   WHERE pb1.person_id=p1.person_id AND pb2.person_id=p2.person_id) As BooksInCommon
   FROM Person p1 CROSS JOIN Person p2
)

This will give you for each person, a list of other persons and the number books in common.

To get the most similar person, add (in the same query)

SELECT TOP 1 similar_person_id FROM SimilarBookPrefs 
   WHERE person_id = <person_to_match>
   ORDER By booksInCommon DESC;

The first part does not have to be a CTE (i.e. WITH ...) it can be a view or even a derived table. It'a a CTE here for brevity.

mdma
+1  A: 

If I were doing this in C#, I might tackle it like this

var query = from personBook in personBooks
            where personBook.PersonId != basePersonId // ID of person to match
            join bookbase in personBooks
            on personBook.BookId equals bookbase.BookId
            where bookbase.PersonId == basePersonId // ID of person to match
            join person in persons 
            on personBook.PersonId equals person.Id 
            group person by person into bookgroup
            select new
            {
                Person = bookgroup.Key, 
                BooksInCommon = bookgroup.Count()
            };

This could likely be done with the entity framework or Linq to SQL, or simply translated into SQL directly.

Full sample code

class CommonBooks
{
    static void Main()
    {
        List<Person> persons = new List<Person>()
        {
            new Person(1, "Jane"), new Person(2, "Joan"), new Person(3, "Jim"), new Person(4, "John"), new Person(5, "Jill")
        };

        List<Book> books = new List<Book>()
        {
            new Book(1), new Book(2), new Book(3), new Book(4), new Book(5)
        };

        List<PersonBook> personBooks = new List<PersonBook>()
        {
            new PersonBook(1,1), new PersonBook(1,2), new PersonBook(1,3), new PersonBook(1,4), new PersonBook(1,5), 
            new PersonBook(2,2), new PersonBook(2,3), new PersonBook(2,5), 
            new PersonBook(3,2), new PersonBook(3,4), new PersonBook(3,5), 
            new PersonBook(4,1), new PersonBook(4,4),
            new PersonBook(5,1), new PersonBook(5,3), new PersonBook(5,5)
        };

        int basePersonId = 4; // person to match likeness

        var query = from personBook in personBooks
                    where personBook.PersonId != basePersonId
                    join bookbase in personBooks
                    on personBook.BookId equals bookbase.BookId
                    where bookbase.PersonId == basePersonId
                    join person in persons
                    on personBook.PersonId equals person.Id
                    group person by person into bookgroup
                    select new
                    {
                        Person = bookgroup.Key,
                        BooksInCommon = bookgroup.Count()
                    };

        foreach (var item in query)
        {
            Console.WriteLine("{0}\t{1}", item.Person.Name, item.BooksInCommon);
        }

        Console.Read();
    }
}

class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Person(int id, string name) { Id = id; Name = name; }
}

class Book
{
    public int Id { get; set; }
    public Book(int id) { Id = id; }
}

class PersonBook
{
    public int PersonId { get; set; }
    public int BookId { get; set; }
    public PersonBook(int personId, int bookId) { PersonId = personId; BookId = bookId; }
}
Anthony Pegram