views:

58

answers:

2

I have a social network set up and via an api I want to search the entries. The database of the social network is mysql. I want the search to return results in the following format: Results that match the query AND are friends of the user performing the search should be prioritized over results that simply match the query.

So can this be done in one query or will I have to do two separate queries and merge the results and remove duplicates?

I could possibly build up a data structure using Lucene and search that index efficiently, but am wondering if the penalty of updating a document everytime a new relationship is created is going to be too much?

Thanks

+1  A: 

The reference to Lucene complicates the equation a little bit. Let's solve it (or at least get a baseline) without it first.

Assuming the following datamodel (or something approaching.

tblUsers
  UserId  PK
  UserName
  Age
  ...

tblBuddies
  UserId     FK to tblUsers.UserId
  FriendId   tblUsers.Userid  = Id of one of the friends
  BuddyRating     float 0.0 to 1.0 (or whatever normalized scale) indicating 
                  the level of friendship/similarity/whatever

tblItems
  ItemId  PK
  ItemName
  Description
  Price
  ...

tblUsersToItems
   UserId   FK to tblUsers.UserId
   ItemId   FK to 
   ItemRating   float 0.0 to 1.0 (or whatever normalized scale) indicating 
                the "value" assigned to item by user.

A naive query (but a good basis for an optimized one) could be:

SELECT [TOP 25]  I.ItemId, ItemName, Description, SUM(ItemRating * BuddyRating)
FROM tblItems I
LEFT JOIN tblUserToItems UI ON I.ItemId = UI.ItemId
LEFT JOIN tblBuddies B ON UI.UserId = B.FriendId
WHERE B.UserId = 'IdOfCurrentUser'
  AND SomeSearchCriteria -- Say ItemName = 'MP3 Player'
GROUP BY I.ItemId, ItemName, Description
ORDER BY SUM(ItemRating * BuddyRating) DESC

The idea is that a given item is given more weight if it is recommended/used by a friend. The extra weigh is the more important if the friend is a a close friend [BuddyRating] and/or if the friend recommend this item more strongly [ItemRating]

Optimizing such a query depends on the overal number of item, the average/max numbers of buddies a given user has, the average/max number of items a user may have in his/her list.

Is this type of ideas/info you are seeking or am I missing the question?

mjv
+1  A: 

One way is to store all your social network graph separately from Lucene. Run your keyword query on Lucene, and also lookup all the friends in your network graph. For all the friends that are returned, boost all of those friends' search results by some factor and resort. This re-sort would be done outside of Lucene. I've done things like this before and it performs pretty well.

You can also create a custom HitCollector that does the boosting as the hits are being collected in Lucene. You'd have to construct a list of internal Lucene ID's that belong to the friends of the current user.

Your social network graph can be stored in Mysql, in memory as a sparse adjacency matrix, or you can take a look at Neo4j.

bajafresh4life