views:

560

answers:

3

Hello,

I'm currently developing an application to allow students to manage their courses, and I don't really know how to design the database for a specific feature. The client wants, a lot like Facebook, that when a student displays the list of people currently in a specific course, the people with the most mutual courses with the logged in user are displayed first. Almost the same as Facebook feature "Friend suggestions" with an additional filter.

As an additional feature, I would like to add a search feature to allow students to search for another one, and displaying first in the search results the people with most mutual courses with the logged in user.

I currently use MySQL, I plan to use Cassandra for some other features, and I also use Memcached for result caching and Sphinx for the search.

Thanks.

--

The app is developed in Python, BTW

And I forgot to mention that the standard approach (using a nice MySQL query to calculate all this with an ORDER BY clause) is wayyyys too slow. So as reads are a lot more frequent than reads, I would like most of the logic to take place once, when the relation people <-> course is added.

I thought about updating a "mutual courses" counter specific to one tuple (user, course) that will be increased for all users of a course when the logged in user joins a new course (or decreased when he leaves it).

+2  A: 

Say you have a table that is named Users and the Primary Key is UserID. Then you have a table called Friends with 2 columns called UserID (PK) and FriendUserID.

Say you have 2 users, 20 and 50.

When 20 adds 50 as friend, the application adds a new row:

INSERT INTO `Friends` (`UserID`, `FriendUserID`) VALUES (20, 50)

and when 50 confirms friendship, you add another row with values switched:

INSERT INTO `Friends` (`UserID`, `FriendUserID`) VALUES (50, 20)

When you want to find mutual friends between 20 and 50, simply:

SELECT `UserID` FROM `Friends` AS `A`, `Friends` AS B WHERE `A`.`FriendUserID` = 20 AND `A`.`UserID` = `B`.`UserID` AND `B`.`FriendUserID` = 50
thephpdeveloper
Oh yes thanks. That works.But that's definitely not a viable solution, for performance issues. Even if that request is made when a new relation takes place to calculate and store the results, this would be ways too slow with a pretty big table.
Pierre
+2  A: 

If you already have your solution, but the problem is just the speed of that query, try doing it sooner. When a user's friendships change, rerun a job that calculates these things and store all the results away. Don't runt his as a result of a request, when you need the result so quickly. Do such expensive things only once and do them before a request is ever made.

ironfroggy
So, when a new relationship is added, do what I thought about: update a counter for mutual relationships (in Cassandra) for each user already in the course.When displaying the results, just take the data from Cassandra, display the results as they come (already ordered), and display other users (with no relationships) after... -- My only concern would still be performance. Wouldn't it be a huge overhead when joining a course with thousands of people in it?
Pierre
Depends on if you do "for x in y: updatesql(newvalue)" or "UPDATE counter = counter + 1 where ...". The first is going to hit network overhead. The second should be pretty fast.
somori
Yep. If done in SQL, I totally agree with you. But the performance problem is still (a little) alive. If I talk about a course with 100.000 people in it (yes, there is!), wouldn't the query `SELECT u.id FROM users u INNER JOIN relationships r ON (u.id == r.user_id_1 OR u.id == r.user_id_2) ORDER BY r.mutual_counter LIMIT ...` be a little expensive? If I use Cassandra, I can store already ordered data. But as far as I know, I have to loop through the relationships manually :/
Pierre
Or maybe (not decreasing the time necessary, but improving the user experience) by delegating this work to other servers, for example with Gearman?
Pierre
A: 

I would break this up as (2) queries and find the intersection in Python:

#Query 1 - Get the user's friends
SELECT friend_id FROM friends WHERE user_id = 'my user id'

#Query 2 - Get the users enrolled in the course
SELECT student_id FROM course_enrollment WHERE course_id = 'course id'

Then find the intersection in Python. Then you can let the database do caching, etc ... without any joins to slow things down.

ensnare