Hi folks,
I am architecting a business social network web application. One component looks likely to cause major scalability problems, and I'd like some feedback/thoughts on the best way forward.
The application has a User object. The idea is that every time a new user joins the system, a ranking is calculated of every other user's "usefulness" to him/her, based on a set of factors. Similarly, a ranking of the new user is calculated for every other user on the system.
However, I'm worried about the scalability implications of this approach. For example, with 10,000 users on the system we are talking about roughly 10,000^2 calculations, each of which has to be stored in the database. That is about 100 million records, which clearly becomes problematic both in terms of the time taken to calculate the rankings and in terms of storing them.
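To put rough numbers on this, here is a back-of-the-envelope sketch in Java. The class and method names are just for illustration, and the 16 bytes per row (two 4-byte user ids plus an 8-byte score) is my own assumption that ignores index and row overhead, so real storage would be larger.

```java
public class RankingEstimate {
    // Assumed row layout: two 4-byte user ids + one 8-byte score.
    public static final long ASSUMED_BYTES_PER_ROW = 16;

    // Ordered pairs "a ranks b", excluding a user ranking himself: n * (n - 1).
    public static long orderedPairs(long users) {
        return users * (users - 1);
    }

    // Rows added when one more user joins: he ranks everyone already
    // there, and everyone already there ranks him.
    public static long newRowsOnJoin(long existingUsers) {
        return 2 * existingUsers;
    }

    // Raw payload size only, under the assumed row layout above.
    public static long rawBytes(long users) {
        return orderedPairs(users) * ASSUMED_BYTES_PER_ROW;
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000, 10_000, 100_000}) {
            System.out.printf("%d users -> %d rankings, ~%d MB raw, next join adds %d rows%n",
                    n, orderedPairs(n), rawBytes(n) / (1024 * 1024), newRowsOnJoin(n));
        }
    }
}
```

So the total grows quadratically, but the work triggered by any single join grows only linearly with the number of existing users.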
Thus, I'm looking for help/inspiration :)
My background is in Java, and I've been looking at Hadoop/MapReduce as a possible way to run the calculations in parallel. However, I'm really not sure whether this problem is a good fit for MapReduce, or what the best approach is in general.
So, I suppose there are two specific parts to my query:
1) To do the actual calculations, should I run them in parallel, i.e. is MapReduce a good approach for this problem?
2) To store the rankings, what should I be using? Is a standard relational database such as MySQL a bad fit, and should I look at something like Cassandra, HBase or some other NoSQL solution instead?
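To illustrate what I mean by "parallel" in 1), here is a minimal single-machine sketch using Java parallel streams, not Hadoop. The User fields and the usefulness() function (a count of shared interest tags) are hypothetical placeholders for the real factors. The point is only that each pair's score is independent of every other pair's, which is what makes me think the work can be split up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PairwiseRanking {
    // Hypothetical user: an id plus a set of interest tags.
    public static final class User {
        public final int id;
        public final Set<String> interests;

        public User(int id, String... interests) {
            this.id = id;
            this.interests = new HashSet<>(Arrays.asList(interests));
        }
    }

    // Placeholder score: number of interest tags the two users share.
    public static int usefulness(User viewer, User other) {
        int shared = 0;
        for (String tag : viewer.interests) {
            if (other.interests.contains(tag)) {
                shared++;
            }
        }
        return shared;
    }

    // Scores every other user from the viewer's perspective.
    // No score depends on another, so the stream can run in parallel.
    public static Map<Integer, Integer> scoresFor(User viewer, List<User> all) {
        return all.parallelStream()
                .filter(other -> other.id != viewer.id)
                .collect(Collectors.toMap(other -> other.id,
                                          other -> usefulness(viewer, other)));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User(1, "java", "hadoop"),
                new User(2, "java", "mysql"),
                new User(3, "sales"));
        // Rankings for user 1, keyed by the other user's id.
        System.out.println(scoresFor(users.get(0), users));
    }
}
```

When a new user joins, this would be called once for the new user, plus one usefulness() call per existing user in the other direction.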
Any help/ideas are greatly appreciated.
cheers, Brian