I am building a Reddit clone in Erlang. I am considering some Erlang web frameworks, but that is not the problem.
I am having a problem selecting a database.
How it works:
I have multiple dedicated reddits, for example science, funny, corporate, and sport. You could consider them subreddits. Each subreddit has categories.
A user can post the following info:
Title, Category Tags, Description, Category, Future Date,
and optionally a picture, link, or video.
As with Reddit, users will be able to vote on the stories and comment on them. Comments will also have a voting system.
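To make the data concrete, here is a rough sketch of what a post and a comment might look like. It is written in Python only for illustration (the real site will be in Erlang), and every field and class name here is a placeholder I made up, not a fixed schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Post:
    """One story in a subreddit. All field names are hypothetical."""
    subreddit: str                       # e.g. "sport"
    category: str                        # e.g. "NFL"
    title: str
    description: str
    category_tags: List[str] = field(default_factory=list)
    future_date: Optional[date] = None   # set only for upcoming events
    picture_url: Optional[str] = None    # optional attachments
    link_url: Optional[str] = None
    video_url: Optional[str] = None
    votes: int = 0


@dataclass
class Comment:
    """A comment on a post; comments carry their own vote count."""
    post_title: str                      # placeholder reference to the post
    body: str
    votes: int = 0


example = Post(
    subreddit="sport",
    category="NFL",
    title="Game this weekend",
    description="Kickoff on Saturday",
    category_tags=["nfl", "playoffs"],
    future_date=date(2024, 1, 6),
)
```

Whatever database I end up with would have to store records of roughly this shape.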
The problem:
I don't know which NoSQL database to use. The site will have scalability problems with MySQL (trust me, it will, so please don't suggest SQL). There will be around 10,000-20,000 concurrent connections, if not more.
What I need:
1) A user will go to the sporting subreddit.
They will want to see all stories with a Future Date. For example, in the NFL category or the Soccer World Cup category, they might want to see all stories with future dates, which indicate upcoming games or events.
But since people might post crap, I need to sort by Future Date, filter the results to posts with more than 5 votes, and then show the closest upcoming event.
So if there is a game on the weekend and the next game is 3 weeks away, the closest game needs to come up first.
2) So the problem above, using one database, comes down to:
1) Find all posts in subreddit: Sport. 2) Find all posts in the NFL category. 3) Find all posts with a future date. Then sort these posts by most votes and display the stories with the closest date to today.
I think CouchDB looks like a good candidate, but I am not sure.
What about Cassandra, HBase, Riak, or Neo4j?
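In plain Python (not Erlang, and not tied to any particular database), the query I am after looks roughly like the sketch below. The function name, the dictionary keys, and the tie-break rule (closest date first, then most votes) are my own assumptions for illustration.

```python
from datetime import date


def upcoming_posts(posts, subreddit, category, today, min_votes=5):
    """Return upcoming posts with more than `min_votes` votes.

    Results are ordered by closest future date first; posts on the
    same date are ordered by most votes (an assumed tie-break).
    """
    matches = [
        p for p in posts
        if p["subreddit"] == subreddit
        and p["category"] == category
        and p.get("future_date") is not None
        and p["future_date"] >= today      # only upcoming events
        and p["votes"] > min_votes         # drop low-quality posts
    ]
    return sorted(matches, key=lambda p: (p["future_date"], -p["votes"]))


# Made-up sample data, only to show the expected ordering.
posts = [
    {"title": "Game in 3 weeks", "subreddit": "sport", "category": "NFL",
     "future_date": date(2024, 1, 27), "votes": 40},
    {"title": "Game this weekend", "subreddit": "sport", "category": "NFL",
     "future_date": date(2024, 1, 6), "votes": 12},
    {"title": "Spam", "subreddit": "sport", "category": "NFL",
     "future_date": date(2024, 1, 5), "votes": 2},
    {"title": "Old game", "subreddit": "sport", "category": "NFL",
     "future_date": date(2023, 12, 1), "votes": 90},
]

result = upcoming_posts(posts, "sport", "NFL", today=date(2024, 1, 3))
titles = [p["title"] for p in result]
# titles == ["Game this weekend", "Game in 3 weeks"]
```

So whichever database I pick needs to answer this efficiently: an equality match on subreddit and category, a range on the future date, a vote threshold, and an ordering by date.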
I am going crazy trying to figure this out.
I need something that will scale and handle a large amount of users.
Please help, thanks