This question is inspired by the article "Why are Facebook, Digg, and Twitter so hard to scale?" on highscalability.com.

So what database systems (however obscure) are out there that would handle this type of data better?

Thanks for your help!

+1  A: 

The article indirectly told you the answer when it mentioned memcached, a key-value store that keeps all of its data in RAM. Obviously you need additional data stores that keep data on hard drives, but they are probably also key-value stores or close relatives. There are lots of these, such as HBase (built on Hadoop), CouchDB (a document store), Tokyo Cabinet and Redis.
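To make the layering concrete, here is a minimal sketch of an in-RAM key-value cache sitting in front of a slower store. It is only an illustration: plain dicts stand in for both memcached and the disk-backed store, and the class and method names (`CachedStore`, `get`, `put`) are made up for this example rather than taken from any real client library.

```python
class CachedStore:
    """Read-through cache; a dict stands in for an in-RAM store like memcached."""

    def __init__(self, backing_store):
        self.cache = {}                # in-memory key-value layer (assumption: plain dict)
        self.backing = backing_store   # stand-in for the disk-backed store
        self.backing_reads = 0         # how often we fall through to "disk"

    def get(self, key):
        if key in self.cache:          # cache hit: no trip to the backing store
            return self.cache[key]
        self.backing_reads += 1        # cache miss: read from the slower store
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value    # populate the cache for later reads
        return value

    def put(self, key, value):
        self.backing[key] = value      # write to the durable store first
        self.cache.pop(key, None)      # invalidate any stale cached copy


store = CachedStore({"user:1": {"name": "alice"}})
first = store.get("user:1")    # miss: goes to the backing store
second = store.get("user:1")   # hit: served from the in-memory dict
```

Only the first read reaches the backing store; a write invalidates the cached entry so the next read fetches the fresh value.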

You can also use a column store such as MonetDB, where you only have to retrieve the fields you are interested in rather than whole table rows.
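A rough sketch of the difference, using plain Python lists and dicts rather than MonetDB itself (the sample data and the `to_columns` helper are invented for illustration):

```python
# Row layout: each record holds every field together.
rows = [
    {"id": 1, "name": "alice", "city": "Oslo", "followers": 120},
    {"id": 2, "name": "bob", "city": "Lima", "followers": 45},
    {"id": 3, "name": "carol", "city": "Pune", "followers": 300},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per field (column layout)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole row is touched just to read one field.
total_from_rows = sum(row["followers"] for row in rows)

# Column layout: only the "followers" column is read.
total_from_columns = sum(columns["followers"])
```

Both totals are the same; the point is that the column layout lets a query over one field skip the other fields entirely, which matters when rows are wide and stored on disk.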

Michael Dillon
+3  A: 

Check the NOSQL debrief. It has interesting resources on several distributed, non-relational databases:

Presentation slides and videos
Intro session - Todd Lipcon, Cloudera (slides, video1, video2)
Voldemort - Jay Kreps, LinkedIn (slides pdf ppt, video1, video2)
Cassandra - Avinash Lakshman, Facebook (slides pdf ppt, video)
Dynomite - Cliff Moon, Powerset (slides, video)
HBase - Ryan Rawson, StumbleUpon (slides, video)
Hypertable - Doug Judd, Zvents (slides pdf ppt, video1, video2)
CouchDB - Chris Anderson, couch.io (slides, video1, video2)

VPork - Jon Travis, SpringSource (slides, video)
MongoDB - Dwight Merriman, 10gen (slides, video)
Infinite Scalability - Jonas S Karlsson, Google (slides, video)

Some videos by Digg's John Quinn, the rest by Martin Dittus from Last.fm. Pictures by Russ Garrett from Last.fm.

For the links to the slides and videos, check the original page; there are just too many of them to paste.

You might also want to read "NoSQL: If Only It Was That Easy" (and even the NoSQL entry on Wikipedia).

Pascal Thivent
+5  A: 

Having a database system whose data model is tailored to the data structure you are trying to represent is often advantageous. Social networks lend themselves very well to graph databases such as AllegroGraph and Neo4j.

There is a good article at the Neo4j blog on how to represent social networks in a graph database, with the examples using Neo4j.

The benefit of graph databases is that data is stored so that traversing connections between entities is a very fast operation, which lets you traverse complex networks quickly. In current implementations of relational databases, these operations would typically be expensive joins at best.

Like relational databases, graph databases still have a slight problem with scaling out to multiple hardware nodes. However, the need for multiple hardware nodes should be much smaller with a graph database than with a relational database for social-network kinds of data: a few billion nodes on a single machine is no problem.

Scaling out to multiple hardware nodes is where key-value stores shine, since entities in a key-value store are completely isolated from each other. The problem is that nothing is isolated in a social network, so emulating the connections requires multiple queries to the database, one for each entity. This will be slow, especially for friend-of-a-friend kinds of queries, where you only discover one level of friends with each query.
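The query-per-entity cost can be sketched with a toy key-value store that counts lookups. Everything here is hypothetical: the `KeyValueStore` class, the sample friend lists and the `friends_of_friends` helper are made up for illustration and do not reflect any real store's API.

```python
class KeyValueStore:
    """Toy key-value store; each get() stands in for one round trip."""

    def __init__(self, data):
        self._data = data
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self._data.get(key, [])


# Each key is a person, each value is that person's list of friends.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "eve", "dan"],
    "dan": ["bob", "cat"],
    "eve": ["cat"],
}


def friends_of_friends(store, person):
    direct = set(store.get(person))        # one query for the direct friends
    result = set()
    for friend in direct:                  # one more query per direct friend
        result.update(store.get(friend))
    return result - direct - {person}      # keep only second-level friends


store = KeyValueStore(friends)
fof = friends_of_friends(store, "ann")
```

With two direct friends this already takes three lookups, and the count grows with the number of friends at each level. A graph database would instead follow the stored connections directly while traversing.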

Disclaimer: I am a member of the Neo4j team.

thobe