I am wondering what the best ways to store graphs in persistent storage are, for later analysis, search, clustering, etc.

I see that neo4j is one option, but I am curious whether other graph databases are available. Does anyone have insight into how the larger social networks store their graph-based data, or how other sites store graph-like models such as RDF?

What about options like Cassandra, or MySQL?

A: 

You could look at InfiniteGraph, which will be released in beta very soon (http://www.infinitegraph.com/).

If this is for commercial use, you'll see it's targeted at sites that will have larger graphs. The social networking sites built custom solutions, which worked for them at the time, but their in-house solutions are more limiting than using something like InfiniteGraph. Products like Cassandra or MySQL weren't designed for this many-to-many problem set. Can you do it? Sure, but it takes a lot of hand-written code and does not scale well. Let us know if you have a real project; we could help you figure out your graph requirements. Thanks, Warren [email protected]

Warren
A: 
  1. WebGraph is a framework to study the web graph. From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."

  2. This blog post - "On Building a Stupidly Fast Graph Database" - provides some guidelines on building a graph database - the technique they use is "memory-mapped I/O, disk-based linear-hashing".

    It also touches upon the problems they had using other data storage techniques - viz. SQL-based databases, etc.

    Graph sizes listed: "2.5 million nodes and 60 million edges in 12 minutes on a MacBook"
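
To make the "memory-mapped I/O" part of that description more concrete, here is a minimal Python sketch. It is not the blog post's implementation and does not implement linear hashing; the record layout (two unsigned 32-bit node ids per edge) and the function names are assumptions made for illustration. It only shows how an edge file on disk can be scanned through `mmap` without reading it all into a Python list first.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: one edge = two little-endian unsigned 32-bit node ids.
EDGE = struct.Struct("<II")

def write_edges(path, edges):
    """Write edges to disk as fixed-width binary records."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))

def neighbors(path, node):
    """Scan the memory-mapped edge file and collect targets of edges leaving `node`."""
    result = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), EDGE.size):
                src, dst = EDGE.unpack_from(mm, offset)
                if src == node:
                    result.append(dst)
    return result

def demo():
    """Write a tiny hypothetical graph to a temp file and query it."""
    edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
    path = os.path.join(tempfile.mkdtemp(), "edges.bin")
    write_edges(path, edges)
    return neighbors(path, 0)
```

A linear scan like this is slow for lookups; an on-disk index (for example the linear hashing the post mentions) is what would turn it into a usable store.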

Note: I have not actually tried WebGraph. It would be interesting to hear about your experiences in the comments.

Bart J
A: 

Disclaimer: I am speaking from the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. But storage usually is not a problem. Working with the graphs without fully loading them into RAM is the tricky part.
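
As a rough illustration of what storing a graph in one of these formats looks like, here is a small Python sketch that writes and reads a bare-bones GraphML document using only the standard library. The helper names `to_graphml` and `from_graphml` are made up for this example, and it covers only nodes and edges, not GraphML attributes (`<key>`/`<data>`) or nested graphs.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(nodes, edges, directed=False):
    """Serialize node ids and (source, target) pairs as a minimal GraphML string."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        {"id": "G", "edgedefault": "directed" if directed else "undirected"},
    )
    for n in nodes:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": str(n)})
    for s, t in edges:
        ET.SubElement(
            graph, f"{{{GRAPHML_NS}}}edge", {"source": str(s), "target": str(t)}
        )
    return ET.tostring(root, encoding="unicode")

def from_graphml(text):
    """Parse a GraphML string back into (node ids, edge pairs)."""
    root = ET.fromstring(text)
    ns = {"g": GRAPHML_NS}
    nodes = [n.get("id") for n in root.findall("g:graph/g:node", ns)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.findall("g:graph/g:edge", ns)
    ]
    return nodes, edges
```

Note that parsing this way still loads the whole document into memory, which is exactly the limitation described above for large graphs.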

The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go with one of the existing graph databases (see Wikipedia on this).

For real analysis, load all the data into RAM using an existing graph analysis library such as SNAP, or see this question.
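
To show what "load everything into RAM and analyze" amounts to, here is a small sketch in plain Python. It does not use SNAP's API; the function names are illustrative, and a real library would use far more compact data structures. It builds an undirected adjacency structure from an edge list and runs a breadth-first search over it.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Build an in-memory undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_distances(adj, start):
    """Return the hop count from `start` to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        # .get avoids creating empty entries for nodes without neighbors
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

For example, `bfs_distances(build_adjacency([(1, 2), (2, 3)]), 1)` gives each node's distance from node 1. Every neighbor lookup here is a dictionary access in memory, which is why this style of analysis is fast as long as the graph fits in RAM.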

extropy