views: 287
answers: 2
I currently have a graph with about 10 million nodes and 35 million edges. For now the complete graph is loaded into memory at program start. This takes a couple of minutes (it is Java after all) and needs about half a gigabyte of RAM. For now it runs on a machine with a dual-core processor and 4 gigabytes of RAM.

When the graph is searched using a breadth-first search, memory usage rises to a peak of one gigabyte, and a search takes ten seconds on average.

I would like to deploy the program on a couple of computers. Everything apart from the graph search needs very few resources. My target systems are very small and have only 512 megabytes of RAM.

Any suggestions on how to implement a method (probably using a database) to search that graph without consuming too much memory? The program is idle most of the time while it is accessing a hardware device, so a path-finding run could take up to about 5 minutes for the mentioned graph...

Thanks for any thoughts thrown in my direction.

UPDATE:

Just found Neo4j. Does anybody know whether it would be suitable for this kind of humongous graph?

+5  A: 

Your question is a little vague, but in general, a good strategy that mostly follows breadth-first semantics while using the same amount of memory as depth-first search is Iterative Deepening. The idea is that you do a depth-first search limited to 1 level at first; if that fails to find a solution, start from scratch and limit it to 2 levels; if that fails, try 3 levels, and so on.

This may seem a bit redundant at first, but since each pass is a depth-first search, you keep far fewer nodes in memory, and you never have to hold the huge frontier of the deepest level the way a straightforward breadth-first search does. Since the number of nodes in a level grows exponentially, on larger graphs the cost of the final level dominates, so redundantly re-searching all the preceding levels is very likely a price worth paying for that saving.
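
In rough Java, a minimal sketch of the technique looks like this; the `Graph` interface here is just a placeholder for whatever adjacency representation you actually use:

```java
import java.util.List;

// Stand-in for whatever adjacency lookup you use.
interface Graph {
    List<Long> neighbors(long node);
}

public class IterativeDeepening {

    /** Deepens the depth limit one level at a time until the target is found. */
    public static boolean iddfs(Graph g, long start, long target, int maxDepth) {
        for (int limit = 0; limit <= maxDepth; limit++) {
            if (depthLimitedSearch(g, start, target, limit)) {
                return true;
            }
        }
        return false;
    }

    /** Plain depth-first search that stops descending at the given limit. */
    private static boolean depthLimitedSearch(Graph g, long node, long target, int limit) {
        if (node == target) {
            return true;
        }
        if (limit == 0) {
            return false;
        }
        for (long next : g.neighbors(node)) {
            if (depthLimitedSearch(g, next, target, limit - 1)) {
                return true;
            }
        }
        return false;
    }
}
```

Memory stays proportional to the current depth limit (just the recursion stack), not to the size of a level. On a graph with cycles the limit still guarantees each pass terminates, though nodes may be revisited within a pass.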

Max Shawabkeh
I hadn't heard of Iterative Deepening before. It sounds like a first step towards solving at least the memory usage of the actual search. Any rough estimate of how a graph search would perform when keeping the actual data out of RAM, in a file or a database?
allesblinkt
That depends on how your graph is represented. Perhaps you can load it level by level until you hit a solution? Doing a separate disk read for each node is most likely going to wreak havoc with performance.
Max Shawabkeh
Interesting. I should try writing an indexed file optimized for that (sketched below). For now the file is just an ASCII list of lots of REFERENCE->REFERENCED pairs.
allesblinkt
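
For illustration, here is a sketch of one possible on-disk layout, assuming nodes are numbered 0..N-1 and two made-up files: `graph.idx` holding N+1 byte offsets as longs, and `graph.adj` holding neighbor ids as 4-byte ints grouped per node:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Reader for a hypothetical two-file layout:
//   graph.idx - (N+1) longs; entry i is the byte offset of node i's
//               neighbor list in graph.adj
//   graph.adj - neighbor ids stored as 4-byte ints, grouped per node
public class DiskGraph implements AutoCloseable {
    private final RandomAccessFile index;
    private final RandomAccessFile adjacency;

    public DiskGraph(String idxPath, String adjPath) throws IOException {
        index = new RandomAccessFile(idxPath, "r");
        adjacency = new RandomAccessFile(adjPath, "r");
    }

    /** Fetches one node's neighbors with two seeks; nothing else is kept in RAM. */
    public int[] neighbors(int node) throws IOException {
        index.seek((long) node * 8);
        long from = index.readLong();   // start of this node's list
        long to = index.readLong();     // start of the next node's list
        int count = (int) ((to - from) / 4);
        int[] result = new int[count];
        adjacency.seek(from);
        for (int i = 0; i < count; i++) {
            result[i] = adjacency.readInt();
        }
        return result;
    }

    @Override
    public void close() throws IOException {
        index.close();
        adjacency.close();
    }
}
```

Memory-mapping the files via FileChannel.map would likely be faster still, and the OS page cache keeps hot regions in RAM for free.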
If you can arrange the data properly, you could also load subsequent levels asynchronously, in parallel with the search process (see the sketch below). That is, while searching level 4, be reading level 5 from disk. This does require that your data file be level-ordered from whatever root node you start at, though.
Novelocrat
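
A rough sketch of that overlap, with hypothetical loadLevel/searchLevel hooks standing in for the actual disk reader and search code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Overlaps disk I/O with search work using a single background reader thread.
public abstract class PrefetchingSearch {

    /** Reads one level's adjacency data off disk; an empty list means no more levels. */
    protected abstract List<long[]> loadLevel(int level) throws Exception;

    /** Scans one level; returns true if the target node was found. */
    protected abstract boolean searchLevel(List<long[]> levelData);

    public boolean run() throws Exception {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Future<List<long[]>> pending = io.submit(() -> loadLevel(0));
            for (int level = 0; ; level++) {
                List<long[]> current = pending.get();       // wait for this level's data
                if (current.isEmpty()) {
                    return false;                           // ran out of levels
                }
                final int next = level + 1;
                pending = io.submit(() -> loadLevel(next)); // read ahead while searching
                if (searchLevel(current)) {
                    return true;
                }
            }
        } finally {
            io.shutdownNow();
        }
    }
}
```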
@Novelocrat: I have different start (root) nodes and end nodes every time. So it is kind of difficult to store it that way.
allesblinkt
+1  A: 

I would say that Neo4j is definitely a good way to go when you have a decent-sized graph such as this. Not only does it have built-in BFS algorithms, it also persists your data on disk, thus reducing your start-up time.

Check this out on highscalability.com: NEO4J - A GRAPH DATABASE THAT KICKS BUTTOX

I've used Neo4j; the documentation is very good, and they provide some nice getting-started examples, which really do take just a few minutes to get going.
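
To give a feel for it, here is a rough sketch using the Neo4j 1.x-era embedded Java API; the store path, the start-node lookup, and the REFERENCES relationship type are made up for the example:

```java
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.ReturnableEvaluator;
import org.neo4j.graphdb.StopEvaluator;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.Traverser;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class Neo4jBfs {
    public static void main(String[] args) {
        // "graphdb" is a placeholder store path.
        GraphDatabaseService db = new EmbeddedGraphDatabase("graphdb");
        Transaction tx = db.beginTx();
        try {
            Node start = db.getNodeById(0); // look up your real start node here
            Traverser bfs = start.traverse(
                    Traverser.Order.BREADTH_FIRST,
                    StopEvaluator.END_OF_GRAPH,
                    ReturnableEvaluator.ALL_BUT_START_NODE,
                    DynamicRelationshipType.withName("REFERENCES"),
                    Direction.OUTGOING);
            for (Node node : bfs) {
                // Only the visited part of the graph is pulled into the
                // cache; the store itself stays on disk.
                System.out.println(node.getId());
            }
            tx.success();
        } finally {
            tx.finish();
        }
        db.shutdown();
    }
}
```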

Check out their Getting started in 10 minutes guide.

Binary Nerd