views: 410
answers: 3
Hi,

I'm using adjacency_list< vecS, vecS, bidirectionalS ... > extensively. I have so many graphs loaded at once that memory becomes an issue. I'm doing static program analysis and store the callgraph and flowgraphs of the disassembled binary in boost graphs. Thus I can have several tens of thousands of functions (== flowgraphs) and one gigantic callgraph. I'd really like to reduce memory usage for my graphs while still using the BGL.

Since my graphs are static after loading, and the edge and vertex counts are known beforehand, I see huge potential for optimization. For example, I'd like to allocate a single buffer for all vertices/edges of a single graph and have the graph just store indices into that buffer.

More questions:
1) What's the memory overhead of using vertex and edge properties? I have quite a few of them.
2) Is it possible to convince the BGL to use the shrink-to-fit idiom? As I understand it, the adjacency lists use push_back to add edges. Is it possible to reduce memory usage by swapping the resulting vector with a copy of itself? Maybe by copying the whole graph?
3) Is it possible to use boost pool allocators with the BGL? As far as I can tell, the BGL currently performs lots of small allocations - I'd really like to avoid that for space and runtime efficiency reasons.

Did anyone already build a BGL version optimized for memory usage? Should I try using the existing graph structures and augment them with custom allocators or some such, or is it more fruitful to write my own implementation and stay interface-compatible with the BGL, so I can continue to use its algorithms?

best regards,

Sören
A: 

Since BGL is designed to interoperate with legacy or custom graphs, you're probably best off writing your own graph.

me22
+1  A: 
  1. The overhead depends on which version you're using and whether you've gone for "bundled" properties or not. I've only used bundled properties, and from reading the code I'd expect each property bundle to cost 2 pointers + sizeof the bundle type you're using + sizeof each of the attached properties. None of the concept-checking stuff is left in the binary, AFAIK. Since you have the code, though, why not just measure the cost? If you don't have any tooling to help, try generating graphs of known sizes in buffers of known sizes. Something will fail eventually, and at that point you should have your counts.

  2. Have you tried calling adjacency_list<blah>.swap(adjacency_list& x)? I'd hope that would shrink the containers appropriately.

  3. ???

I started writing something like the system you're describing, but eventually gave up on the BGL and switched to coding my own algorithms to run on an sqlite database of all the linker symbols.

Mike Burrows
Yeah, I have tried shrink-to-fit (suggestion no. 2) but it didn't help much. We are also using database backends and currently support MySQL, PostgreSQL and SQLite. However, our access patterns for these particular algorithms really require an in-memory, specialized data structure.
BuschnicK
+4  A: 

There is a little-known graph type called "compressed sparse row" graph in the BGL. It seems to be quite new and is not linked from the index pages. It does, however, employ a beautiful little trick to make the graph representation as small as possible: http://www.boost.org/doc/libs/1_40_0/libs/graph/doc/compressed_sparse_row.html

Using this for some of our graphs I have already been able to reduce total memory usage by 20% - so it is a very worthwhile optimization indeed.

It also stores the vertex/edge properties in vectors, thus giving the smallest possible overhead for those as well.

Note that the version shipping with the latest Boost 1.40 supports directed graphs only (as opposed to bidirectional). If you need to efficiently iterate over a vertex's out-edges and in-edges (as I did), you need to check out the Boost trunk from Subversion. Jeremiah was very helpful in adding that feature at my request.

BuschnicK