Are there any thread-safe graph libraries for C++?

Assuming you are asking about graphs as a mathematical entity (for example, the kind of graphs processed by GraphBase) - then I don't think you'll find what you are looking for, at least in a general purpose library.

The issue is that "thread safe" isn't really a well-defined term in all contexts - at a minimum it probably means that your program probably won't explode in the face of concurrent processing from multiple threads - but beyond that requirement, even something like a container or algorithm which purports to be "tread-safe" is going to have some additional semantics regarding exactly what that means - and a general purpose graph library is even more complex and would require even more precise (and configurable) semantics to satisfy this.

For example (speaking of a simple object here - say a container):

Is concurrent read access allowed to the same object? Is concurrent write access allowed to the same object? Do readers block readers? Do readers block writers? Do writers block writers? More generally, what operations block other operations?

Although you can define reasonable semantics for simple containers - like a concurrent set - even once you move to something like a map you need to introduce compound operations like "putIfAbsent" to make up for the fact that many applications need more than thread-safety - they need to link together operations that were typically performed in separate calls (search, then add) into a logically atomic operation.

I suspect that with something like a graph library, this problem becomes acute. Your application will need to define which operations can proceed in parallel - which operations should block others, whether you need to be able to take a consistent snapshot of the graph, whether you need full serializability of operations on the graph, or whether a relaxed model is appropriate, and so it.

I would be prohibitively difficult to build all this generality into a graph library, and perhaps impossible to predict ahead of time what the actual requirements were with respect to thread safety (the above is only a small fraction of the possible considerations), so I suspect that most graph libraries will provide only weak guarantees about behavior (e.g., allowing concurrent read access) or no explicit safety at all, and expect the consumer of the library to build higher level synchronization constructs on top.

Another reason may high performance libraries are found only in thread-unsafe versions is to avoid penalizing clients who don't require this safety (which may be most of them, for most instances of a given class) - with the assumption that thread safety can be added externally by those who require it, but that the reverse transformation (removing thread safety) is not generally possible. Java has been taking this path, with the effective deprecation of some of the early thread-safe classes such as Vector in favor of ArrayList and StringBuffer vs StringBuilder.

On the other hand, the assumption that clients can add thread-safety to containers externally (no access to internals) in an efficient way is often incorrect - which is the basis behind the built-from-the-group-up thread-safe containers in the concurrent package. I doubt very much you'll find an equivalent for graph manipulation in C++, however.

Update: In light of other replies, I guess I should emphasize that I don't think you'll find general thread-safe extensions of single threaded libraries - but apparently there are several explicitly parallel thread libraries, although I suspect these implement higher level semantics and do the parallelism internally, rather than being geared to thread-safe individual graph manipulation operations.

I see what you're saying. Perhaps it would be easier to explicitly partition the graph into parts, and have each thread word on a single part. Then the only locking needed would be when an operation concerns an edge or vertex on the boundary between two parts.My concern with this idea is that certain parts of the graph may turn out to be more interesting than others, which would leave some threads idling.

drpepper 2009-12-22 13:04:59

The problem you mention is common in parallel algorithms where you partition the working set, and where different partitions of the set may take wildly different processing times, and determining this time isn't feasible up front. I'll describe two (similar) strategies below.

BeeOnRope 2009-12-23 00:21:59

One strategy is to partition the working set into more parts than you have workers - so if you decide on 4 workers, create 20 parts, which increases the chance that the works do similar amounts of work (since one worker could do up to 17 "uninteresting parts"). It can be hard to determine what the optimal number of partitions is, however. There may also be practical limits of the number of partitions you can create.

BeeOnRope 2009-12-23 00:22:44

A second is to use work stealing - if some threads get their work done quickly, they can "steal" work from the remaining threads. This is more dynamic in that you don't have to decide up-front the ideal partition count, but requires more interaction between threads, and some model where stealing work is actually possible.

BeeOnRope 2009-12-23 00:24:09

Thanks for the links. The Parallel BGL looks really good, but seems to be more for parallel computation across multiple processors without shared memory. In my case, I think this would be overkill.As for the MTGL, it looks pretty unfinished. They keep mentioning that they have to stabilize their "basic API"! But I'll keep an eye on it.

drpepper 2009-12-22 13:00:30

ansaurus

tags:

views:

answers:

Are there any thread-safe graph libraries for C++?

related questions