views:

236

answers:

3

Basically I am looking for a graph library that would have fine-grained locking around graph operations, so that different threads touching different parts of the graph could alter it simultaneously, and competing modifications could be blocked.

I googled around a bit and can't find anything. Maybe this is too specific to my needs, but I would imagine that there would exist a good number of scientific applications that would work on large graphs.

+2  A: 

Assuming you are asking about graphs as a mathematical entity (for example, the kind of graphs processed by GraphBase) - then I don't think you'll find what you are looking for, at least in a general purpose library.

The issue is that "thread safe" isn't really a well-defined term in all contexts - at a minimum it probably means that your program probably won't explode in the face of concurrent processing from multiple threads - but beyond that requirement, even something like a container or algorithm which purports to be "tread-safe" is going to have some additional semantics regarding exactly what that means - and a general purpose graph library is even more complex and would require even more precise (and configurable) semantics to satisfy this.

For example (speaking of a simple object here - say a container):

Is concurrent read access allowed to the same object? Is concurrent write access allowed to the same object? Do readers block readers? Do readers block writers? Do writers block writers? More generally, what operations block other operations?

Although you can define reasonable semantics for simple containers - like a concurrent set - even once you move to something like a map you need to introduce compound operations like "putIfAbsent" to make up for the fact that many applications need more than thread-safety - they need to link together operations that were typically performed in separate calls (search, then add) into a logically atomic operation.

I suspect that with something like a graph library, this problem becomes acute. Your application will need to define which operations can proceed in parallel - which operations should block others, whether you need to be able to take a consistent snapshot of the graph, whether you need full serializability of operations on the graph, or whether a relaxed model is appropriate, and so it.

I would be prohibitively difficult to build all this generality into a graph library, and perhaps impossible to predict ahead of time what the actual requirements were with respect to thread safety (the above is only a small fraction of the possible considerations), so I suspect that most graph libraries will provide only weak guarantees about behavior (e.g., allowing concurrent read access) or no explicit safety at all, and expect the consumer of the library to build higher level synchronization constructs on top.

Another reason may high performance libraries are found only in thread-unsafe versions is to avoid penalizing clients who don't require this safety (which may be most of them, for most instances of a given class) - with the assumption that thread safety can be added externally by those who require it, but that the reverse transformation (removing thread safety) is not generally possible. Java has been taking this path, with the effective deprecation of some of the early thread-safe classes such as Vector in favor of ArrayList and StringBuffer vs StringBuilder.

On the other hand, the assumption that clients can add thread-safety to containers externally (no access to internals) in an efficient way is often incorrect - which is the basis behind the built-from-the-group-up thread-safe containers in the concurrent package. I doubt very much you'll find an equivalent for graph manipulation in C++, however.

Update: In light of other replies, I guess I should emphasize that I don't think you'll find general thread-safe extensions of single threaded libraries - but apparently there are several explicitly parallel thread libraries, although I suspect these implement higher level semantics and do the parallelism internally, rather than being geared to thread-safe individual graph manipulation operations.

BeeOnRope
I see what you're saying. Perhaps it would be easier to explicitly partition the graph into parts, and have each thread word on a single part. Then the only locking needed would be when an operation concerns an edge or vertex on the boundary between two parts.My concern with this idea is that certain parts of the graph may turn out to be more interesting than others, which would leave some threads idling.
drpepper
The problem you mention is common in parallel algorithms where you partition the working set, and where different partitions of the set may take wildly different processing times, and determining this time isn't feasible up front. I'll describe two (similar) strategies below.
BeeOnRope
One strategy is to partition the working set into more parts than you have workers - so if you decide on 4 workers, create 20 parts, which increases the chance that the works do similar amounts of work (since one worker could do up to 17 "uninteresting parts"). It can be hard to determine what the optimal number of partitions is, however. There may also be practical limits of the number of partitions you can create.
BeeOnRope
A second is to use work stealing - if some threads get their work done quickly, they can "steal" work from the remaining threads. This is more dynamic in that you don't have to decide up-front the ideal partition count, but requires more interaction between threads, and some model where stealing work is actually possible.
BeeOnRope
+2  A: 

You might want to take a look at the Parallel Boost Graph Library (and maybe the MultiThreaded Graph Library). I haven't used either myself yet, just stumbled across them while looking into parallelism as an option for speeding up some BGL code. (Ah... it looks like Parallel-BGL is now officially in boost! There's a nice architectural overview, but maybe their notion of a "distributed graph" is quite coarse-grained compared with what you had in mind).

timday
Thanks for the links. The Parallel BGL looks really good, but seems to be more for parallel computation across multiple processors without shared memory. In my case, I think this would be overkill.As for the MTGL, it looks pretty unfinished. They keep mentioning that they have to stabilize their "basic API"! But I'll keep an eye on it.
drpepper
+1  A: 

Unfortunately, the cost of fine-grained locking is likely higher than the speedup from multiple threads. Not to mention the risk of deadlock, lock management becomes MUCH harder when you have more than a very small number of mutexes.

Ben Voigt