You don't seem to be getting a lot of answers wrt C, C++, Java or Python, so I'll jump in.
We have a parallel programming language, PARLANSE. PARLANSE is intended to handle irregular but parallel computations.... like graph processing. PARLANSE is designed for SMP, not distributed, processing. We mostly use for program analyses, where the graph structures are syntax trees, symbol tables, control and data flow graphs, etc.
As one way to measure performance, we run a graphsearch program, that just visits all the nodes in a connected graph, in parallel, marking each as visited. Here's runs with 1-8 processors on an Intel I7:
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p1 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 23.919828 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p2 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 12.680212 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p3 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 9.032862 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p4 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 7.935573 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p5 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 6.079331 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p6 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 5.360442 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p7 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 4.952811 seconds
C:\DMS\Domains\PARLANSE\Tools\PerformanceTest>run -p8 graphsearch
Parallel GraphSearch of size 1000000 with neighbor density: 5.308947...Runtime: 4.626209 seconds
The speedup isn't linear but it isn't too bad either. Note that the i7 doesn't have 8 equal power processors; rather it has 4 hyperthreaded CPUs.
It is worth noting that the times provided included each node visit including a loop of 10,000 iterations to simulate work at that node. If the actual work is only a 100 machine instructions, the total execution time should be considerably reduced. (One of these days when I remember I'll go run that experiment and post the result here).
The PARLANSE code follows at the end.
The parallelism is caused by the pure-parallel (|| which runs all sub computations in parallel and the partial order operators (|; that run children according to a time sequence defined by the embedded partial order formed by naming the subcomputations and providing ordering constraiints with "before" operators (>>. This program is only intended as a performance test and so it doesn't do anything useful, and the performance is limited somewhat by the partial order operators.
PARLANSE presently works in 32 bit images. The million-nodes this program processes uses about 2Gb of the 4Gb available so this may be too small for what you have in mind. We are in early planning stages of a 64 bit implementation.
`GraphSearch.par -- Tests performance of parallel graph search
Copyright (C) 2008-2010 Semantic Designs; All Rights Reserved.
Generates a random graph from a seed so it is repeatable.
Explores graph from fixed starting place, marking each
node as visited. Uses parallelism, partial orders,
and teams to manage neighbor visits depending on node
outdegree.
Note: Graphsize when large may cause paging.
Compiling with -d aggravates this due to large activation record sizes'
(define Parallel ~t)
(define Seed 0)
(define GraphSize 1000000)
(define GraphDensity 5.5) ; must be float
(define NodeSimulatedWork 10000)
(includeunique `Timer.par')
(includeunique `Random.par')
[random_state Random:GeneratorState]
(define randomreset (action (procedure natural)
(;; (= random_state (Random:create 2 1 1 ?))
);;
)action
)define
(define random (lambda (function float void)
(Random:random random_state)
)lambda
)define
(define GraphNode
(structure
[visited boolean]
[owned semaphore]
[neighbors (array natural 1 dynamic)]
)structure
)define
[graph (array GraphNode 1 dynamic)]
(define VisitGraph
(action (procedure [node natural])
(;;
(ifthenelse graph:node:visited
(return) ; nothing to do, fast path
(;; `Visit the node'
(consume (lock graph:node:owned))
(ifthenelse graph:node:visited
(;; `Ooops, timing splinter, somebody else got here first'
(consume (unlock graph:node:owned))
(return)
);;
(;;
(compileifthen ~f
(;; (Console:Put (. `Processing '))
(Console:PutNatural node)
(Console:PutNewline)
);;
)compileifthen
(= graph:node:visited ~t) ; mark as visited
(consume (unlock graph:node:owned))
);;
)ifthenelse
(do [i natural] 1 NodeSimulatedWork 1 (;; `Simulate work on node' );; )do
(compileifthen (~ Parallel) ; parallelism is unsafe?
(;; (do [i integer] +1 (upperbound graph:node:neighbors 1) +1
(VisitGraph graph:node:neighbors:i) ; sequential ick
)do
(return)
);;
)compileifthen
(case (coerce natural (upperbound graph:node:neighbors 1))
1 (VisitGraph graph:node:neighbors:1)
2 (|| (VisitGraph graph:node:neighbors:1)
(VisitGraph graph:node:neighbors:2)
)||
3 (;| a (VisitGraph graph:node:neighbors:1)
b (VisitGraph graph:node:neighbors:2)
c (>> a b) (VisitGraph graph:node:neighbors:3)
);|
4 (;| a (VisitGraph graph:node:neighbors:1)
b (VisitGraph graph:node:neighbors:2)
c (>> a b) (VisitGraph graph:node:neighbors:3)
d (>> a) (VisitGraph graph:node:neighbors:4)
);|
5 (;| a (VisitGraph graph:node:neighbors:1)
b (VisitGraph graph:node:neighbors:2)
c (>> a b) (VisitGraph graph:node:neighbors:3)
d (>> a) (VisitGraph graph:node:neighbors:4)
e (>> b d) (VisitGraph graph:node:neighbors:5)
);|
else
(compileifthenelse ~f ; draft doesnt work?
(do [i integer] +1 (upperbound graph:node:neighbors 1) +1
(VisitGraph graph:node:neighbors:i) ; sequential ick
)do
(local [workers team]
(;; (do [i integer] +1 (upperbound graph:node:neighbors 1) +1
(consume (draft workers VisitGraph graph:node:neighbors:i))
)do
(consume (wait (@ (event workers)))) ; wait for terminated
);;
)local
)compileifthenelse
)case
);;
)ifthenelse
);;
)action
)define
(define main
(action (procedure void)
(local (|| [timer Timer:Timer]
)||
(;; (compileifthenelse Parallel
(Console:Put (. `Parallel GraphSearch of size '))
(Console:Put (. `Sequential GraphSearch of size '))
)compileifthenelse
(Console:PutNatural GraphSize)
(Console:PutSpace)
(;; `Construct a random graph'
(randomreset Seed)
(resize graph 1 GraphSize)
(do [victim natural] 1 GraphSize 1
(;; (= graph:victim:visited ~f) ; mark "unvisited"
(consume (addresource graph:victim:owned 1))
(ifthen (~= victim GraphSize)
(;; `Ensure there is a spanning tree for the graph'
(resize graph:victim:neighbors 1 1)
(= graph:victim:neighbors:1 (++ victim))
);;
)ifthen
);;
)do
(compileifthen ~f
(do [victim natural] 1 GraphSize 1
(;; (Console:PutNatural victim)
(Console:PutInteger (upperbound graph:victim:neighbors 1))
(Console:PutNewline)
);;
)do
)compileifthen
(do [i natural] 1 (coerce natural (* (+ (* (random) .2) .9) (coerce float GraphSize) (- GraphDensity 1.0))) 1
(local (;; [victim natural]
[new_neighbor_count natural]
[new_neighbor natural]
);;
(;;
; (Console:PutNatural i)(Console:PutCharacter ":")
(= victim (++ (coerce natural (floor (* (random) (coerce float GraphSize))))))
; (Console:PutNatural victim)(Console:PutSpace)
(= new_neighbor_count (++ (coerce natural (upperbound graph:victim:neighbors 1))))
; (Console:PutNatural new_neighbor_count)(Console:PutSpace)
(= new_neighbor (coerce natural (++ (floor (* (random) (coerce float GraphSize))))))
; (Console:PutNatural new_neighbor)(Console:PutNewline)
(resize graph:victim:neighbors 1 new_neighbor_count)
(= graph:victim:neighbors:new_neighbor_count new_neighbor)
);;
)local
)do
);;
(local (|| [neighbor_count natural]
(= [total_neighbors natural] 0)
)||
(;; (do [victim natural] 1 GraphSize 1
(;; (= neighbor_count (coerce natural (upperbound graph:victim:neighbors 1)))
(+= total_neighbors neighbor_count)
(compileifthen ~f
(;;
(Console:PutNatural victim)
(Console:PutCharacter ":")
(Console:PutNatural neighbor_count)
(Console:PutNewline)
);;
)compileifthen
);;
)do
(Console:Put (. `with neighbor density: '))
(Console:PutFloatDecimalFormat (/ (coerce float total_neighbors) (coerce float GraphSize)))
);;
)local
(Console:Put (. `...'))
(Timer:Reset (. timer))
(VisitGraph 1)
(Timer:Stop (. timer))
(Timer:PrintElapsedTime (. timer) (. `Runtime: '))
(Console:PutNewline)
);;
)local
)action
)define