tags:

views:

310

answers:

2

I'm looking for some C library that includes STM-style (Software Transactional Memory) hash maps, but I had no luck so far. It would be great if it was based on glib / gobject, but it's not that crucial. It also doesn't need proper transactions over many objects - single immutable hash support is all I really need.

Must haves: immutable snapshot read, lock-free write with auto-retry.

+1  A: 

Wikipedia has a list of various STM implementations.

R Samuel Klatchko
I looked through those. But they're very low-level. The best one for integration is probably stmmap: http://github.com/skaphan/stmmap. But I'm looking for something that already includes the hashmap implementation. Unfortunately I can't be sure that what some random data-structure library does is safe to just use on top of the stm library... I would prefer something working on a higher level. (ready data-structure + stm solution)
viraptor
+1  A: 

Well, I think (and there are a number of study) that current STM isn't faster than lock-free and mutex-based code. It is obvious: STM requires on-line data conflict checking. However, such conflict checking in pure software requires very huge time overheads. Currently, only Sun's ROCK processor supports a limited form of STM (Best-effort HTM with STM) by hardware. No x86 CPUs currently support TM in hardware. In short, STM is just slow.

In my opinion, you'd better just using a concurrent hash table. For example, you can find concurrent_hash_map in Intel TBB. Here is the link of TBB Manual. Oh, but it's C++, not C. But, I do believe you can (though it might take significant work) translate C++-based such hash table to C code. Intel TBB is open source.

Also, I think such highly concurrent (usually implemented as lock-free) data structures are not always useful. In some workload pattern, using such data structures is not good. To be sure, I recommend you to write a small micro-benchmark for two versions of hash tables: (1) lock-free and (2) lock-based. Also, please keep in mind the workload patterns for such micro-benchmark should be close to real one. An example could be found in here.

minjang
I know they're sometimes slower. But they're also sometimes easier in a specific solution. I have a big hash that is often read, rarely written to. Also writes take a long time and need a lookup first. With a pool of readers and 1 writer it's a perfect case for STM. Also, you DO NOT need an on-line conflict checking. You can implement many STM-style solutions in a way that reader NEVER waits and NEVER blocks (and writer never waits, although might need to retry) using only the standard CMPXCHG. For me that's one synchronisation every 30 sec -vs- grabbing the read lock every 10ms.
viraptor
I understand your problem. It's better to use "lock-free" data structure rather than STM-style. STM always means conflict checking. What about RCU? http://en.wikipedia.org/wiki/Read-copy-update
minjang
PS. I know that concurrent_hash_map is supposed to be "lock-less", but they write: "Because having access to an element can block other threads, try to shorten the lifetime of the accessor or const_accessor." Not good enough for me in this case.
viraptor
Minjang: That's what I meant (more or less ;) ). From what I understand an RCU trie-based immutable GC'd hash map pretty much meets the definition of an STM hash map (i.e. has exactly the same properties). Or am I missing something?
viraptor
STM is just software-level implementation of TM. TM is a speculative approach to handle locks efficiently. By definition, TM is a generalization of lock-free data structures. So, STM hash map actually doesn't make sense to me, and I never heard such terms. It's just wait-free/lock-free. In current x86 architecture, there is no magic that you can do better than a conventional lock-free data structure. I believe RCU-like data structure is the best for read-intensive workloads.
minjang