views: 223, answers: 4

Hello,

I am looking for an open-source C implementation of a hash table that keeps all of its data in one memory block, so that it can easily be sent over a network, let's say. I can only find implementations that allocate a small piece of memory for every key-value pair added to them.

Thank you very much in advance for all the inputs.

EDIT: It doesn't necessarily need to be a hash table; any key-value table would probably do.

+5  A: 

The number of times you will serialize such a data structure (and sending it over a network is serializing as well) versus the number of times you will use it in your program is pretty low. So most implementations focus on speed rather than on the "maybe easier to serialize" side.

If all the data were kept in one allocated memory block, many operations on that data structure would become rather expensive, because you would have to:

  • reallocate memory on add operations
  • most likely compress/vacuum on delete operations (so that the one block you like so much stays dense and has no holes)

Most network operations are buffered anyway; just iterate over the keys and send the keys and values, roughly as in the sketch below.
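
To illustrate that, here is a minimal sketch that writes length-prefixed key/value records into a plain buffer you can hand to write() or send(). The iterator functions table_first()/table_next() and the entry struct are hypothetical stand-ins for whatever API your hash table actually provides, and integers are written in host byte order (fine for processes on one machine).

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical iterator API of your hash table implementation. */
    struct entry { const char *key; const void *val; uint32_t vlen; };
    struct entry *table_first(void *table);
    struct entry *table_next(void *table, struct entry *cur);

    /* Append one record: [klen][key][vlen][value]; returns bytes written. */
    static size_t put_record(char *buf, const struct entry *e)
    {
        uint32_t klen = (uint32_t)strlen(e->key);
        char *p = buf;
        memcpy(p, &klen, sizeof klen);       p += sizeof klen;
        memcpy(p, e->key, klen);             p += klen;
        memcpy(p, &e->vlen, sizeof e->vlen); p += sizeof e->vlen;
        memcpy(p, e->val, e->vlen);          p += e->vlen;
        return (size_t)(p - buf);
    }

    /* Serialise the whole table into buf (caller ensures it is big enough). */
    size_t table_serialize(void *table, char *buf)
    {
        size_t off = 0;
        for (struct entry *e = table_first(table); e; e = table_next(table, e))
            off += put_record(buf + off, e);
        return off;   /* hand buf and off to write() or send() */
    }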

akira
A: 

I agree completely with akira (+1). Just one more comment on data locality: once the table gets larger, or if the satellite data is large enough, there is most certainly cache pollution that slows down every operation on the table even further. Put differently, you can rely on the level-1/2/3 cache hierarchy to serve the key data promptly while putting up with a cache miss whenever you have to access the satellite data (e.g. for serialisation).

hroptatyr
A: 

Libraries providing hash tables tend to hide the details and make the thing work efficiently (that is normally what programmers want when they use a hash table), so the way they handle memory is normally hidden from the final programmer's eyes, and programmers shouldn't rely on any particular "memory layout", which may change in a following version of the library.

Write your own functions to serialize (and unserialize) the hash table in the way most convenient for your usage. You can keep the serialized content if you need it several times (of course, when the hash table changes, you need to update the serialized "version" kept in memory).
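
As a minimal sketch of the "unserialize" half, assuming the length-prefixed [klen][key][vlen][value] records from the sketch in the answer above, and a hypothetical table_put() insert function standing in for whatever your table provides:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical insert function of your hash table implementation. */
    int table_put(void *table, const char *key, uint32_t klen,
                  const void *val, uint32_t vlen);

    /* Rebuild a table from a block of [klen][key][vlen][value] records. */
    int table_deserialize(void *table, const char *buf, size_t len)
    {
        size_t off = 0;
        while (off < len) {
            uint32_t klen, vlen;
            if (off + sizeof klen > len) return -1;      /* truncated */
            memcpy(&klen, buf + off, sizeof klen);  off += sizeof klen;
            if (off + klen > len) return -1;
            const char *key = buf + off;            off += klen;
            if (off + sizeof vlen > len) return -1;
            memcpy(&vlen, buf + off, sizeof vlen);  off += sizeof vlen;
            if (off + vlen > len) return -1;
            const void *val = buf + off;            off += vlen;
            if (table_put(table, key, klen, val, vlen) != 0)
                return -1;                           /* table full, etc. */
        }
        return 0;
    }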

ShinTakezou
Thank you very much for all your input. I used the network example just so that this question isn't too specific to my project and can be useful to others. I am sending packets of data between a number of processes on a single machine, and I need to accompany the data with some sort of metadata, where each process just looks up or changes a couple of values and sends it on to the next process. Wouldn't it be inefficient to serialize and "unserialize" all of the metadata if each process only wants to deal with a couple of values? Maybe a hash table is not at all what I want to use in this case? Any suggestions?
p3k
A: 

On a Unix system I'd probably utilise a shared memory buffer (see shm_open()), or, if that's not available, a memory-mapped file with the MAP_SHARED flag; note the OS-specific differences though: http://en.wikipedia.org/wiki/Mmap
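
A minimal sketch of that route (POSIX shm_open() plus mmap() with MAP_SHARED); the name and size are just placeholders and error handling is trimmed. On older glibc you may need to link with -lrt.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define SHM_NAME "/kv_block"      /* placeholder name */
    #define SHM_SIZE (1 << 20)        /* 1 MiB, adjust to taste */

    void *open_shared_block(void)
    {
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return NULL;
        if (ftruncate(fd, SHM_SIZE) < 0) {
            close(fd);
            return NULL;
        }
        void *p = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                    /* the mapping stays valid */
        return p == MAP_FAILED ? NULL : p;
    }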

If neither shm_open() nor mmap() is available you could still use a file on disk (to some extent), but you'd have to take care of proper locking. I'd send an unlock signal to the next process, and perhaps the seek offset of the updated portion of the file; that process then locks the file again, seeks to the interesting part and proceeds as usual (updates/deletes/etc.). A rough sketch of the locking part follows below.
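
Roughly like this, using flock(); the path is a placeholder and the signalling between processes is left out of the sketch.

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    /* Update one region of the shared file under an exclusive lock. */
    int update_region(const char *path, off_t off, const void *data, size_t len)
    {
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;
        flock(fd, LOCK_EX);           /* block until we own the file */
        int rc = pwrite(fd, data, len, off) == (ssize_t)len ? 0 : -1;
        flock(fd, LOCK_UN);
        close(fd);
        return rc;
    }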

In any case, you are free to design the layout of the hash table (or whatever you want) yourself, for example with fixed-width key/seek pairs. That way you get fast access to the keys of your hash table, and if necessary you seek to the data portion, then copy/delete/modify/etc. (see the sketch at the end of this answer).

Ideally this file should be on a ram disk, of course.
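
A sketch of what such a fixed layout could look like, with everything in one block so it can live directly in the shared mapping or file; the names, widths and the djb2 hash are placeholder choices, not from any particular library.

    #include <stdint.h>
    #include <string.h>

    #define KEY_WIDTH 32              /* fixed-width keys, NUL-padded */
    #define NSLOTS    1024            /* number of key/offset slots */

    struct slot {
        char     key[KEY_WIDTH];      /* empty slot: key[0] == '\0' */
        uint32_t off;                 /* offset of the value in data[] */
        uint32_t len;                 /* length of the value */
    };

    struct block {
        uint32_t    data_used;        /* bytes already used in data[] */
        struct slot slots[NSLOTS];
        char        data[];           /* value area, flexible member */
    };

    /* Linear-probing lookup (or slot search for insert) over the slot array. */
    static struct slot *find_slot(struct block *b, const char *key, int insert)
    {
        uint32_t h = 5381;            /* djb2 hash of the key */
        for (const char *p = key; *p; p++)
            h = h * 33 + (unsigned char)*p;
        for (uint32_t i = 0; i < NSLOTS; i++) {
            struct slot *s = &b->slots[(h + i) % NSLOTS];
            if (s->key[0] == '\0')
                return insert ? s : NULL;   /* empty: usable only for insert */
            if (strncmp(s->key, key, KEY_WIDTH) == 0)
                return s;
        }
        return NULL;                  /* table full / key not found */
    }

An insert would copy the value into data[] at data_used, record the offset and length in the slot, and bump data_used; because everything lives inside one struct block, the whole table can be handed to another process (or written out) as a single piece of memory, with no per-pair allocations.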

hroptatyr
Thank you for your input, hroptatyr. However, in my question I am not asking how to share data between processes; I have a technique for that (in fact I am using the shared memory on Linux that you mentioned). What I am looking for is a library to which I can hand a nice block of memory to work with, and into which I can put key-value pairs for as long as there is enough space in the data block. Once the data is in, I can go and look up the values by their keys. No dynamic memory allocations.
p3k
I once wrote a thing like that; it even supported a clever cuckoo hashing scheme where the keys got swapped but the satellite data didn't. I wrote it with serialisation in mind, just like you, but I found that it didn't perform at all compared to a separate key block / satellite data block approach, due to cache pollution. It was part of a distributed hashing setup and my primary objective was lookup speed; I did about one (de)serialisation per 20M lookups.
hroptatyr
Oh, and to actually contribute ideas: I now use XDR, which is the serialisation backend of rpcgen. The data remain in their structs, and rpcgen generates the (de)serialiser functions. Since array serialisation is possible, it could meet your requirements; it's just not natively a hash table.
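
For reference, you can also call the xdr(3) primitives by hand over a memory buffer; a minimal sketch, assuming the traditional Sun RPC interface from <rpc/xdr.h> (on modern Linux it lives in libtirpc):

    #include <rpc/xdr.h>
    #include <string.h>

    /* Encode one key/value string pair into buf; returns bytes used, 0 on error. */
    size_t encode_pair(char *buf, unsigned int buflen, char *key, char *val)
    {
        XDR xdrs;
        xdrmem_create(&xdrs, buf, buflen, XDR_ENCODE);
        /* xdr_string() writes a length-prefixed, padded string. */
        if (!xdr_string(&xdrs, &key, ~0u) || !xdr_string(&xdrs, &val, ~0u)) {
            xdr_destroy(&xdrs);
            return 0;
        }
        size_t used = xdr_getpos(&xdrs);
        xdr_destroy(&xdrs);
        return used;
    }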
hroptatyr