tags:

views:

370

answers:

4

What is a good way to represent sparse set of integers (really C memory addresses) in a compact and fast way. I already know about the obvious things like bit-vectors and run-length encoding. but I want something much more compact than one word per set element. I need to add and remove elements and test for membership. I do not need other set operations, like union.

I read about one such library many years ago but have since forgotten its name. I think it was released as open source by HP and had a womans name.

+2  A: 

If all you need is insertion, deletion, and test for membership, then a hash table should suit you nicely. You can find some good hash functions for hashing 32-bit integers here.

Adam Rosenfield
That is not compact enough -1
Stephan Eggermont
A: 

If you want the structure smaller than the data set than you should probably look at some kind of tree arrangement. Make each level of a 4 way the tree key off 2 bits starting at the high end and it might compact quite well (if the pointers have any degree of spacial locality). The trick would be encoding it compactly enough (indexes into arrays of nodes? an array mapped tree?).

BCS
+2  A: 

A very compact data structure would be a bloom filter, perhaps a counting bloom filter to support deletions.

http://en.wikipedia.org/wiki/Bloom_filter

The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter)

jakber
Thanks. I know about these but I cannot accept false positives (false negatives could be acceptable but less than ideal).
+6  A: 

You are referring to a judy array. It was a HP project. I think they are used in ruby and are available in c. Very interesting data structure. Making use of the fact that allocations are (at least) word aligned, having separate structures for dense and sparse ranges.

http://judy.sourceforge.net/index.html

Stephan Eggermont
Thank you. "Judy" was indeed the one I was thinking about. I would never have re-remembered that name.