Judy array for managed languages

views:

1191

answers:

+4 Q:

Judy array for managed languages

Judy array is fast data structure that may represent a sparse array or a set of values. Is there its implementation for managed languages such as C#? Thanks

This is proving trickier than I thought. PyJudy might be worth a look, as would be Tie::Judy. There's something on Softpedia, and something Ruby-ish. Trouble is, none of these are .NET specifically.

boost 2009-02-15 07:23:01

+6 A:

It's worth noting that these are often called Judy Trees or Judy Tries if you are googling for them.

I also looked for a .Net implementation but found nothing. Also worth noting that:

The implementation is heavily designed around efficient cache usage, as such implementation specifics may be highly dependent on the size of certain constructs used within the sub structures. A .Net managed implementation may be somewhat different in this regard.

There are some significant hurdles to it that I can see (and there are probably more that my brief scan missed)

The API has some fairly anti OO aspects (for example a null pointer is viewed as an empty tree) so simplistic, move the state pointer to the LHS and make functions instance methods conversion to C++ wouldn't work.
The implementation of the sub structures I looked at made heavy use of pointers. I cannot see these efficiently being translated to references in managed languages.
The implementation is a distillation of a lot of very complex ideas that belies the simplicity of the public api.
The code base is about 20K lines (most of it complex), this doesn't strike me as an easy port.

You could take the library and wrap the C code in C++/CLI (probably simply holding internally a pointer that is the c api trie and having all the c calls point to this one). This would provide a simplistic implementation but the linked libraries for the native implementation may be problematic (as might memory allocation). You would also probably need to deal with converting .Net strings to plain old byte* on the transition as well (or just work with bytes directly)

ShuggyCoUk 2009-02-16 15:12:05

+5 A:

Judy really doesn't fit well with managed languages. I don't think you'll be able to use something like SWIG and get the first layer done automatically.

I wrote PyJudy and I ended up having to make some non-trivial API changes to fit well in Python. For example, I wrote in the documentation:

JudyL arrays map machine words to machine words. In practice the words store unsigned integers or pointers. PyJudy supports all four mappings as distinct classes.

pyjudy.JudyLIntInt - map unsigned integer keys to unsigned integer values
pyjudy.JudyLIntObj - map unsigned integer keys to Python object values
pyjudy.JudyLObjInt - map Python object keys to unsigned integer values
pyjudy.JudyLObjObj - map Python object keys to Python object values

I haven't looked at the code for a few years so my memories about it are pretty hazy. It was my first Python extension library, and I remember I hacked together a sort of template system for code generation. Nowadays I would use something like genshi.

I can't point to alternatives to Judy - that's one reason why I'm searching Stackoverflow.

Edit: I've been told that my timing numbers in the documentation are off from what Judy's documentation suggests because Judy is developed for 64-bit cache lines and my PowerBook was only 32 bits.

Some other links:

Patricia tries (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/ )
Double-Array tries (http://linux.thai.net/~thep/datrie/datrie.html)
HAT-trie (http://members.optusnet.com.au/~askitisn/index.html)

The last has comparison numbers for different high-performance trie implementations.

Andrew Dalke 2009-02-18 18:37:11

ansaurus

tags:

views:

answers:

Judy array for managed languages

related questions