I am looking for a content addressing data structure

views:

165

answers:

+3 Q:

I am looking for a content addressing data structure

I'm trying to design a data structure that allows efficient extraction of entries from a part of their content.

Let's say I am looking for an entry that matches this: [ x 2 3 x x ]

If [ 0 2 3 4 5 ] or [ 3 2 3 7 8 ] are in my data structure, they should be returned by my find function.

I wrote such a data structure where I compare the "pattern" against all entries of the data structure, but of course it takes way too much time. I have few ideas about how to do this a faster way but they are quite heavy to implement. Does something like this already exists? If not, how would you do this?

+2 A:

What you're trying to implement looks a lot like tuple space.

vartec 2009-03-18 08:53:58

Yes it is actually what I want to do but I need I wanted a hint for implementing it. I guess I'll to look at the code! Thanks

Ben 2009-03-18 09:08:59

Let me guess. Distributed programming class homework? ;-)

vartec 2009-03-18 09:57:22

Nop ! that is real work :/

Ben 2009-03-18 10:03:50

Oh, then why not use already available solutions?

vartec 2009-03-18 10:14:56

Well I have specific needs but I simplified a bit the problem to ask my question. Available solutions are not exactly what I want, but they are a good start for implementation, I am on it.

Ben 2009-03-18 11:59:04

+4 A:

Well, a Suffix tree could do it in O(1), but it would take a LOT of memory.

Vilx- 2009-03-18 09:03:36

+1. Assuming you have multiple entries, what you really want is called a generalised suffix tree, which is formed by concatenating all the entries together, using distinct (unique) separator symbols between each. Suffix arrays are a (theoretically) slower but much more memory-efficient alternative.

j_random_hacker 2009-03-18 15:15:43

P.S.: Suffix trees and arrays rock, but they are *hard* to implement. If you can't find an existing implementation, give yourself a solid 2 weeks to try to understand Ukkonnen's or McCreight's original papers on linear-time construction algorithms.

j_random_hacker 2009-03-18 15:19:00

How specifically does it do it in O(1)? You can retrieve all the entries containing a particular substring in O(length(substring)), if that's what you mean. I think the given problem does not restrict the known bits to be a single contiguous substring, either.

Darius Bacon 2009-03-22 02:39:09

I think it does restrict it to a single contiguous substring, but yes, it's not very clear on this. As for O(1) - well, it depends on how you look at it. The Big O in this case depends only on the length of the substring, not the amount of entries in the dictionary.

Vilx- 2009-03-23 08:02:48

Clasically when one talks about searching/sorting algorithms he measures the Big O in respect to the number of entries, not the length of them. From that point of view it's O(1).

Vilx- 2009-03-23 08:03:58

+5 A:

The first thing that comes to mind is to have a hash table for each position in the tuple. For searching you intersect the results for all the positions with a specified value.

starblue 2009-03-18 09:05:57

To hash each item is a starting point, but for large answer sets the intersection takes time and consumes memory. Consider also to hash each subset of a specific window size.

dmeister 2009-03-18 14:00:56

You might take a look at the RETE algorithm. This general problem came up in the AI systems of the 70s; Chapter 14 in Paradigms of AI Programming covers it. (IIRC there's mainly elaborations of starblue's answer and decision trees.)

Darius Bacon 2009-03-22 02:46:07

ansaurus

tags:

views:

answers:

I am looking for a content addressing data structure

related questions