views:

129

answers:

9

I need to store data in memory where I map one or more key strings to an object, as follows:

"green", "blue" -> object1
"red", "yellow" -> object2

So, in Java the datastructure might implement:

Map<Set<String>, V>

I need to be able to efficiently receive a list of objects, where the strings match some boolean criteria, such as:

("red" OR "green") AND NOT "blue"

I'm working in Java, so the ideal solution would be an off-the-shelf Java library. I am, however, willing to implement something from scratch if necessary.

Anyone have any ideas? I'd rather avoid the overhead of an in-memory database if possible, I'm hoping for something comparable in speed to a HashMap (or at least the same order of magnitude).

A: 

I would say that the easiest way is simply to do a recursive filtering and being cleaver, when for instance evaluating X AND Y where X has been evaluated to the empty set.

The mapping however, needs to be from tags (such as "red" or "blue") to sets of objects.

The base case (resolving the atomic tags) of the recursion, would then be a simple lookup in this map. AND would be implemented using intersection, OR using union, and so on.

aioobe
+1  A: 

Would the criteria be amenable to bitmap indexing: http://en.wikipedia.org/wiki/Bitmap_index ?

pdbartlett
A: 

Check out the Apache Commons - Collections project. They have tons of great stuff that you will be able to use, particularly the CollectionUtils class for performing strong collection-based logic.

For instance, if your values were stored in a HashMap (as suggested by another answer) as follows:

myMap["green"] -> obj1
myMap["blue"] -> obj1
myMap["red"] -> obj2
myMap["yellow"] -> obj2

Then to retrieve results that match: ("red" or "green") and not "blue you might do this:

CollectionUtils.disjunction(CollectionUtils.union(myMap.get("red"), myMap.get("green")), myMap.get("blue"))

Tim Drisdelle
I don't think that would be sufficient, as it would allow only one green object, one blue object etc.
seanizer
Just a quick (non-functional!) example of some tools and utilities in CollectionUtils. I'm playing with it now and will reply soon - interesting problem!
Tim Drisdelle
+4  A: 

Actually, I liked the problem so I implemented a full solution in the spirit of my earlier answer:

http://pastebin.com/6iazSKG9

A simple solution, not thread safe or anything, but fun and a good starting point, I guess.

Edit: Some elaboration, as requested


See the unit test for usage.

There are two interfaces, DataStructure<K,V> and Query<V>. DataStructure behaves somewhat like a map (and in my implementation it actually works with an internal map), but it also provides reuseable and immutable query objects which can be combined like this:

    Query<String> combinedQuery = 
    structure.and(
                    structure.or(
                            structure.search("blue"), 
                            structure.search("red")
                    ),
                    structure.not(
                            structure.search("green")
                    )
    );

(A Query that searches for objects that are tagged as (blue OR red) AND NOT green). This query is reuseable, which means that it's results will change whenever the backing map is changed (kind of like an ITunes smart playlist).

The query objects are already thread safe, but the backing map is not, so there is some room for improvement here. Also, the queries could cache their results, but that would probably mean that the interface would have to be extended to provide for a purge method (kind of like the detach method in Wicket's models), which wouldn't be pretty.

As for licensing: if anybody wants this code I'll be happy to put it on SourceForge etc. ...

Sean

seanizer
Interesting, I'm not seeing an earlier answer though - can you expand a little on the basic idea behind how this works?Would you consider putting your code under an open source license (eg. LGPL)?
sanity
I deleted my earlier answer, sorry. I'll elaborate in my answer
seanizer
+1 for providing a complete and genius solution
mhaller
Cool. A few thoughts: It seems to create quite a few intermediate ArrayLists - I wonder if it could be done without these (perhaps returning a lazy Iterable rather than a Collection). Caching intermediate results would be very interesting. I'm also wondering about heuristics to determine, for And and Or, which way around the arguments should be handled for maximum efficiency.
sanity
what you're looking at is just a proof of concept, 30 minutes' worth of coding. But I have created a project at google code (http://code.google.com/p/imds4j/) and I'd be happy if you (or anybody else) contributed to it, and I'll also be happy to elaborate on your comment there.
seanizer
A: 

You could map string keys to a binary constant, then use bit shifting to produce an appropriate mask.

DeadMG
A: 

i truly think some type of database solution is your best bet. SQL easily supports querying data by

(X and Y) and not Z
ooo
of course, but that's a lot less fun :-)
seanizer
A: 

The Google Collections SetMultimap looks like an easy way to get the underlying structure, then combining that with the Maps static filters to get the querying behavior you want.

Construction would go something like

smmInstance.put(from1,to1);
smmInstance.put(from1,to2);
smmInstance.put(from2,to3);
smmInstance.put(from3,to1);
smmInstance.put(from1,to3);
//...

queries would then look like

valueFilter = //...build predicate
Set<FromType> result = Maps.filterValues(smmInstance.asMap(),valueFilter).keySet()

You can do any amount of fancy building the predicate, but Predicates has a few methods that would probably be enough to do contains/doesn't contain style queries.

Carl
A: 

I wasn't able to find a satisfactory solution, so I decided to cook up my own and release it as an open source (LGPL) project, find it here.

sanity