views:

631

answers:

8

Hi,

I'm looking for the most basic solution to create multiple indexes on a Java Collection.

Required functionality:

  • When a Value is removed, all index entries associated with that value must be removed.
  • Index lookup must be faster than linear search (at least as fast as a TreeMap).

Side conditions:

  • No dependencies on large (like Lucene) libraries. No uncommon or not well tested libraries. No database.
  • A library like Apache Commons Collections etc. would be ok.
  • Even better, if it works with JavaSE (6.0) alone.
  • Edit: No self-implemented solution (thanks for the answers suggesting this - it's good to have them here for completeness, but I already have a solution very similar to Jay's) Whenever several people find out, that they implemented the same thing, this should be part of some common library.

Of course, I could write a class that manages multiple Maps myself (that's not hard, but it feels like reinventing the wheel). So I'd like to know, if it can be done without - while still getting a simple usage similar to using a single indexed java.util.Map.

Thanks, Chris

Update

It looks very much as if we haven't found anything. I like all your answers - the self developed versions, the links to database-like libraries.

Here's what I really want: To have the functionality in (a) Apache Commons Collections or (b) in Google Collections/Guava. Or maybe a very good alternative.

Do other people miss this functionality in these libraries, too? They do provide all sorts of things like MultiMaps, MulitKeyMaps, BidiMaps, ... I feel, it would fit in those libraries nicely - it could be called MultiIndexMap. What do you think?

+6  A: 

Each index will basically be a separate Map. You can (and probably should) abstract this behind a class that manages the searches, indexing, updates and removals for you. It wouldn't be hard to do this fairly generically. But no, there's no standard out of the box class for this although it can easily be built from the Java Collections classes.

cletus
A Map is tricky though when the indexed values (the key of the map) can be double, e.g. are not unique. Then you'd need Maps with Lists inside. It's the same problem with TreeSets with custom Comparators which was what I was thinking of- in implementation also a TreeMap :)
extraneon
Good answer. It wouldn't be too hard to create a library class with createIndex/destroyIndex where you pass createIndex a class that does nothing but implement equals and hash. You'd have to store an array of everything that's ever been added so that when someone called createIndex you could quickly add everything in your array into that new hash. destroyIndex would just delete the hash. Meh, doesn't seem worth it. I guess that's why it's never been done.
Bill K
@cletus: Also not in any (small, common and well tested) library?
Chris Lercher
@extraneon: The functionality "Multiple values for one key" is provided by Apache Commons Collections MultiMap http://commons.apache.org/collections/apidocs/org/apache/commons/collections/MultiMap.html
Chris Lercher
The multiple values per key problem can be solved in a number of ways: 1. use a multimap type data structure (requires external library like Google Collections) 2. Store buckets (probably a List) at each value or 3. use an alternate structure like a sorted List.
cletus
Sets (probably HashSets) are better than Lists for the buckets. You don't need order, and probably do need fast insertion, removal, and uniqueness checking.
TREE
@cletus: You provided the first answer. At first I doubted it (can't be true ;-), but now I believe it.
Chris Lercher
+5  A: 

My first thought would be to create a class for the thing being indexed, then create multiple HashMap's to hold the indexes, with the same object added to each of the HashMaps. For an add, you'd then simply add the same object to each HashMap. A delete would require searching each HashMap for the reference to the destination object. If deletes need to be fast, you might want to create two HashMaps for each index: one for index-to-value and the other for value-to-index. Of course I'd wrap whatever you do in a class with a clearly-defined interface.

Doesn't seem like this would be hard. If you know the numbers and types of the indexes and the class of the widget up front, it would be pretty easy, like:

public class MultiIndex
{
  HashMap<String,Widget> index1=new HashMap<String,Widget>();
  HashMap<String,Widget> index2=new HashMap<String,Widget>();
  HashMap<Integer,Widget> index3=new HashMap<Integer,Widget>();

  public void add(String index1Value, String index2Value, Integer index3Value, Widget widget)
  {
    index1.put(index1Value, widget);
    index2.put(index2Value, widget);
    index3.put(index3Value, widget);
  }
  public void delete(Widget widget)
  {
    Iterator i=index1.keySet().iterator(); 
    while (i.hasNext())
    {
      String index1Value=(String)i.next();
      Widget gotWidget=(Widget) index1.get(index1Value);
      if (gotWidget.equals(widget))
        i.remove();
    }
    ... similarly for other indexes ...
  }
  public Widget getByIndex1(String index1Value)
  {
    return index1.get(index1Value);
  }
  ... similarly for other indexes ...

  }
}

If you want to make it generic and accept any object, have variable number and types of indexes, etc., it's a little more complicated, but not much.

Jay
+1 for posting an example implementation. This is roughly, what it should do.
Chris Lercher
+1  A: 

I've written a Table interface that includes methods like

V put(R rowKey, C columnKey, V value) 
V get(Object rowKey, Object columnKey) 
Map<R,V> column(C columnKey) 
Set<C> columnKeySet()
Map<C,V> row(R rowKey)
Set<R> rowKeySet()
Set<Table.Cell<R,C,V>> cellSet()

We'd like to include it in a future Guava release, but I don't know when that would happen. http://code.google.com/p/guava-libraries/issues/detail?id=173

Jared Levy
@Jared Levy: Thanks, good to hear, that Guava is considering this topic! I'm considering to switch from Apache Commons Collections to Guava :-) Can you expand a little bit on the usage of the table interface? At first glance, it looks more like it provides something similar to what a composite primary key for a database table does (where you have to get *both* parts of the key right to access the value)? Or does it also provide multiple indexes (where you choose *one* part of the key, and then only need to get that part right to access the value)?
Chris Lercher
You can specify 0, 1, or 2 keys when accessing a Table.Unfortunately, it might be a while until we open-source it.
Jared Levy
@Jared Levy: That's a nice data structure, which I believe will show its strength especially, when applied recursively (tables in tables)! It seems to be different from the structure I'm looking for in this question, though. Jay's example is what I'm basically looking for (multiple *unique* indexes; delete an element, and it's deleted from all indexes) But I want to find something like that in a common library, because lots of people seem to find it strikingly normal, that they have to re-implement the same thing over and over again.
Chris Lercher
+1  A: 

Google Collections LinkedListMultimap

About your first requirement

  • When a Value is removed, all index entries associated with that value must be removed.

I think There is neither a library nor a Helper that supports it.

Here is how i have done by using LinkedListMultimap

Multimap<Integer, String> multimap = LinkedListMultimap.create();

// Three duplicates entries
multimap.put(1, "A");
multimap.put(2, "B");
multimap.put(1, "A");
multimap.put(4, "C");
multimap.put(1, "A");

System.out.println(multimap.size()); // outputs 5

To get your first requirement, a Helper can play a good job

public static <K, V> void removeAllIndexEntriesAssociatedWith(Multimap<K, V> multimap, V value) {
    Collection<Map.Entry<K, V>> eCollection = multimap.entries();
    for (Map.Entry<K, V> entry : eCollection)
        if(entry.getValue().equals(value))
            eCollection.remove(entry);
}

...

removeAllIndexEntriesAssociatedWith(multimap, "A");

System.out.println(multimap.size()); // outputs 2

Google collections is

  • lightweight
  • Supported by Joshua Block (Effective Java)
  • Nice features as ImmutableList, ImmutableMap and so on
Arthur Ronald F D Garcia
+3  A: 

You have a lot of really constrictive requirements are appear to be very particular to your needs. Most of the things you are saying aren't viable are because a lot so of people have the same exact needs which basically defines a basic database engine. That is why they are "large" libraries. You say "no database" but at its core every indexing system is a "database" of terms and documents. I would argue that a Collection is a "database". I would say take a look at Space4J.

I would say if you don't find what you are looking for, start a project on GitHub and get on with coding it yourself and sharing the results.

fuzzy lollipop
That's a very good point: I need part of the functionality of a database in memory! I don't need ACID, I don't even need multiple tables, I don't need to access it using a comfortable query language. But I do need one piece of it: Multiple unique indexes. It's possible, that Space4J is currently the best answer to that. I must admit, that I haven't heard of that project before - +1 for the link! However, I do think, that the problem is very common, just as common as invoking a "CREATE UNIQUE INDEX ... ON ..." in SQL.
Chris Lercher
A: 

I'm not sure I understand the question, but I think what you're asking for is multiple ways to map from different, unique keys to values and appropriate clean-up when a value goes away.

I see that you don't want to roll your own, but there's a simple enough composition of map and multimap (I used the Guava multimap below, but the Apache one should work as well) to do what you want. I have a quick and dirty solution below (skipped the constructors, since that depends on what sort of underlying map/multimap you want to use):

package edu.cap10.common.collect;

import java.util.Collection;
import java.util.Map;

import com.google.common.collect.ForwardingMap;
import com.google.common.collect.Multimap;

public class MIndexLookupMap<T> extends ForwardingMap<Object,T>{

    Map<Object,T> delegate;
    Multimap<T,Object> reverse;

    @Override protected Map<Object, T> delegate() { return delegate; }

    @Override public void clear() {
        delegate.clear();
        reverse.clear();
    }

    @Override public boolean containsValue(Object value) { return reverse.containsKey(value); }

    @Override public T put(Object key, T value) {
        if (containsKey(key) && !get(key).equals(value)) reverse.remove(get(key), key); 
        reverse.put(value, key);
        return delegate.put(key, value);
    }

    @Override public void putAll(Map<? extends Object, ? extends T> m) {
        for (Entry<? extends Object,? extends T> e : m.entrySet()) put(e.getKey(),e.getValue());
    }

    public T remove(Object key) {
        T result = delegate.remove(key);
        reverse.remove(result, key);
        return result;
    }

    public void removeValue(T value) {
        for (Object key : reverse.removeAll(value)) delegate.remove(key);
    }

    public Collection<T> values() {
        return reverse.keySet();
    }   

}

removal is O(number of keys), but everything else is the same order as a typical map implementation (some extra constant scaling, since you also have to add things to the reverse).

I just used Object keys (should be fine with appropriate implementations of equals() and hashCode() and key distinction) - but you could also have a more specific type of key.

Carl
note, this does not do multiple values per key, but could straightforwardly enough with forward being a multimap, and adding a get(Object...keys) method which also did some collection intersection operations.
Carl
+1  A: 

Use Prefuse Tables. They support as many indices as you want, are fast (indices are TreeMaps), and have nice filtering options (boolean filters? no problem!). No database required, tested with large data-sets in many information visualization applications.

In their raw form, they are not as convenient as standard containers (you need to deal with rows and columns), but you can surely write a small wrapper around that. Plus, they plug nicely into UI components such as Swing's JTables.

tucuxi
+1  A: 

Your main goal seems to be that you'll remove the object from all indexes when you remove it from one.

The simplest approach will be to add another layer of indirection: you store your actual object in a Map<Long,Value>, and use a bidirectional map (which you'll find in Jakarta Commons and probably Google Code) for your indexes as Map<Key,Long>. When you remove an entry from a particular index, you'll take the Long value from that index and use it to remove the corresponding entries from the main map and the other indexes.

One alternative to the BIDIMap is to define your "index" maps as Map<Key,WeakReference<Long>>; however, this will require you to implement a ReferenceQueue for cleanup.


Another alternative is to create a key object that can take an arbitrary tuple, define its equals() method to match on any element in the tuple, and use that with a TreeMap. You can't use a HashMap, because you won't be able to compute a hashcode based on just one element of the tuple.

public class MultiKey
implements Comparable<Object>
{
   private Comparable<?>[] _keys;
   private Comparable _matchKey;
   private int _matchPosition;

   /**
    *  This constructor is for inserting values into the map.
    */
   public MultiKey(Comparable<?>... keys)
   {
      // yes, this is making the object dependent on externally-changable
      // data; if you're paranoid, copy the array
      _keys = keys;
   }


   /**
    *  This constructor is for map probes.
    */
   public MultiKey(Comparable key, int position)
   {
      _matchKey = key;
      _matchPosition = position;
   }


   @Override
   public boolean equals(Object obj)
   {
      // verify that obj != null and is castable to MultiKey
      if (_keys != null)
      {
         // check every element
      }
      else
      {
         // check single element
      }
   }


   public int compareTo(Object o)
   {
      // follow same pattern as equals()
   }
}
Anon
These are good points about things to consider when writing an implementation (+1).
Chris Lercher