ansaurus

Question

Multiple indexes for a Java Collection - most basic solution?

Answer 1

+6 A:

Each index will basically be a separate Map. You can (and probably should) abstract this behind a class that manages the searches, indexing, updates and removals for you. It wouldn't be hard to do this fairly generically. But no, there's no standard out of the box class for this although it can easily be built from the Java Collections classes.

cletus 2010-03-23 16:01:58

A Map is tricky though when the indexed values (the key of the map) can be double, e.g. are not unique. Then you'd need Maps with Lists inside. It's the same problem with TreeSets with custom Comparators which was what I was thinking of- in implementation also a TreeMap :)

extraneon 2010-03-23 16:10:56

Good answer. It wouldn't be too hard to create a library class with createIndex/destroyIndex where you pass createIndex a class that does nothing but implement equals and hash. You'd have to store an array of everything that's ever been added so that when someone called createIndex you could quickly add everything in your array into that new hash. destroyIndex would just delete the hash. Meh, doesn't seem worth it. I guess that's why it's never been done.

Bill K 2010-03-23 16:10:58

@cletus: Also not in any (small, common and well tested) library?

Chris Lercher 2010-03-23 16:11:39

@extraneon: The functionality "Multiple values for one key" is provided by Apache Commons Collections MultiMap http://commons.apache.org/collections/apidocs/org/apache/commons/collections/MultiMap.html

Chris Lercher 2010-03-23 16:15:53

The multiple values per key problem can be solved in a number of ways: 1. use a multimap type data structure (requires external library like Google Collections) 2. Store buckets (probably a List) at each value or 3. use an alternate structure like a sorted List.

cletus 2010-03-23 23:58:41

Sets (probably HashSets) are better than Lists for the buckets. You don't need order, and probably do need fast insertion, removal, and uniqueness checking.

TREE 2010-03-29 15:12:57

@cletus: You provided the first answer. At first I doubted it (can't be true ;-), but now I believe it.

Chris Lercher 2010-04-02 16:28:40

Answer 2

+5 A:

My first thought would be to create a class for the thing being indexed, then create multiple HashMap's to hold the indexes, with the same object added to each of the HashMaps. For an add, you'd then simply add the same object to each HashMap. A delete would require searching each HashMap for the reference to the destination object. If deletes need to be fast, you might want to create two HashMaps for each index: one for index-to-value and the other for value-to-index. Of course I'd wrap whatever you do in a class with a clearly-defined interface.

Doesn't seem like this would be hard. If you know the numbers and types of the indexes and the class of the widget up front, it would be pretty easy, like:

public class MultiIndex
{
  HashMap<String,Widget> index1=new HashMap<String,Widget>();
  HashMap<String,Widget> index2=new HashMap<String,Widget>();
  HashMap<Integer,Widget> index3=new HashMap<Integer,Widget>();

  public void add(String index1Value, String index2Value, Integer index3Value, Widget widget)
  {
    index1.put(index1Value, widget);
    index2.put(index2Value, widget);
    index3.put(index3Value, widget);
  }
  public void delete(Widget widget)
  {
    Iterator i=index1.keySet().iterator(); 
    while (i.hasNext())
    {
      String index1Value=(String)i.next();
      Widget gotWidget=(Widget) index1.get(index1Value);
      if (gotWidget.equals(widget))
        i.remove();
    }
    ... similarly for other indexes ...
  }
  public Widget getByIndex1(String index1Value)
  {
    return index1.get(index1Value);
  }
  ... similarly for other indexes ...

  }
}

If you want to make it generic and accept any object, have variable number and types of indexes, etc., it's a little more complicated, but not much.

Jay 2010-03-23 16:28:18

+1 for posting an example implementation. This is roughly, what it should do.

Chris Lercher 2010-03-28 19:56:11

Answer 3

+1 A:

I've written a Table interface that includes methods like

V put(R rowKey, C columnKey, V value) 
V get(Object rowKey, Object columnKey) 
Map<R,V> column(C columnKey) 
Set<C> columnKeySet()
Map<C,V> row(R rowKey)
Set<R> rowKeySet()
Set<Table.Cell<R,C,V>> cellSet()

We'd like to include it in a future Guava release, but I don't know when that would happen. http://code.google.com/p/guava-libraries/issues/detail?id=173

Jared Levy 2010-03-28 01:40:56

@Jared Levy: Thanks, good to hear, that Guava is considering this topic! I'm considering to switch from Apache Commons Collections to Guava :-) Can you expand a little bit on the usage of the table interface? At first glance, it looks more like it provides something similar to what a composite primary key for a database table does (where you have to get *both* parts of the key right to access the value)? Or does it also provide multiple indexes (where you choose *one* part of the key, and then only need to get that part right to access the value)?

Chris Lercher 2010-03-28 13:51:34

You can specify 0, 1, or 2 keys when accessing a Table.Unfortunately, it might be a while until we open-source it.

Jared Levy 2010-03-28 18:34:03

@Jared Levy: That's a nice data structure, which I believe will show its strength especially, when applied recursively (tables in tables)! It seems to be different from the structure I'm looking for in this question, though. Jay's example is what I'm basically looking for (multiple *unique* indexes; delete an element, and it's deleted from all indexes) But I want to find something like that in a common library, because lots of people seem to find it strikingly normal, that they have to re-implement the same thing over and over again.

Chris Lercher 2010-03-28 19:48:38

Answer 4

+1 A:

Google Collections LinkedListMultimap

About your first requirement

When a Value is removed, all index entries associated with that value must be removed.

I think There is neither a library nor a Helper that supports it.

Here is how i have done by using LinkedListMultimap

Multimap<Integer, String> multimap = LinkedListMultimap.create();

// Three duplicates entries
multimap.put(1, "A");
multimap.put(2, "B");
multimap.put(1, "A");
multimap.put(4, "C");
multimap.put(1, "A");

System.out.println(multimap.size()); // outputs 5

To get your first requirement, a Helper can play a good job

public static <K, V> void removeAllIndexEntriesAssociatedWith(Multimap<K, V> multimap, V value) {
    Collection<Map.Entry<K, V>> eCollection = multimap.entries();
    for (Map.Entry<K, V> entry : eCollection)
        if(entry.getValue().equals(value))
            eCollection.remove(entry);
}

...

removeAllIndexEntriesAssociatedWith(multimap, "A");

System.out.println(multimap.size()); // outputs 2

Google collections is

lightweight
Supported by Joshua Block (Effective Java)
Nice features as ImmutableList, ImmutableMap and so on

Arthur Ronald F D Garcia 2010-03-29 20:03:01

Answer 5

+3 A:

You have a lot of really constrictive requirements are appear to be very particular to your needs. Most of the things you are saying aren't viable are because a lot so of people have the same exact needs which basically defines a basic database engine. That is why they are "large" libraries. You say "no database" but at its core every indexing system is a "database" of terms and documents. I would argue that a Collection is a "database". I would say take a look at Space4J.

I would say if you don't find what you are looking for, start a project on GitHub and get on with coding it yourself and sharing the results.

fuzzy lollipop 2010-03-29 20:24:01

That's a very good point: I need part of the functionality of a database in memory! I don't need ACID, I don't even need multiple tables, I don't need to access it using a comfortable query language. But I do need one piece of it: Multiple unique indexes. It's possible, that Space4J is currently the best answer to that. I must admit, that I haven't heard of that project before - +1 for the link! However, I do think, that the problem is very common, just as common as invoking a "CREATE UNIQUE INDEX ... ON ..." in SQL.

Chris Lercher 2010-03-29 21:07:39

Answer 6

A:

I'm not sure I understand the question, but I think what you're asking for is multiple ways to map from different, unique keys to values and appropriate clean-up when a value goes away.

I see that you don't want to roll your own, but there's a simple enough composition of map and multimap (I used the Guava multimap below, but the Apache one should work as well) to do what you want. I have a quick and dirty solution below (skipped the constructors, since that depends on what sort of underlying map/multimap you want to use):

package edu.cap10.common.collect;

import java.util.Collection;
import java.util.Map;

import com.google.common.collect.ForwardingMap;
import com.google.common.collect.Multimap;

public class MIndexLookupMap<T> extends ForwardingMap<Object,T>{

    Map<Object,T> delegate;
    Multimap<T,Object> reverse;

    @Override protected Map<Object, T> delegate() { return delegate; }

    @Override public void clear() {
        delegate.clear();
        reverse.clear();
    }

    @Override public boolean containsValue(Object value) { return reverse.containsKey(value); }

    @Override public T put(Object key, T value) {
        if (containsKey(key) && !get(key).equals(value)) reverse.remove(get(key), key); 
        reverse.put(value, key);
        return delegate.put(key, value);
    }

    @Override public void putAll(Map<? extends Object, ? extends T> m) {
        for (Entry<? extends Object,? extends T> e : m.entrySet()) put(e.getKey(),e.getValue());
    }

    public T remove(Object key) {
        T result = delegate.remove(key);
        reverse.remove(result, key);
        return result;
    }

    public void removeValue(T value) {
        for (Object key : reverse.removeAll(value)) delegate.remove(key);
    }

    public Collection<T> values() {
        return reverse.keySet();
    }   

}

removal is O(number of keys), but everything else is the same order as a typical map implementation (some extra constant scaling, since you also have to add things to the reverse).

I just used Object keys (should be fine with appropriate implementations of equals() and hashCode() and key distinction) - but you could also have a more specific type of key.

Carl 2010-03-29 21:30:49

note, this does not do multiple values per key, but could straightforwardly enough with forward being a multimap, and adding a get(Object...keys) method which also did some collection intersection operations.

Carl 2010-03-29 21:35:15

Answer 7

+1 A:

Use Prefuse Tables. They support as many indices as you want, are fast (indices are TreeMaps), and have nice filtering options (boolean filters? no problem!). No database required, tested with large data-sets in many information visualization applications.

In their raw form, they are not as convenient as standard containers (you need to deal with rows and columns), but you can surely write a small wrapper around that. Plus, they plug nicely into UI components such as Swing's JTables.

tucuxi 2010-03-30 22:03:10

Answer 8

+1 A:

Your main goal seems to be that you'll remove the object from all indexes when you remove it from one.

The simplest approach will be to add another layer of indirection: you store your actual object in a Map<Long,Value>, and use a bidirectional map (which you'll find in Jakarta Commons and probably Google Code) for your indexes as Map<Key,Long>. When you remove an entry from a particular index, you'll take the Long value from that index and use it to remove the corresponding entries from the main map and the other indexes.

One alternative to the BIDIMap is to define your "index" maps as Map<Key,WeakReference<Long>>; however, this will require you to implement a ReferenceQueue for cleanup.

Another alternative is to create a key object that can take an arbitrary tuple, define its equals() method to match on any element in the tuple, and use that with a TreeMap. You can't use a HashMap, because you won't be able to compute a hashcode based on just one element of the tuple.

public class MultiKey
implements Comparable<Object>
{
   private Comparable<?>[] _keys;
   private Comparable _matchKey;
   private int _matchPosition;

   /**
    *  This constructor is for inserting values into the map.
    */
   public MultiKey(Comparable<?>... keys)
   {
      // yes, this is making the object dependent on externally-changable
      // data; if you're paranoid, copy the array
      _keys = keys;
   }


   /**
    *  This constructor is for map probes.
    */
   public MultiKey(Comparable key, int position)
   {
      _matchKey = key;
      _matchPosition = position;
   }


   @Override
   public boolean equals(Object obj)
   {
      // verify that obj != null and is castable to MultiKey
      if (_keys != null)
      {
         // check every element
      }
      else
      {
         // check single element
      }
   }


   public int compareTo(Object o)
   {
      // follow same pattern as equals()
   }
}

Anon 2010-04-02 15:15:36

These are good points about things to consider when writing an implementation (+1).

Chris Lercher 2010-04-02 15:29:15

ansaurus

tags:

views:

answers:

Multiple indexes for a Java Collection - most basic solution?

Update

related questions