tags:

views:

323

answers:

8

I'm writing something which will receive quite a number of transactions per second. For every transaction that comes in, a reference is made to a map which key values are the id and a bean which will help process that particular transaction. Basically each transaction comes in bearing an id, a look up will be done to the map to retrieve the corresponding bean for processing. The sticky part comes with the fact that the id for each transaction is not meant to match the id in the map precisely. More of a begins with operation. To that end, instead of using a string as an id, I've created a simple pojo called MyId. Codes below:

public class MyId
{

 private static final int HASHCODE_CONSTANT = 1;
 private String value;

 public MyId(String value)
 {
  this.value = value;
 }

 @Override
 public int hashCode()
 {
  //Returns the same hashcode value for all instances of this pojo
  return HASHCODE_CONSTANT;
 }

 @Override
 public boolean equals(Object obj)
 {
  //Checks for object type, forcibly casts and then compares the starts with
  if(obj instanceof MyId)
  {
   if(!(obj == null || "".equals(obj)))
   {
    return this.value.startsWith(((MyId)obj).getValue());
   }
  }
  return false;
 }

 public String getValue()
 {
  return value;
 }

 public void setValue(String value)
 {
  this.value = value;
 }

 //Test
 public static void main(String[] args)
 {
   Map map = new HashMap();
   map.put(new MyId("123456"), "");

   System.out.println("Result: " + map.containsKey(new MyId("12345677")));
   System.out.println("Result: " + map.containsKey(new MyId("11234567")));
 }
}

First test returns true and second test returns false like its supposed to. It seems that the map.containsKey() method invokes and compares your object's hashcode method first before the equals() is ever called. If your hashes don't match, it won't even bother to compare. While this works, it feels a little dodgy having to implement the hashcode method in this manner to trick the map.

Was wondering if there's a more efficient way to do this. We are dealing with quite a number of transactions/second and hence quite a number of look up on the map.

PS: I coded this blind so I'm sure there are syntax errors. Please ignore those. Just trying to convey the general idea.

+5  A: 

If your hashCode() method returns a constant value all your keys will hash to the same bucket in the HashMap, effectively reducing your HashMap to be a linked list, with access time O(n) (instead of approximating O(1)).

One possible solution (not space-efficient): For each string store multiple keys corresponding to the possible String prefices, but all referencing the same value. For example, for the word "Hello" you would store keys "H", "He", "Hel", "Hell", "Hello". This would obviously consume more space but look-up time would be very fast and you would not need to botch the class's equals() method to perform a "fuzzy" comparison. You could improve space efficiency by writing a custom class; e.g.

/**
 * Class representing String prefix.
 * Storage overhead == original string + two ints.
 */
public class Prefix {
  private final String str;
  private final int len;
  private final int hc;

  public Prefix(String str, int len) {
    this.str = str;
    this.len = len;
    this.hc = toString().hashCode(); // Precompute and store hash code.
  }

  public String toString() {
    return str.substring(0, len);
  }

  public int hashCode() {
    return hc;
  }

  public boolean equals(Object o) {
    boolean ret;

    if (this == o) {
      ret = true;
    } else if (o instanceof Prefix) {
      ret = toString().equals(((Prefix)o).toString());
    } else {
      ret = false;
    }

    return ret;
  }
}
Adamski
+1 a much better description than I was writting
Gareth Davis
Hmm I did consider this alternative but:a. The map size is already considerableb. The possible key number of combination is theoretically innumerable
Michael
you precompute and store the hashCode, but you never use it
dfa
@dfa: Thanks, I've amended my code.
Adamski
Good answer but equals() should return false if o is not a Prefix... Also String.equals() will actually return false if o is not a String so with your class the code "Prefix p = new Prefix("Foo", 3);System.err.println(p.equals(p));" will output false which I would call a bug. Fix that and I give you your +1.
Fredrik
@Fredrik - Thanks - good spot; Amended the code. In my defense this code was written pre-breakfast :-)
Adamski
A: 

I think you are forcing two different objects to use the same data structure and that makes your map not that efficient.

To offer a better solution I might need more information, such as: is the id in the map always 6 digits?

OK then you can for example create two classes like this.

public class MyIdMap {

   private String value;

   public MyIdMap(String value) {
      this.value = value;
   }

   public String getValue() {
      return value;
   }

   public void setValue(String value) {
      this.value = value;
   }

   @Override
   public int hashCode() {
      final int prime = 31;
      int result = 1;
      result = prime * result + ((value == null) ? 0 : value.hashCode());
      return result;
   }

   @Override
   public boolean equals(Object obj) {
      if (this == obj)
         return true;
      if (obj == null)
         return false;
      if (getClass() != obj.getClass())
         return false;
      MyIdMap other = (MyIdMap) obj;
      if (value == null) {
         if (other.value != null)
            return false;
      } else if (!value.equals(other.value))
         return false;
      return true;
   }
}


public class MyId {

   private String value;

   public MyId(String value) {
      this.value = value;
   }

   public String getValue() {
      return value;
   }

   public void setValue(String value) {
      this.value = value;
   }

   public MyIdMap getMyIDMap() {
      return new MyIdMap(value.substring(0, 6));
   }
}

Put the MyIdMap in a Map, then when you're looking for it, just use map.get(myId.getMyIdMap())

nanda
Lets assume for arguments sake that the ids in the map will always be of a certain size. How can we use that ?
Michael
+1  A: 

Why do you use HashMap in such inefficient way. The same thing you can get much faster using TreeMap - it exactly done what you want. Also const in hash code will show O(n) performance, while TreeMap gives you ln(n).

Dewfy
+1  A: 

this object doesn't even follow the general contract of hashCode:

  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

you maybe want to test your implementation (a stub that returns always a constant) and a "normal" Object, like a String. Please test, test, test, think, test, test, test, ...

dfa
How does it not follow the contract? I agree that generating a constant hashcode is very suboptimal, but the hashCode contract specifically allows non-equal objects to have the same hashCode (see the second point you quoted).
Laurence Gonsalves
but the first point is not satisfied, right?
dfa
The first point is satisfied. He's returning a constant, so any two MyId objects will have the same hashCode. Therefore, any two equal MyId objects will have the same hashCode.
Laurence Gonsalves
+3  A: 

If your comparator is using startsWith(), then a hash map is the wrong data structure. You need something where you can quickly find keys by their first letters: You need a tree map.

Unlike a hash map, a tree map is ordered. So instead of blindly diving into a mathematical space of oddly distributed numbers, you can start searching at the root and performance will be O(log(n)). The main problem with the Java implementation: It's closed and locked. You can't really extend it to search with startsWith().

In your case, the number of transaction processors seems to be stable (meaning that you don't create new ones all the time). If that isn't the case, then the number of processors should be relatively small (say, < 1000).

My suggestion is to use an array and put all processors in that array. Sort them by their ID.

Now you can use Arrays.binarySearch(T[] a, T key, Comparator<? super T> c) to look up elements efficiently using the code from equals() in the comparator.

Aaron Digulla
+1  A: 

Your equals() method doesn't obey the contract of Object.equals() - it is not transitive. It would have "a".equals("ab") return true, and "a".equals("ac") return true, but "ab".equals("ac") return false.

If you are trying to store string-related objects based on string prefixes, you might want to look into using some sort of trie.

Avi
+3  A: 

I don't think that hash tables are a good solution. @Adamskis idea of loading the hashtable with prefixes is interesting, but I think it will get messy if keys share prefixes or if you need insert / delete entries on the fly.

If your map / lookup table entries don't chjange, then using a presorted array and Arrays.binarySearch(...) (suggested by @Aaron) is a good solution. It should give you O(log(N)) lookup.

However, if you need to insert or remove map entries on the fly, these operations will be O(N) for an array-based solution. Instead, you should use a TreeMap, and use the methods in the NavigableMap API such as 'lowerKey(), floorKey() and higherKey()` to find the "closest" match in the table. That should give you O(log(N)) for lookup, insertion and deletion.

Stephen C
A: 

Ok thanks for the input fellas. Think one of the biggest factors in the problem statement is that the stored keys would almost always be shorter than the comparison. To that end, came up with 2 different approaches which will solve the problem statement just in case someone needs a reference if they come across something similar in the future:

  1. Use a Map as per normal. When the input comparison comes in, compare. If there is no hit, trim the string and compare again.

  2. This one is a little fancier. Quite liked what I read about Don Knuth's Trie (thanks for the ref Avi) and came up with a very simple implementation. (Just FYI, the format of the Ids would be something like 1.1.1.2. Need to keep that in mind so the sample code doesn't look too weird).

public class Trie { private HashMap map = new HashMap();

public Trie()
{
}

public Object get(String key)
{
    return recurse(key.split("\\."), map, 0);
}

protected Object recurse(String[] key, Map map, int location)
{
    Object value = map.get(key[location]);
    if(value instanceof Map)
        return recurse(key, (Map)value, location+1);
    else
        return value;
}

public void addKey(String key, Object value)
{
    String[] keys = key.split("\\.");
    addKey(keys, map, 0, value);
}

protected void addKey(String[] key, Map map, int location, Object value)
{
    if((location+1) == key.length)
    {
        //end of the road. value insertion
        map.put(key[location], value);
    }
    else
    {
        Map hashMap = (Map) map.get(key[location]);
        if(!(map.containsKey(key[location])))
        {
            hashMap = new HashMap();
            map.put(key[location], hashMap);
        }
        addKey(key, hashMap, location+1, value);
    }
}

public static void main(String[] args)
{
    Trie trie = new Trie();
    trie.addKey("1.1.2.1", "1.1.2.1");
    trie.addKey("1.1.2.2", "1.1.2.2");
    trie.addKey("1.1.2.3.1", "1.1.2.3.1");
    trie.addKey("1.1.2.3.2", "1.1.2.3.2");
    trie.addKey("1.1.2.4", "1.1.2.4");

    System.out.println(trie.get("1.1.2.1.0")); //returns 1.1.2.1
    System.out.println(trie.get("1.1.2.3.1.0")); //returns 1.1.2.3.1
    System.out.println(trie.get("1.1.2.4.1.0")); //returns 1.1.2.4
}

}

In my use case, I don't anticipate the Trie to grow more than 2-3 levels in depth so if your tree structure gets very complex you might want to analyze performance issues and see if the additional lookups will cause too much overheads. Oh and both approaches do not require any dodgy changes to the hashCode or equals contract since we're dealing with String objects only.

Considerations:

Haven't decided on which one to use pending behavior analysis. The thing is most of the time, the comparison values will be exactly that of those stored in the map so a simple look up will be sufficient. Its just the other "special" cases that needs to be catered for. In summary if the special occurrences tend to be very very low frequency, I'd be tempted to go for the initial approache (#1). The vast majority of the searches will be quick and when a special case comes in I'll live with the pain of string manipulation overhead. If the reverse is true, #2 might be more attractive.

PS: Comments welcome

Michael