views:

135

answers:

4

I have a requirement to create a java cache which holds all the cities and airports. So, if i query the cache for a location, lets say a city, it should return all the airports in that city and if I query a location which is an airport, i should get back that airport. Also, each location has to be stored as a byte array in cache.(as the exposed interface for querying the cache has byte[] as the parameter for location) Other considerations are:

  1. The retrieval has to be very fast, as fast as possible
  2. The cache is loaded only once at system startup.It doesn't change after getting loaded.
  3. As its loaded only once, we can keep it sorted if that speeds up the retrieval.

What I have got so far:

Approach 1

Create a thin wrapper over byte[] array, lets say ByteWrapper. Put each location(both airports and cities) as a key in map(TreeMap?). Use lists of ByteWrapper(containing airports where ever applicable) as values.

Approach 2

Create multi dimensional byte[] array which is sorted on location. Its essentially a map. Then use binary search to locate the key and return results.

What approach would you suggest? Please let me know in case you have better ideas Thanks

A: 

You do not need Byte Arrays, Strings would be just fine.

How often you would add items to this cache? I am guessing it is completely static, since they are not making new cities or airports every day.

So, what you can do is using two MultiHashMaps, one keying on the city and another keying on airports. Checkout Google Multimap http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html

If you are using mySQL by any chance, you can actually use a table based on Memory Storage Engine.

Many Databases can pin a table in memory, definitely Oracle can, so that is another way to go.

srini.venigalla
Thanks for the reply. As i said, I have to use byte arrays as thats how the cache would be queried. The interface cant be changed. Yes, I can store it as Strings but that would involve an overhead of conversion between strings and bytes.No, I cant use DB because of performance over heads.
Tequila Guy
+1  A: 

The fact that the exposed API is byte[] based shouldn't necessarily dictate the internal details of your cache.

Second observation is that this is not a generalized data structure problem. Both the space of all airports and space of all cities are finite, and well known. (You even know the size).

Hash maps, trees, etc. are all algorithms that guarantee certain performance characteristics within established bounds for general usage.

Since data integrity is a non-issue ("data doesn't change") and if space considerations are not critical ("as fast as possible"), then why not:

[Edit: this bit somehow cut lost in the cut and paste: You index (number) your cities and airports, given that you know these sets and they are effectively static.]

// these need to get initialized on startup
// this initialization can be optimized.

Map<byte[], Long> airportIndexes = new HashMap<byte[], Long>(NUMBER_OF_AIRPORTS);
Map<byte[], Long> citiesIndexes = new HashMap<byte[], Long>(NUMBER_OF_CITIES);

Map<Long, byte[]> airports = new HashMap<Long, byte[]>(NUMBER_OF_AIRPORTS);
Map<Long, byte[]> cities = new HashMap<Long, byte[]>(NUMBER_OF_CITIES);

long[][] airportToCitiesMappings = new byte[NUMBER_OF_AIRPORTS][];
long[][] citiesToAirportMappings = new byte[NUMBER_OF_CITIES][];


public List<byte[]> getCitiesNearAirport(byte[] airportName) {
   Long[] cityIndexes = getCitiesByIdxNearAirport(airportName);
   List<byte[]> cities = new ArrayList<byte[]>(cityIndexes.length);
   for(Long cityIdx : cityIndexes) {
       cities.add(cities.get(cityIdx));
   }
   return cities;
}
public long[] getCitiesByIdxNearAirport(Long airportIdx) {
   return airportToCitiesMappings[airportIdx];
}
public long[] getCitiesNearAirport(byte[] airportName) {
   return getCitiesNearAirport(airportIndexes.get(airportName));
}
public long[] getCitiesNearAirport(Long airportIdx) {
   return airportToCitiesMappings[airportIdx];
}
// .. repeat above pattern for airports.

That should give you O(1) time performance characteristics. There is considerable redundancy in terms of space.

Thanks. Few problems with the approach1) airportIndexes Map will always return null as hashMap will not consider 2 byte arrays equal if they have same values.2) Conversions between Long and long etc.I am thinking that I can create just 1 multi-dimensional array which has cities+airports as dimension 1. We dont care if the input is airport or city...we just need to return the corresponding mapping. So if input is City, return all airports in that city and if input is airport, just return that airport. In that case, we can avoid separate search for cities and airports. Any ideas?
Tequila Guy
A: 

Hello,

Give a try to approach 1, as byte[] is an Object type you could use something like:

Map<byte[], List<byte[]>> cache = ...

It's probably the simplest approach, you just will have to choose the implementation for your Map. Probably you should go with a HashMap because it's the simplest...

As gustavc said using a HashMap won't work, so you could instead use a TreeMap with a given comparator:

Map<byte[], List<byte[]>> m = new TreeMap<byte[], List<byte[]>>(new Comparator<byte[]>() {
    public int compare(byte[] o1, byte[] o2) {
        int result = (o1.length < o2.length ? -1 : (o1.length == o2.length ? 0 : 1));
        int index = 0;
        while (result == 0 && index < o1.length) {
            result = (o1[index] < o2[index] ? -1 : (o1[index] == o2[index] ? 0 : 1));
            index++;
        }
        return result;
    }
});
pgras
The hash codes of arrays are based on the identity of the array object, not array contents. The following wouldn't work: `byte[] a = {1}, b = {1}; map.put(a, someValue); assert map.get(b) == map.get(a);`
gustafc
gustavc: thanks for your explanation, I had missed that...
pgras
A: 

Hi. SO this is what I have done so far:

private static byte[][][] cache = null; // this is the actual cache
// this map has ByteArrayWrapper(a wrapper over byte[]) as key which
//  can be an airport or city and index of corresponding 
// airport/airports in byte[][][]cache as value
Map<ByteArrayWrapper, Integer> byteLocationIndexes = null;
/**
* This is how cache is queried. You can pass an airport or city as a location parameter
* It will fetch the corresponding airport/airports
*/
private byte[][] getAllAirportsForLocation(ByteArrayWrapper location) {
    byte[][] airports = null;
    airports = byteLocationIndexes.get(location)== null ? null : cache[byteLocationIndexes.get(location).intValue()];
    return airports;
}

I bench marked performance using both String as key in indexMap (and using String[][] cache)and ByteArrayWrapper as key (and byte[] as cache). There is a 15-20% improvement if I use ByteArrayWrapper and byte[][][] cache.

What else can be done to improve the performance? Would it help if I use some other implementation of Map? As the cache is loaded only once and never changes, it can be sorted. Most time is taken in key look up in byteLocationIndexes and that is the bottle neck. I am already calculating hashCode at the time of object creation and storing it as a local variable in ByteArrayWrapper.

Any suggestions?

Tequila Guy