
I've been reading up on so-called "data grid" solutions for the Java platform, including Terracotta, GigaSpaces and Coherence. I was wondering if anyone has real-world experience working with any of these tools and could share it. I'm also really curious to know what scale of deployment people have worked with: are we talking 2-4 node clusters, or have you worked with anything significantly larger than that?

I'm attracted to Terracotta because of its "drop-in" support for Hibernate and Spring, both of which we use heavily. I also like the way it decorates bytecode based on configuration and doesn't require you to program against a "grid API." I'm not aware of any advantages to tools that take the explicit-API approach, but would love to hear about them if they do in fact exist. :)

I've also spent time reading about memcached but am more interested in hearing feedback on these three specific solutions. I would be curious to hear how they measure up against memcached in the event someone has used both.

+2  A: 

I don't have much experience with these technologies, but I think Apache Hadoop has proven to be scalable and reliable. Yahoo ran it on a 10,000-core Linux cluster.

It's based on Google's MapReduce programming model.

This article describes MapReduce and why you should care about it.
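To make the MapReduce idea concrete, here is a minimal, single-JVM sketch of the classic word-count example using plain Java streams. It only imitates the map, shuffle and reduce phases inside one process; it is not Hadoop's API, and the class and method names are hypothetical. Hadoop would run many map and reduce tasks in parallel across a cluster and move the intermediate pairs between them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // "Map" phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // "Shuffle" + "reduce" phases: group pairs by word and sum the counts per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new, // sorted output, just to make printing predictable
                Collectors.summingInt(Map.Entry::getValue)));
    }

    // Runs the map phase over every line, then reduces all intermediate pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> input = List.of("the grid scales", "the grid is a cache");
        System.out.println(wordCount(input));
        // prints {a=1, cache=1, grid=2, is=1, scales=1, the=2}
    }
}
```

The map step never needs more than one line, and the reduce step only needs all values for a single key; that independence is what lets a framework like Hadoop spread both phases across many machines.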

Bahaa Zaid
+6  A: 

We had 50 servers running a web service application, all load balanced using BIG-IP. The requirement was to cache each user's state so that subsequent requests wouldn't repeat the same processing and could instead pick up the data from the previous state. This way, clients of the web service didn't need to maintain state.

We used Terracotta to cache the states and never faced any performance issues. At peak times the application handles about 100 requests per second.
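The pattern described here (compute a user's state once, then reuse it on later requests) can be sketched as an ordinary Java map keyed by user ID. This is a hypothetical, single-JVM illustration: `UserStateCache`, `UserState` and `compute` are made-up names, and the Terracotta-specific part (declaring the map as a clustered root in Terracotta's configuration so every node behind the load balancer shares it) is an assumption that is not shown here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class UserStateCache {

    // Hypothetical per-user state produced by an earlier request.
    static final class UserState {
        final String userId;
        final String result;

        UserState(String userId, String result) {
            this.userId = userId;
            this.result = result;
        }
    }

    // Plain JVM-local map. Under Terracotta, a field like this would be configured
    // as a clustered root (assumption; configuration not shown) so that all nodes
    // see the same entries without any grid-specific API calls in this code.
    private static final ConcurrentMap<String, UserState> STATES = new ConcurrentHashMap<>();

    // Counts how many times the expensive work actually ran (demo/testing only).
    static final AtomicInteger computations = new AtomicInteger();

    // Hypothetical expensive processing step that later requests should not repeat.
    private static UserState compute(String userId) {
        computations.incrementAndGet();
        return new UserState(userId, "processed:" + userId);
    }

    // Returns the cached state for this user, computing it only on the first request.
    static UserState stateFor(String userId) {
        return STATES.computeIfAbsent(userId, UserStateCache::compute);
    }

    public static void main(String[] args) {
        stateFor("alice");
        stateFor("alice"); // served from the cache, no recomputation
        stateFor("bob");
        System.out.println(computations.get()); // prints 2
    }
}
```

Because the web service clients only send a user ID, any server in the pool can answer the next request as long as the map itself is shared.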

Bhushan
+2  A: 

The library you choose really depends on your application and what you're trying to achieve.

I worked for a shop that used Coherence to provide scalability (and redundancy, sort of) for its web applications. We found that you need around 4-5 nodes to start getting any benefit from Coherence (with only 2 or 3 nodes, performance can actually get worse). I believe Oracle's docs say you need a lot of nodes (30+) to really benefit from Coherence. If you do go with Coherence, make sure your hardware and network are set up properly: it is very sensitive to latency.

I personally would stay away from "drop-in" solutions. They might give you something to start with, but you'll eventually run into synchronization or performance problems and will have to start writing code specific to your grid layer anyway. Basically, you know your app better than the library does, and you will be able to figure out which items need to be in the cache, how long they need to live, how your app will be used, etc.
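Writing the caching logic yourself means deciding explicitly what gets cached and for how long. A minimal, single-JVM sketch of that idea is a cache where each `put` carries its own time-to-live and stale entries are evicted lazily on read. `TtlCache` is a hypothetical class, not part of Coherence or any other grid product; a real grid would add distribution, eviction policies and size limits on top of decisions like these.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class TtlCache<K, V> {

    // A cached value plus the absolute time (in ms) after which it is stale.
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final LongSupplier clock; // injectable so expiry can be tested deterministically

    public TtlCache(LongSupplier clock) {
        this.clock = clock;
    }

    public TtlCache() {
        this(System::currentTimeMillis);
    }

    // The application decides, per item, how long it should live.
    public synchronized void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null if the key is missing or its entry has expired.
    public synchronized V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key); // lazily evict the stale entry
            return null;
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0};
        TtlCache<String, String> cache = new TtlCache<>(() -> now[0]);
        cache.put("catalog", "v1", 1000);
        System.out.println(cache.get("catalog")); // prints v1
        now[0] = 1500;
        System.out.println(cache.get("catalog")); // prints null
    }
}
```

Injecting the clock keeps expiry testable without sleeping; in production the no-argument constructor uses `System.currentTimeMillis`.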

Seth
+4  A: 

You may also want to check out Hazelcast. Hazelcast is an open source transactional, distributed/partitioned implementation of queue, topic, map, set, list, lock and executor service. It is super easy to work with; just add hazelcast.jar to your classpath and start coding. Almost no configuration is required.

Hazelcast is released under the Apache License, and enterprise-grade support is also available. Code is hosted at Google Code.

Talip Ozturk