This is the ability to run your application on a cluster of servers with the intent to distribute the load and also provide additional redundancy.

I've seen a presentation for GridGain and I was very impressed with it.

Know of any others?

+14  A: 

There are several:

Now I haven't used all of these but I've used or investigated the majority of them.

GridGain and GigaSpaces are centred more on grid computing than on caching and (imho) are better suited to compute grids than to data grids (see this explanation of compute vs data grids). I find GigaSpaces to be a really interesting technology and it has several licensing options, including a free version and a free full version for startups.

Coherence and Terracotta try to treat caches as Maps, which is a fairly natural abstraction. I've used Coherence a lot and it's an excellent high-performance product but not cheap. Terracotta I'm less familiar with. The documentation for Coherence I find a bit lacking at times but it really is a powerful product.
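To illustrate the "cache as a Map" abstraction, here is a minimal sketch using only the JDK. A `ConcurrentHashMap` stands in for a clustered map; the class name `MapCacheSketch` and the `loadFromDatabase` helper are hypothetical, and nothing here is Coherence or Terracotta API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: the application depends only on java.util.Map,
// so a clustered Map implementation could in principle replace the local one.
class MapCacheSketch {
    private final Map<String, String> cache;
    private final AtomicInteger loads = new AtomicInteger();

    MapCacheSketch(Map<String, String> cache) {
        this.cache = cache;
    }

    // Stand-in for an expensive lookup (database, remote service, ...).
    private String loadFromDatabase(String key) {
        loads.incrementAndGet();
        return "value-for-" + key;
    }

    // Read-through lookup: the value is computed only on a cache miss.
    String get(String key) {
        return cache.computeIfAbsent(key, this::loadFromDatabase);
    }

    // Number of times the expensive lookup actually ran.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        MapCacheSketch sketch = new MapCacheSketch(new ConcurrentHashMap<>());
        System.out.println(sketch.get("user:42"));
        System.out.println(sketch.get("user:42")); // served from the map
        System.out.println("loads = " + sketch.loadCount());
    }
}
```

The point of the design is that the calling code never needs to know whether the `Map` is local or distributed; only the constructor argument changes.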

OSCache I've primarily used as a means of reducing memory usage and fragmentation in Java Web applications as it has a fairly neat JSP tag. If you've ever looked at compiled JSPs, you'll see they do a lot of String concatenations. This tag allows you to effectively cache the results of a segment of JSP code and HTML into a single String, which can hugely improve performance in some cases.

EHCache is an easy caching solution that I've also used in Web applications. I've never used it as a distributed cache, though it can do that. I tend to view it as a quick and dirty solution but that's perhaps my bias.

memcached is particularly prevalent in the PHP world (and used by such sites as Facebook). It's a really light and easy solution. It has the advantage that it doesn't run in the same process as your application, and you'll arguably have better interoperability with other technology stacks, if this is important to you.

cletus
Sorry for the mark down, but your answer is more a comprehensive overview of distributed caching frameworks rather than information on how to enable / setup clustering for Java applications :-)
Karl
Can you query against any of these (Without going to a full blown ORM framework)?
Grasper
Not sure I understand the question. ORM and caching have a little in common but are mostly different goals.
cletus
+4  A: 

I think @cletus's summary is pretty good. I did want to mention that Terracotta provides a lot more than just a distributed cache in the form of a map. It clusters Java heap and synchronization primitives, turning a concurrent Java program into a distributed Java program. You can do caching with it (including using distributed versions of open source cache libs) or a bunch of other stuff.
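As a rough illustration of the kind of ordinary concurrent Java code described above (threads sharing heap objects and coordinating through `synchronized`), here is a minimal single-JVM sketch. It uses only the JDK and no Terracotta API; the class and method names are hypothetical, and whether a given program can be clustered unchanged is something to check against Terracotta's own documentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example: workers pull from a shared queue and publish
// results into a shared list guarded by a synchronized block.
class SharedQueueSketch {
    // Squares each input on worker threads and returns the sorted results.
    static List<Integer> squareAll(List<Integer> inputs, int workerCount) {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>(inputs);
        List<Integer> results = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                Integer n;
                // poll() returns null once the queue is drained.
                while ((n = work.poll()) != null) {
                    synchronized (results) {
                        results.add(n * n);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }

        // Wait for every worker before reading the shared list.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        Collections.sort(results);
        return results;
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        Collections.addAll(inputs, 3, 1, 2);
        System.out.println(squareAll(inputs, 2));
    }
}
```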

For work distribution, there are some extra libs written on top of Terracotta. In particular, tim-pipes (for messages) and tim-masterworker (for Master-Worker style distribution) are great abstractions on top of Terracotta. This library is on the Terracotta Forge:

This recently added page may add a bit of additional info in comparison to some other potential data technologies:

Alex Miller
A: 

If you want to go a little lower-level, there is JGroups, which provides you with the very basics of clustering Java processes.

+3  A: 

You may want to check out Hazelcast also. Hazelcast is an open source transactional, distributed/partitioned implementation of queue, topic, map, set, list, lock and executor service. It is super easy to work with; just add hazelcast.jar into your classpath and start coding. Almost no configuration is required.

If you are interested in executing your Runnable or Callable tasks in a distributed fashion, then please check out the Distributed Executor Service documentation at http://code.google.com/docreader/#p=hazelcast
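The programming model here is the JDK's own `Callable` and `ExecutorService`. The sketch below is a local stand-in only: it runs on an ordinary thread pool and uses no Hazelcast API. The task is made `Serializable` on the assumption that a distributed executor has to ship it to another JVM; `WordCountTask` and `ExecutorSketch` are hypothetical names.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task: Serializable so a distributed executor could send it
// to another node; here it simply runs on a local thread pool.
class WordCountTask implements Callable<Integer>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String text;

    WordCountTask(String text) {
        this.text = text;
    }

    @Override
    public Integer call() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

class ExecutorSketch {
    // Submits the task and blocks until its result is available.
    static int countWords(ExecutorService executor, String text) {
        Future<Integer> result = executor.submit(new WordCountTask(text));
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Local pool as a stand-in for a cluster-wide executor.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            System.out.println(countWords(executor, "run this somewhere in the cluster"));
        } finally {
            executor.shutdown();
        }
    }
}
```

Because `countWords` only depends on the `ExecutorService` interface, the same calling code would work with any executor implementation that honours that interface.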

Hazelcast is released under Apache license and enterprise grade support is also available.

+2  A: 

JPPF is also nice.

+2  A: 

Another you can add to the list is Appistry CloudIQ. It is a distributed computing environment, available as a free download for up to 5 machines. It includes load distribution as well as automatic failover of work in the case of a hardware failure, among other features.

Brett McCann
+1  A: 

For grid computing, you could also consider Ice Grid or DataSynapse GridServer. These both provide very effective mechanisms for distributing tasks and provide fail over and redundancy.

John Channing
+1  A: 

I think your question has been interpreted in different ways. You ask about a library which you can use to "cluster enable" your application.

While some of the libs named above can help provide specific cluster functionality such as distributed caching, the more conventional way of enabling workload management is through the use of a J2EE container.

Setting up a clustered container instance lets you utilise HA features and workload management, and clustering is almost transparent at the application level. I say almost because when writing applications that are going to be clustered you have to be careful how you manage state. For example, if you implemented some sort of cache, you would need to replicate the state of the cache across each machine.
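To make the point about state concrete, here is a minimal sketch that simulates copying state to another node with plain JDK serialization. It assumes the state is shipped between nodes as a serialized copy; `CartState` and `ReplicationSketch` are hypothetical names and no container API is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-user state that has to survive a move to another node.
class CartState implements Serializable {
    private static final long serialVersionUID = 1L;
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    List<String> items() {
        return items;
    }
}

class ReplicationSketch {
    // Simulates replication: serialize the state, then rebuild a copy from the bytes.
    static CartState copyToOtherNode(CartState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CartState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("state could not be replicated", e);
        }
    }

    public static void main(String[] args) {
        CartState local = new CartState();
        local.add("book");
        CartState replica = copyToOtherNode(local);

        // A change made after the copy is invisible to the replica
        // until the state is replicated again.
        local.add("pen");
        System.out.println("local   = " + local.items());
        System.out.println("replica = " + replica.items());
    }
}
```

The replica is a separate object graph, so any update made after the copy has to be replicated again, which is exactly the care with state mentioned above.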

A good starting place would be to download GlassFish and try to set up a clustered GlassFish instance.

Hope that helps.

Karl
+1  A: 

Also check Fura

Lukas Grijander
+1  A: 

And also check ProActive

Lukas Grijander