I am currently investigating which Java-compatible solutions exist to address the following requirements:

  • Timer based / Schedulable tasks to batch process
  • Distributed, and by that providing the ability to scale horizontally
  • Resilience: no single points of failure (SPOFs), please

The nature of these tasks (heavy XML generation and delivery to web-based receiving nodes) means that running them on a single server using something like Quartz isn't viable.

I have heard of technologies like Hadoop and JavaSpaces, which have addressed the scaling and resilience end of the problem effectively. Not knowing whether these are quite suited to my requirements, it's hard to know what other technologies might fit well.

I was wondering what people in this space feel the available options are, and how each plays to its strengths or suits certain problems better than others.

NB: It's worth noting that schedulability is perhaps a hangover from how we do things presently. Yes, there are tasks which ought to run at certain times, but scheduling has also been used to throttle throughput in cases where no mandate for set times exists.

+2  A: 

Asynchronous always brings JMS to mind for me. Send the request message to a queue; a MessageListener is plucked out of the pool to handle it.

This can scale, because the queue and listener can be on a remote server. The size of the listener thread pool can be configured. You can have different listeners for different tasks.
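To make the queue-plus-listener-pool idea concrete, here is a minimal in-process sketch. It deliberately uses only `java.util.concurrent` rather than a real JMS provider, so it runs without a broker; `QueueListenerSketch`, `TaskListener`, and `process` are hypothetical names invented for illustration and are not part of the JMS API. With real JMS the queue would live on a broker and the listeners could sit on other machines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueListenerSketch {

    /** Hypothetical callback, standing in for the role a JMS MessageListener plays. */
    public interface TaskListener {
        void onMessage(String message);
    }

    /**
     * Puts every message on a queue, lets poolSize worker threads drain it,
     * and returns what the listener produced (in no particular order).
     */
    public static List<String> process(List<String> messages, int poolSize) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(messages.size());

        TaskListener listener = message -> {
            results.add("handled:" + message);
            done.countDown();
        };

        // The pool size is the knob that controls how many messages are handled at once.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.submit(() -> {
                String message;
                // poll() returns null once the pre-filled queue is empty, ending this worker.
                while ((message = queue.poll()) != null) {
                    listener.onMessage(message);
                }
            });
        }

        try {
            // Wait (with an upper bound) until every message has been handled.
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("report-1", "report-2", "report-3"), 2));
    }
}
```

The sender only knows about the queue, and the number of concurrent handlers is a configuration value, which is the property that lets the real thing scale out.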

UPDATE: You can avoid having a single point of failure by clustering and load balancing.

You can get JMS without cost using ActiveMQ (open source), JBoss (open source version available), or any Java EE app server, so budget isn't a consideration.

There's no vendor lock-in either, because JMS is a standard API; the only thing you're committed to is Java itself.

I'd recommend doing it with Spring message-driven POJOs. The community edition is open source, of course.

If that doesn't do it for you, have a look at Spring Batch and Spring Integration. Both of those might be useful, and the community editions are open source.

duffymo
I've worked with JMS in the past, and whilst it suits async well, I haven't used any implementation that wasn't prone to becoming a single point of failure. Sure, I don't doubt that commercial vendors provide heavy-metal versions of distributable, resilient MQ clusters. I don't have that kind of budget, nor an interest in vendor lock-in. I'd rather stay away from big commercial frameworks; open source preferred.
j pimmel
ActiveMQ and RabbitMQ are two popular open-source queues which support clustering, so that you are not hosed if the server hosting the queue goes down. http://www.rabbitmq.com/ and http://activemq.apache.org/. Both have Java client APIs.
matt b
+1  A: 

Have you looked into GridGain? I am pretty sure it won't solve the scheduling problem, but it scales, and it happens like "magic": the code to be executed is sent to a node and executed there. It works fine as long as the task doesn't need to carry a database connection (or anything else that is not serializable).
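The serializability caveat can be checked without any grid at all. The sketch below uses only plain Java serialization and no GridGain API; `SerializableTaskSketch`, `WordCountTask`, `ConnectionHoldingTask`, `isShippable`, and `runAfterRoundTrip` are hypothetical names made up for the example. It imitates the trip to another node by writing a task to bytes and reading it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.Callable;

public class SerializableTaskSketch {

    /** Hypothetical task that carries only plain data, so it can be shipped. */
    public static class WordCountTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String text;

        public WordCountTask(String text) {
            this.text = text;
        }

        @Override
        public Integer call() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Hypothetical task holding a non-serializable field, a stand-in for a DB connection. */
    public static class ConnectionHoldingTask implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object connection = new Object(); // plain Object is not Serializable
    }

    /** Returns true if the object survives Java serialization, false otherwise. */
    public static boolean isShippable(Object task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (IOException e) { // includes NotSerializableException
            return false;
        }
    }

    /** Serializes the task, deserializes the copy, and runs the copy. */
    public static int runAfterRoundTrip(WordCountTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                WordCountTask copy = (WordCountTask) in.readObject();
                return copy.call();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("task could not be shipped", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAfterRoundTrip(new WordCountTask("heavy XML generation")));
        System.out.println(isShippable(new ConnectionHoldingTask()));
    }
}
```

A common way around the limitation is to ship only the data needed to open the resource (for example, connection settings) and open it on the node that runs the task.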

Ravi Wallau