We are currently running a Java integration application on a Linux box. First an overview of the application.

The Java application is a standalone application (not deployed on any JEE application server such as OracleAS, WebLogic, JBoss, etc.). By standalone I do NOT mean a desktop application, although it is run from the command line through a Main class. The user does not interact with this application directly. Messages are put on the queue through an API and are then read by my application, which runs 24/7. I wouldn't classify this as a desktop app since the user has no direct interaction with it (though I'm not sure that is the correct reasoning).

It uses Spring and connects to WebSphere MQ and an Oracle database. We use a Spring listener (Spring Message Driven POJOs) which listens to a queue on WebSphere MQ. Once there is a message in the queue, the application reads the message from MQ and writes it (insert/update) to the database.

Now the questions are:

  1. How can we horizontally scale this application? Is simply adding more boxes and running multiple instances of the same application a viable approach?
  2. Should we consider moving from Spring MDPs to EJB MDBs, and thereby deploying on an application server? Is there any added benefit in doing so?
  3. There is a request to make the application highly available (HA). What methodologies or strategies can be put in place to make a standalone application HA?
A: 

Does "standalone" == "desktop"?

How do users interact with the controller that owns the message-driven beans?

My opinions on your questions:

  1. You can scale by adding more message listeners to the listener pool, since each one runs in its own thread. You should match the size of the database connection pool to the number of message listeners, so that would have to increase as well. Do that before adding more servers. Make sure you have enough RAM on hand.
  2. I don't see what EJB MDB buys you over Spring MDB. You keep referring to "app servers". Do you specifically mean Java EE app servers like WebLogic, WebSphere, JBOSS, Glassfish? Because if you're deploying Spring on Tomcat I'd consider Tomcat to be the "app server" in this conversation.
  3. HA means load balancing and failover. You'll need to have databases that are either synchronized or hot redeployable. Same with queues. F5 is a great hardware solution for load balancing. I'd talk to your infrastructure folks if you have some.
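A minimal sketch of point 1, assuming nothing about the real setup: it uses a plain `BlockingQueue` in place of WebSphere MQ and a `Semaphore` in place of the database connection pool, so `ListenerPoolSketch` and `drain` are hypothetical names and not Spring APIs. It only illustrates sizing the listener threads and the connection pool together.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a listener pool; not Spring or WebSphere MQ code.
class ListenerPoolSketch {

    // Drains the queue with `listeners` threads. Each thread must hold one of
    // `connections` permits (the stand-in for a pooled database connection)
    // while it handles a message. Returns the number of messages processed.
    static int drain(BlockingQueue<String> queue, int listeners, int connections) {
        Semaphore connectionPool = new Semaphore(connections);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(listeners);

        for (int i = 0; i < listeners; i++) {
            pool.submit(() -> {
                // poll() returns null once the queue is empty, ending this listener.
                while (queue.poll() != null) {
                    connectionPool.acquireUninterruptibly(); // borrow a "connection"
                    try {
                        processed.incrementAndGet();         // pretend insert/update
                    } finally {
                        connectionPool.release();            // give the "connection" back
                    }
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("listeners did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1000; i++) {
            queue.add("message-" + i);
        }
        // Four listeners and four connections: the two pools are sized together.
        System.out.println(drain(queue, 4, 4));
    }
}
```

If `connections` were smaller than `listeners`, the extra listener threads would spend their time blocked waiting for a permit, which is the reason for growing both numbers together.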
duffymo
Standalone == NOT DESKTOP (Main class run from the command line). 1) By adding listeners, I believe you mean increasing the number of (Spring) consumers in the same JVM, and this will have a limit. Hence I believe it will NOT scale horizontally. However, adding more listeners in other instances is OK.
Franklin
2) By app server I mean OracleAS, WebLogic, JBoss, etc. So is that a better choice? Having EJBs gives you the advantage that as and when the app servers are scaled, so is your application. You also get easier maintenance, monitoring and notification, at the cost of the overhead of EJBs.
Franklin
Running a main class from a command line IS desktop. Of course adding listeners means a limit imposed by RAM.
duffymo
Hi Duffymo, I'm not sure HA means exactly load balancing. Load balancing is a means to distribute the processing across multiple nodes. HA, however, means NO loss of business service even if one of the nodes goes down. I'm not sure they necessarily mean the same thing. Do correct me if I'm wrong.
Franklin
@Franklin - You're correct, load balancing is not synonymous with HA when all is well and the cluster is operating as designed. But if one side goes down, and the load is seamlessly routed to the machine that's still available, I'm thinking that it has some of the character of HA at that time.
duffymo
A: 

Horizontal scaling for any application will eventually run into limits as demand for the data increases. Those limits are determined by load and server/database performance. At some point, if demand and load increase with scaling, the number of servers/databases will have to increase as well. Depending on the data that is being stored, the servers/databases will either have to be duplicated and synchronized, or some sort of hashing algorithm will need to be employed to split data across multiple servers. As you increase the number of synchronized data sources the cost of replicating/synchronizing those servers increases as well. That is why the hashed approach may be more appealing to minimize cost.
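The hashing idea mentioned above can be sketched as follows. This is only an illustration under assumptions: `ShardRouter` and the shard names are hypothetical, and a simple modulo over the key's hash is used, which means keys move between shards whenever the number of shards changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical router that picks one data source per key by hashing the key.
class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = new ArrayList<>(shards); // defensive copy
    }

    // The same key always maps to the same shard while the shard list is unchanged.
    String shardFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(Arrays.asList("db-0", "db-1", "db-2"));
        System.out.println(router.shardFor("customer-42"));
    }
}
```

Because each key lives on exactly one shard, nothing has to be replicated between shards to serve it, which is where the cost saving over full synchronization comes from.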

True High Availability solutions are very expensive to implement. I've seen various degrees of HA, but by definition it means minimal or no downtime of, or loss of access to, the data source. Achieving this requires a lot of redundant hardware and networking, plus software that can use the redundant hardware without losing access to the data when one of the data sources fails. Hardware failure is inevitable, as are power outages and other random acts of nature. Depending on how critical the data is, an HA solution may also require multiple data centers on multiple independent power grids. That is obviously going to be very expensive, so it all depends on how critical the data is to the end user.

So, HA is an extreme scenario requiring an expensive architecture. I find that most of the time people are interested in just minimizing downtime, and depending on the size of the data source this can be achieved fairly inexpensively by adding hot spares of the data sources.

Evan
+3  A: 

Another option is Terracotta, a framework that does precisely what you want; running your app on several machines simultaneously and balancing the load among them.

Chochos
Actually Terracotta, as far as I understand, is a clustering framework and would also intrude into my code. However, I have no state to share, so I really do not see the benefit of adding Terracotta. But I need to look into its load balancing facility, if it provides one. Thanks, Franklin.
Franklin
Do correct me if I am wrong about the code intrusion.
Franklin
Terracotta does not "intrude" into your code. Curious why you would say that?
Taylor Gautier
Well, I am not very sure about the intrusiveness. However, I believe it's not 100% non-intrusive. Is that so? If it is non-intrusive, I will definitely look into it.
Franklin
Terracotta works by instrumenting the bytecode of your application as it's loaded into the JVM. This is all done through an XML configuration file. More info here: http://www.terracotta.org/web/display/docs/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-BytecodeInstrumentation
cliff.meyers
@brd644 is right. You can also use annotations, if you like - your choice. I recommend you read through the cookbook - it only takes a few minutes and it will give you a good feel for how Terracotta looks in your code - which is to say it doesn't :) http://tinyurl.com/bjzmh5
Taylor Gautier
+1  A: 
  1. Horizontal scaling a message driven app is easy... most of the time. You can certainly add another message listener operating on the same queue. Watch out, though, because you might have subtle dependencies on the ordering of messages. They might not be a problem now, with just one processor, but with more than one you are guaranteed that the messages will be processed "out of order" at some point.
  2. EJB MDBs don't offer anything beyond Spring MDPs. Stick with what's working.
  3. Horizontally scaling the processors is a start, but this one requires a bit more discussion.
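If ordering does turn out to matter, one common way to cope (not something this answer prescribes) is to route all messages that share a key to the same single-threaded worker, so order is kept per key while different keys still run in parallel. The sketch below assumes messages carry such a key; `KeyedDispatcher` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one single-threaded "lane" per hash bucket.
class KeyedDispatcher {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyedDispatcher(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            // A single-thread executor runs its tasks one at a time, in submission order.
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // Messages with the same key always land in the same lane.
    void dispatch(String key, Runnable work) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(work);
    }

    // Stops accepting work and waits for everything already queued to finish.
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        KeyedDispatcher dispatcher = new KeyedDispatcher(4);
        for (int i = 0; i < 3; i++) {
            final int step = i;
            dispatcher.dispatch("order-1", () -> System.out.println("order-1 step " + step));
        }
        dispatcher.shutdownAndWait();
    }
}
```

This only preserves order inside one JVM. Across several boxes reading the same queue, the equivalent guarantee would have to come from the messaging layer or from how messages are split across queues.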

For HA, you need to clarify the requirements. "High availability" is an interesting question for a queue-based app. If your app goes down for a few minutes, messages pile up in the queue. As long as you can get your app back up and running, those messages will still get processed, just with a bit more latency. It's probably worth asking, "What is the maximum acceptable latency for a message?"
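To put a rough number on that latency question, here is a back-of-the-envelope sketch. It assumes constant arrival and processing rates, which real traffic will not have, and `BacklogEstimate` with its parameters is hypothetical. It estimates how long the app needs to work off the backlog that built up during an outage.

```java
// Hypothetical helper for estimating catch-up time after an outage.
class BacklogEstimate {

    // arrivalPerSec: messages arriving on the queue per second
    // processPerSec: messages the app can process per second once it is back
    // outageSeconds: how long the app was down
    static double catchUpSeconds(double arrivalPerSec, double processPerSec, double outageSeconds) {
        double backlog = arrivalPerSec * outageSeconds;   // messages piled up during the outage
        double spareRate = processPerSec - arrivalPerSec; // capacity left after new arrivals
        if (spareRate <= 0) {
            // With no spare capacity the backlog never shrinks.
            return Double.POSITIVE_INFINITY;
        }
        return backlog / spareRate;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 10 msg/s arriving, 15 msg/s capacity, 5 minute outage.
        System.out.println(catchUpSeconds(10, 15, 300)); // prints 600.0
    }
}
```

Under this model, the worst-case latency for a message is roughly the outage duration plus part of the catch-up time, which can be compared against whatever maximum the business accepts.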

There's probably some component of concern about hardware failures, loss of a datacenter, etc. These won't be addressed by horizontal scaling in the same location. You'll need to replicate all components at every layer: the queue itself, the processors, the backend database, and all network hardware connecting them.

It's an expensive proposition, so it's also worth asking, "What's the delta in annualized loss expectancy of downtime between an HA scenario and a non-HA scenario?" ALE incorporates both direct losses and regulatory or legal costs, so it's a good way to capture the cost of downtime.
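For reference, ALE is usually computed as the annualized rate of occurrence (ARO) times the single loss expectancy (SLE), where the SLE here would include both the direct losses and the regulatory or legal costs of one outage. The subscripts below are just labels for the two scenarios being compared, not standard notation.

```latex
\[
\mathrm{ALE} = \mathrm{ARO} \times \mathrm{SLE}
\]
\[
\Delta\mathrm{ALE} = \mathrm{ALE}_{\text{non-HA}} - \mathrm{ALE}_{\text{HA}}
\]
```

The HA investment is easier to justify when this delta is larger than the yearly cost of building and running the HA setup.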

mtnygard
A: 

  1. Creating more listeners on the queue can scale the number of consumers. As a consumer dies, the remaining consumers can keep running. Note: Your MQ and database need to have high availability solutions as well.
  2. Not sure what difference an application server would make in your case. Perhaps you could explain which features you intend to use?
  3. See my answer to 1. for HA.

Peter Lawrey
A: 

Did you try running on multiple boxes? I think you should check the documentation for your MQ. Running multiple boxes may need some configuration in your MQ, but it will run ISA