Hello guys,

I need help designing a system that will be used for analyzing and visualizing 3D seismic tomography data. I've just graduated, I don't have a lot of experience, and this task just landed on me.

Here is the setup:

  1. 3 supercomputing clusters. Analysis is currently controlled by a bash script that coordinates 5 executables (each being a stage in the analysis algorithm). The output of one executable is the input to the next. The files are several hundred MB each and can take a few hours to produce.
  2. Someone recommended we write a "Control Server" to submit jobs from the clients to the clusters.
  3. The clients need to be able to display 3D visualizations as soon as the output from each stage of the analysis (the executables above) is available.

I thought of using a Java Servlet as the Control Server (CS) to accept client requests for jobs. The clients would send a small input file and some parameters to the CS, and the CS would assign the job an ID.

The CS would then somehow start the process on the clusters (RMI? HTTP? Another Servlet? Some custom server listening on a socket?). When each stage of the computation on the clusters is done, it would somehow notify the CS that a specific file is ready. (Again, what technology is appropriate for contacting the CS?)

Then, when a client requests a specific output file for visualization, it would ask the CS for the file's address on the clusters and download it directly from the cluster.

I was thinking of using a Java Applet for the client so I could use the OpenGL API for the 3D visualization.

I'm pretty lost, and the solution above sounds pretty clumsy to me. What technologies would be better suited? Should I use HTTP to transfer the data? Should I code my own servers with the socket API and use the Java IO streams? How do the clusters tell the CS that a file is ready after a few hours? Do I even want to use Java Servlets?

+1  A: 

For the cluster infrastructure you're probably better off using Python or another scripting language than Java. They do a better job of this type of glue code, and they come with networking libraries for this sort of work.
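To give a rough idea of what that glue code could look like, here is a minimal Python sketch of the staged pipeline described in the question. It is an illustration, not a drop-in replacement for the existing bash script: the function name `run_pipeline`, the `stageN.out` file names, the `notify` callback, and the convention that each executable takes an input path and an output path as its last two arguments are all assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_pipeline(
    stages: Sequence[Sequence[str]],
    input_file: Path,
    work_dir: Path,
    notify: Callable[[int, Path], None],
) -> Path:
    """Run each stage in order, feeding its output file to the next stage.

    Assumed (hypothetical) calling convention for every stage:
        <command...> <input_path> <output_path>
    """
    current = input_file
    for index, command in enumerate(stages, start=1):
        output = work_dir / f"stage{index}.out"
        # check=True raises CalledProcessError if a stage exits non-zero,
        # so later stages never run on missing or partial output.
        subprocess.run([*command, str(current), str(output)], check=True)
        # Tell whoever is interested (for example the control server)
        # that this stage's file is ready.
        notify(index, output)
        current = output
    return current
```

The `notify` callback is where the "file is ready" message would go, for instance a small HTTP request or a line written to a socket on the control server. Passing it in as a function keeps the pipeline runner independent of whichever transport you end up choosing.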

Also, this is a very well-trodden path - there might be an off-the-shelf framework that you can adapt for this without having to roll your own. This is the sort of thing that MPI, PVM and Beowulf (to name a few of the better known toolkits of this type) were designed for. There is a rich body of mature taxpayer-funded cluster computing infrastructure already available. Before you go rolling your own, take a look at what you can get off the shelf.

ConcernedOfTunbridgeWells
A: 

Are the clients on the same network as the Control Server?

If yes, Servlets sound like overkill to me. You just need a simple multi-threaded server on the CS and a single-threaded server on each cluster.
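As a rough sketch of how small such a multi-threaded control server can be, here is one built on Python's standard `socketserver` module (Python is used only for illustration; the same structure maps onto a `ServerSocket` accept loop with one thread per connection in Java). The line-based protocol is invented for this example: the `SUBMIT`, `READY` and `WHERE` commands, their replies, and the class names are all assumptions, and a real system would also need to receive the input file and parameters.

```python
import itertools
import socketserver
import threading


class JobRegistry:
    """Thread-safe record of job IDs and which output files are ready."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._files = {}  # (job_id, stage) -> file location on the cluster

    def submit(self):
        with self._lock:
            return next(self._ids)

    def mark_ready(self, job_id, stage, location):
        with self._lock:
            self._files[(job_id, stage)] = location

    def locate(self, job_id, stage):
        with self._lock:
            return self._files.get((job_id, stage))


def handle_line(registry, line):
    """Apply one request line (hypothetical protocol) and return the reply."""
    parts = line.split()
    try:
        if parts[0] == "SUBMIT" and len(parts) == 1:
            return f"JOB {registry.submit()}"
        if parts[0] == "READY" and len(parts) == 4:  # sent by a cluster
            registry.mark_ready(int(parts[1]), int(parts[2]), parts[3])
            return "OK"
        if parts[0] == "WHERE" and len(parts) == 3:  # sent by a client
            location = registry.locate(int(parts[1]), int(parts[2]))
            return f"FILE {location}" if location else "PENDING"
    except (IndexError, ValueError):
        pass  # empty line or non-numeric ID: fall through to ERROR
    return "ERROR"


class ControlHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per line; answer each line until the peer disconnects.
        for raw in self.rfile:
            reply = handle_line(self.server.registry, raw.decode("utf-8", "replace"))
            self.wfile.write((reply + "\n").encode("utf-8"))


class ControlServer(socketserver.ThreadingTCPServer):
    """Handles every connection in its own thread."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, ControlHandler)
        self.registry = JobRegistry()
```

Starting it would look something like `ControlServer(("0.0.0.0", 9000)).serve_forever()`, where the port number is arbitrary. A cluster would send `READY 1 2 /some/path` when stage 2 of job 1 finishes, and a client would poll with `WHERE 1 2` until it gets a `FILE` reply instead of `PENDING`.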

If the clients are Java-based, you just need the simple Java networking classes to do the task.

Do the clients have to be browser-based? If not, making a normal Java application should be easier than making an applet.

mdm