Hi all,

I am developing a Java-based application. Its pertinent requirements are listed below:

  • Large datasets exist on several machines on the network. My program needs to (remotely) execute a Java program to process these datasets and fetch the results.

  • A user on a Windows desktop will need to process datasets (several gigabytes) on machine A. My program can reside on the user's machine. The user will run my program locally and initiate the dataset processing on the remote machine(s).

  • Instead of pulling the dataset over the network from the remote machine, the user will execute the program on the remote machine and fetch only the results.

  • The user may have open access to the other machines, but FTP is the requirement.

  • Data should not be brought over the network to the user's machine.

  • Users run Windows.

My question(s)

  • How can I perform this kind of remote process execution? Any ideas?

  • I am looking at Hadoop, but I am working on Windows XP and was unable to get Hadoop working as a single-node cluster. I could not find good documentation either, so I haven't really tested Hadoop. Am I on the right track?

  • Has anyone found useful links for installing and troubleshooting Hadoop?

Thanks in advance for any responses. Please let me know if I should provide more specific details.

-jv

+1  A: 

Java has an RMI API that you could use, assuming that you can have a Java VM running on your remote machines. That's the lightest-weight solution. The next lightest would be straight socket communication. After that you're getting into EJB servers or web servers, which is probably overkill.
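
As a rough illustration of the RMI approach, here is a minimal sketch, assuming each remote machine runs a JVM with a small server class. The names DatasetProcessor, countLines, and RmiSketch are hypothetical, and counting lines merely stands in for the real processing. The point is that the dataset is read where it lives and only a small result travels back.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.stream.Stream;

public class RmiSketch {

    /** Hypothetical remote interface: only the small result crosses the network. */
    public interface DatasetProcessor extends Remote {
        long countLines(String datasetPath) throws RemoteException;
    }

    /** Runs on the machine that holds the data. */
    static class DatasetProcessorImpl implements DatasetProcessor {
        @Override
        public long countLines(String datasetPath) throws RemoteException {
            // The file is read locally on the remote machine.
            try (Stream<String> lines = Files.lines(Paths.get(datasetPath))) {
                return lines.count();
            } catch (IOException e) {
                throw new RemoteException("Could not read " + datasetPath, e);
            }
        }
    }

    // Strong references so the exported objects are not garbage collected.
    private static Registry registry;
    private static DatasetProcessorImpl impl;

    /** Server side: start a registry on the given port and publish the processor. */
    public static void startServer(int port) throws RemoteException {
        registry = LocateRegistry.createRegistry(port);
        impl = new DatasetProcessorImpl();
        Remote stub = UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("DatasetProcessor", stub);
    }

    /** Server side: unexport everything so the JVM can exit. */
    public static void stopServer() throws Exception {
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }

    /** Client side: look up the processor on the remote host and call it. */
    public static long runOnRemote(String host, int port, String datasetPath) throws Exception {
        Registry remote = LocateRegistry.getRegistry(host, port);
        DatasetProcessor processor = (DatasetProcessor) remote.lookup("DatasetProcessor");
        return processor.countLines(datasetPath);
    }

    public static void main(String[] args) throws Exception {
        // Single-JVM demo only: server and client share one process on the loopback address.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        startServer(1099);
        try {
            System.out.println(runOnRemote("127.0.0.1", 1099, args[0]));
        } finally {
            stopServer();
        }
    }
}
```

In a real deployment, startServer would run on machine A and runOnRemote on the user's desktop, with the host name of machine A in place of the loopback address.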

Jim Barrows
Thanks a lot. Will check into this.
I haven't delved deeply into RMI yet, but I am thinking that each machine on the network could have the application ("server" (remote) code + "client" code) running on its JVM. So each machine is both a client and a server, and this is how they can talk to each other. Does that make sense?
There really isn't a client and server with RMI, and you could certainly have class A call methods on class B, and then have class B return the favor. Or you could just have class B return the data. This is Remote Method Invocation, so anything you can do with a method, you can do with RMI.
Jim Barrows
Thanks Jim. Have started working on my RMI based architecture.
Cool. Could you mark the answer as accepted, then? :)
Jim Barrows
Sure Jim, as soon as I figure out some more details (early next week).
A: 

Have a look at how to write web services with Java 6. It lets you publish a method as a web service with an annotation. A web service client is small and does not require additional software. I found the IntelliJ IDEA IDE easy to use, and it generated a pure Java 6 client.

Then it essentially boils down to making a "normal" method call, and processing the result.
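
To make the "normal method call" idea concrete, here is a minimal sketch. Note the substitution: the annotation-based JAX-WS API that shipped with Java 6 was removed from the JDK in Java 11, so this sketch uses the JDK's built-in com.sun.net.httpserver as a plain-HTTP stand-in rather than an annotated web service. The class name HttpServiceSketch, the /process path, and the process method (which only reports the file size) are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HttpServiceSketch {

    /** Hypothetical processing step: here it only reports the dataset size in bytes. */
    static long process(String datasetPath) throws IOException {
        return Files.size(Paths.get(datasetPath));
    }

    /** Runs on the machine that holds the data; port 0 lets the OS pick a free port. */
    public static HttpServer startServer(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/process", exchange -> {
            // The raw query string carries the URL-encoded dataset path.
            // A real service should validate it instead of trusting the caller.
            String path = URLDecoder.decode(exchange.getRequestURI().getRawQuery(), "UTF-8");
            byte[] body;
            int status;
            try {
                body = Long.toString(process(path)).getBytes(StandardCharsets.UTF_8);
                status = 200;
            } catch (IOException e) {
                body = String.valueOf(e).getBytes(StandardCharsets.UTF_8);
                status = 500;
            }
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Client side: looks like an ordinary method call; only the result is transferred. */
    public static long callRemote(String host, int port, String datasetPath) throws IOException {
        URL url = new URL("http://" + host + ":" + port + "/process?"
                + URLEncoder.encode(datasetPath, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (conn.getResponseCode() != 200) {
            throw new IOException("Remote processing failed: HTTP " + conn.getResponseCode());
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return Long.parseLong(in.readLine().trim());
        }
    }
}
```

The caller on the desktop would invoke callRemote with the host name of the machine that holds the dataset, and the several-gigabyte file itself never leaves that machine.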

Keep it simple. Grid software is most likely not what you want.

Thorbjørn Ravn Andersen
Thanks a lot. Will check into this.
Between a web service and an RMI implementation, which is likely to be faster and more lightweight? I don't mean to ask questions without doing my research first, but I thought an expert's opinion could help guide me in the right direction.
RMI will be faster. RMI passes serialized Java objects directly between JVMs, whereas web services typically convert to and from XML. That's why I put them last in my list of other options. Web services work well when you have, or can reasonably expect to have, non-Java clients talking to you. In your case, as I understand it.
Jim Barrows
The speed difference between RMI and web services (if any) will be drowned out by the speed of the network. If you don't believe me, measure!
Thorbjørn Ravn Andersen
Thanks a lot. Working on RMI.