Hi all,
I am developing a Java-based application. Its relevant requirements are listed below:
Large datasets exist on several machines on the network. My program needs to (remotely) execute a Java program that processes these datasets and then fetch the results.
A user on a Windows desktop will need to process datasets (several gigabytes each) on machine A. My program can reside on the user's machine. The user will run my program from that machine and start the dataset processing on the remote machine(s).
Instead of pulling the dataset over the network from the remote machine to the local machine, the user will execute the program on the remote machine and fetch only the results.
The user may have open access to the other machines, but FTP is the requirement.
The data should not be transferred over the network to the user's machine.
Users run Windows.
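To make the requirements concrete, here is a rough sketch of the shape I have in mind: the client sends only a dataset path, and the machine holding the data sends back only a small summary. The names DatasetProcessor and LocalDatasetProcessor are hypothetical, and the line/character count is just a stand-in for the real processing. I have modelled the contract as a java.rmi.Remote interface only as one possible transport; I have not settled on RMI, and the sketch does not export or look up the object.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical contract: the caller passes a path that is local to the
// remote machine and receives a small result, never the dataset itself.
interface DatasetProcessor extends Remote {
    String process(String datasetPath) throws RemoteException;
}

// Hypothetical implementation that would run on the machine holding the data.
class LocalDatasetProcessor implements DatasetProcessor {
    @Override
    public String process(String datasetPath) throws RemoteException {
        long lines = 0;
        long chars = 0;
        // Read line by line so a multi-gigabyte file is never held in memory.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(datasetPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                chars += line.length();
            }
        } catch (IOException e) {
            throw new RemoteException("Could not read " + datasetPath, e);
        }
        // Placeholder result: only this short string would cross the network.
        return "lines=" + lines + ", chars=" + chars;
    }
}
```

Calling new LocalDatasetProcessor().process(path) on the machine that owns the file returns the summary string. How to trigger that call from the user's Windows desktop is exactly the part I am asking about.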
My questions:
How can I perform this kind of remote process execution? Any ideas?
I am looking at Hadoop, and I am working on Windows XP. I was unable to get Hadoop working as a single-node cluster, and I cannot find good documentation, so I have not really been able to test it. Any comments on whether I am on the right track?
Can anyone recommend links they found useful for installing and troubleshooting Hadoop?
Thanks in advance for any responses. Please let me know if I should provide more specific details.
-jv