tags:

views:

43

answers:

1

Hello,
I would like to know what yours Hadoop development environment looks like?
Do you deploy jars to test cluster, or run jars in local mode?
What IDE do you use and what plugins do you use?
How do you deploy completed projects to be run on servers? What are you other recommendations about setting my own Hadoop development/test environment?

+2  A: 

It's extremely common to see people writing java MR jobs in an IDE like Eclipse or IJ. Some even use plugins like Karamasphere's dev tools which are handy. As for testing, the normal process is to unit test business logic as you normally would. You can unit test some of the MR surrounding infrastructure using the MRUnit classes (see Hadoop's contrib). The next step is usually testing in the local job runner, but note there a number of caveats here: the distributed cache doesn't work in local mode, and you're singly threaded (so static variables are accessible in ways they won't be in production). The next step (and most common test environment) is pseudo-distributed mode - all daemons running, but on a single box. This is going to run code in different JVMs with multiple tasks in parallel and will reveal most developer errors.

MR job jars are distributed to the client machine in different ways. Usually custom deployment processes are seen here. Some folks use tools like Capistrano or config management tools like Chef or Puppet to automate this.

My personal development is usually done in Eclipse with Maven. I build jars using Maven's Assembly plugin (packages all dependencies in a single jar for easier deployment, but fatter jars). I regularly test using MRUnit and then pseudo-distributed mode. The local job runner isn't very useful in my experience. Deployment is almost always via a configuration management system. Testing can be automated with a CI server like Hudson.

Hope this helps.

Eric Sammer