I love Hadoop streaming for its ability to quickly pump out quick-and-dirty, one-off MapReduce jobs. I also love Groovy for making all my carefully coded Java accessible to a scripting language. Now I'd like to put the two together: take a jar with some of my Java classes and use them in Groovy-based mappers and reducers.
Is there an easy way to do this? It seems like it could cut development time for MapReduce tasks significantly, especially for those that I'm only going to run a few times.
What I'd like is to do something like:
hadoop jar streaming.jar -mapper "groovy -ne 'import a.b.c.Foo; println Foo.doSomething(line)'" -reducer "wc -l" -input input -output output -jarstoinclude ~/jarWithJava.jar
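For comparison, here is a rough sketch of the closest thing I can picture using options that I believe exist today. It is untested and rests on several assumptions: `groovy` is installed and on the PATH of every task node, the generic `-files` option ships both files into each task's working directory, and `a.b.c.Foo.doSomething` takes a line of text. The names `mapper.groovy`, `STREAMING_JAR`, and `run_job` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: a Groovy mapper script plus a streaming invocation.

# Write the mapper to a file instead of an inline -e expression,
# which avoids nested quoting inside the -mapper argument.
cat > mapper.groovy <<'EOF'
import a.b.c.Foo   // class from jarWithJava.jar (from the question)

// Streaming feeds input records on stdin, one per line.
System.in.eachLine { line -> println Foo.doSomething(line) }
EOF

# Assumed location of the streaming jar; override via the environment.
STREAMING_JAR="${STREAMING_JAR:-streaming.jar}"

run_job() {
  # -files is a generic option, so it goes before the streaming options.
  # Shipped files should appear under their base names in the task's
  # working directory, which is why -cp uses a relative jar name.
  hadoop jar "$STREAMING_JAR" \
    -files "$HOME/jarWithJava.jar,mapper.groovy" \
    -input input \
    -output output \
    -mapper "groovy -cp jarWithJava.jar mapper.groovy" \
    -reducer "wc -l"
}

# Only submit the job when explicitly asked: sh sketch.sh run
if [ "${1:-}" = "run" ]; then
  run_job
fi
```

This differs from what I wrote above in that the jar is shipped with `-files` rather than my invented `-jarstoinclude`, and the one-liner is moved into a script file.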
Any pointers on how to do this?