Hi

I am using Hadoop and working with a map task that creates files I want to keep. Currently I pass these files through the collector to the reduce task, and the reduce task then passes them on to its own collector; this lets me retain the files.

My question is: how do I reliably and efficiently keep the files created by the map?

I know I can turn off the automatic deletion of the map's output, but that is frowned upon. Are there any better approaches?

Thanks

A: 

You could split this into two jobs.

First, create a map-only job that outputs the sequence files you want to keep.

Then take your existing job (the map now does essentially nothing, though you could still do some crunching depending on your implementation and use cases) and reduce as you do now, feeding the output of the map-only job in as the input to this second job.

You can wrap this all up in one jar with a single driver that runs the two jobs in sequence, passing the first job's output path as the second job's input path.
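A minimal sketch of that driver, using the old `org.apache.hadoop.mapred` API. The class names `FeatureMapper` and `FeatureReducer` and the three paths are illustrative assumptions, not anything from the question; the key points are setting zero reducers on job 1 (so its map output becomes permanent job output) and chaining the paths:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class TwoStageDriver {
    public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // job 1's output: the files you keep
        Path output = new Path(args[2]);

        // Job 1: map-only. With zero reduce tasks, the mapper's output is
        // written directly as the job output and is never cleaned up.
        JobConf job1 = new JobConf(TwoStageDriver.class);
        job1.setJobName("create-files");
        job1.setMapperClass(FeatureMapper.class);   // assumed mapper
        job1.setNumReduceTasks(0);                  // makes this map-only
        job1.setOutputFormat(SequenceFileOutputFormat.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        JobClient.runJob(job1);                     // blocks until job 1 completes

        // Job 2: identity map plus your existing reduce, reading job 1's output.
        JobConf job2 = new JobConf(TwoStageDriver.class);
        job2.setJobName("reduce-files");
        job2.setInputFormat(SequenceFileInputFormat.class);
        job2.setReducerClass(FeatureReducer.class); // assumed reducer
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        JobClient.runJob(job2);
    }
}
```

Because `JobClient.runJob` blocks, job 2 only starts once job 1 has finished, and the `intermediate` directory stays on HDFS after both jobs complete.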

Joe Stein
Thanks, but I needed to use the files within the map itself. For example, I create an image and then extract certain features from it. I decided to have each tasktracker create a sequence file, and to have the map function retrieve a static reference to that sequence file.
akintayo