Hi, I am running Hadoop 0.20.1 under SLES 10 (SUSE).
My map task takes an input file and generates a few more files, from which I then produce my results. I would like to know where I should place these files so that performance is good and there are no collisions. If Hadoop can delete the directory automatically, that would be a bonus.
Right now I am combining the temp folder and the task ID to create a unique folder, and then working within subfolders of it:
String reduceTaskId = job.get("mapred.task.id");
String reduceTempDir = job.get("mapred.temp.dir");
// Unique per-task folder: <mapred.temp.dir>/<mapred.task.id>/...
String myTemporaryFoldername = reduceTempDir + File.separator + reduceTaskId;
File diseaseParent = new File(myTemporaryFoldername + File.separator + REDUCE_WORK_FOLDER);
The problem with this approach is that I am not sure it is optimal; I also have to delete each new folder myself or I start to run out of space. Thanks, akintayo
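For reference, the manual cleanup I have in mind looks roughly like this: build the scratch folder in configure() and recursively delete it in close(). This is an untested sketch against the 0.20 mapred API; ScratchDirMapper and the key/value types are placeholders, not my actual job.

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ScratchDirMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private File scratchDir;

    @Override
    public void configure(JobConf job) {
        // Per-task scratch directory, built the same way as in the snippet above.
        scratchDir = new File(job.get("mapred.temp.dir"), job.get("mapred.task.id"));
        scratchDir.mkdirs();
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ... write intermediate files under scratchDir, then emit results ...
    }

    @Override
    public void close() throws IOException {
        // Recursively delete the scratch directory so the node does not
        // slowly fill up with abandoned per-task folders.
        if (scratchDir != null) {
            FileUtil.fullyDelete(scratchDir);
        }
    }
}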
(edit) I found that the best place to keep files that you don't want to outlive the map task is job.get("job.local.dir"), which provides a path that is deleted when the map task finishes. I am not sure whether the delete is done on a per-key basis or once for each tasktracker.
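A sketch of how I would grab a scratch directory under job.local.dir; LocalDirScratch and the subDir argument are just illustrative names:

import java.io.File;
import org.apache.hadoop.mapred.JobConf;

public class LocalDirScratch {
    // Returns a scratch directory under job.local.dir on this node's local disk.
    public static File scratchDir(JobConf job, String subDir) {
        File dir = new File(job.get("job.local.dir"), subDir);
        dir.mkdirs();
        return dir;
    }
}

If the observation above is right, files placed here should not need any manual delete in close().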