tags:

views: 42

answers: 1
Hi friends,

The value output from my map/reduce is a BytesWritable array, which is written to the output file part-00000 (Hadoop does this by default). I need this array for my next map function, so I wanted to keep it in the distributed cache. Can somebody tell me how to read the output file (part-00000), which may not be a text file, and store it in the distributed cache?

A: 

My suggestion:

Create a new Hadoop job with the following properties:

  • Use the directory with all the part-... files as the input.
  • Create a custom OutputFormat class that writes to your distributed cache.
  • Now configure your job so that it essentially looks like this:

    // Old mapred API: read the binary part-... files as SequenceFiles
    conf.setInputFormat(SequenceFileInputFormat.class);
    // Pass every record through unchanged
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    // Custom OutputFormat that pushes each record into your cache
    conf.setOutputFormat(DistributedCacheOutputFormat.class);
    

Have a look at the Yahoo Hadoop tutorial because it has some examples on this point: http://developer.yahoo.com/hadoop/tutorial/module5.html#outputformat
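
Expanded into a full driver, that configuration could look roughly like the sketch below (old mapred API); the input path and the DistributedCacheOutputFormat class are placeholders you would adapt to your own setup:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class CacheLoaderJob {
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CacheLoaderJob.class);
        conf.setJobName("load-distributed-cache");

        // Directory holding the part-00000, part-00001, ... files of the first job.
        FileInputFormat.setInputPaths(conf, new Path("/output/of/first/job"));

        // The part-... files are SequenceFiles, so read them as such.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // Pass every record through unchanged.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // Adjust these to the key/value types your first job actually wrote.
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(BytesWritable.class);

        // Custom OutputFormat that pushes every record into the cache.
        conf.setOutputFormat(DistributedCacheOutputFormat.class);

        JobClient.runJob(conf);
      }
    }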

HTH

Niels Basjes
thank you for the explanation, but I need more elaboration on the custom output format to write to the distributed cache
ayush singhal
I assume your distributed caching software allows you to write a client to put values in it. Now take the example from Yahoo and fill the "void write(K key, V value)" method with calls to the API of your distributed caching software.
Niels Basjes
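
To make that a bit more concrete, here is a sketch (old mapred API) of what such a DistributedCacheOutputFormat could look like. The CacheClient class and its put/close methods are made up for illustration; substitute the real client API of whatever caching software you use:

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.OutputFormat;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Progressable;

    public class DistributedCacheOutputFormat<K extends Writable, V extends Writable>
        implements OutputFormat<K, V> {

      @Override
      public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
          String name, Progressable progress) throws IOException {
        // Hypothetical client for your caching software; replace with the real one.
        final CacheClient cache = new CacheClient(job.get("cache.hosts"));
        return new RecordWriter<K, V>() {
          public void write(K key, V value) throws IOException {
            // Serialize each Writable pair and push it into the cache.
            cache.put(serialize(key), serialize(value));
          }
          public void close(Reporter reporter) throws IOException {
            cache.close();
          }
        };
      }

      @Override
      public void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException {
        // Nothing to check: the output goes to the cache, not to HDFS.
      }

      private static byte[] serialize(Writable w) throws IOException {
        DataOutputBuffer out = new DataOutputBuffer();
        w.write(out);
        return Arrays.copyOf(out.getData(), out.getLength());
      }
    }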