tags:

views: 42

answers: 1
Hi friends,

The value output from my map/reduce is a BytesWritable array, which is written to the output file part-00000 (Hadoop does this by default). I need this array for my next map function, so I wanted to keep it in the distributed cache. Can somebody tell me how to read the output file (part-00000), which may not be a text file, and store it in the distributed cache?

A: 

My suggestion:

Create a new Hadoop job with the following properties:

  • Use the directory with all the part-... files as the input.
  • Create a custom OutputFormat class that writes to your distributed cache.
  • Now configure your job so that it essentially looks like this:

    // Old mapred API: read the binary part-... files as SequenceFiles
    conf.setInputFormat(SequenceFileInputFormat.class);
    // Pass every record through unchanged
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    // Custom OutputFormat that pushes each record into your cache
    conf.setOutputFormat(DistributedCacheOutputFormat.class);
    

Have a look at the Yahoo Hadoop tutorial because it has some examples on this point: http://developer.yahoo.com/hadoop/tutorial/module5.html#outputformat
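
Expanded into a full driver, that configuration could look roughly like the sketch below (old mapred API); the input path and the DistributedCacheOutputFormat class are placeholders you would adapt to your own setup:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class CacheLoaderJob {
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(CacheLoaderJob.class);
        conf.setJobName("load-distributed-cache");

        // Directory holding the part-00000, part-00001, ... files of the first job.
        FileInputFormat.setInputPaths(conf, new Path("/output/of/first/job"));

        // The part-... files are SequenceFiles, so read them as such.
        conf.setInputFormat(SequenceFileInputFormat.class);

        // Pass every record through unchanged.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // Adjust these to the key/value types your first job actually wrote.
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(BytesWritable.class);

        // Custom OutputFormat that pushes every record into the cache.
        conf.setOutputFormat(DistributedCacheOutputFormat.class);

        JobClient.runJob(conf);
      }
    }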

HTH

Niels Basjes
thank you for the explanation, but I need more elaboration on the custom output format to write to the distributed cache
ayush singhal
I assume your distributed caching software allows you to write a client to put values in it. Now take the example from Yahoo and fill the "void write(K key, V value)" method with calls to the API of your distributed caching software.
Niels Basjes
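
To make that a bit more concrete, here is a sketch (old mapred API) of what such a DistributedCacheOutputFormat could look like. The CacheClient class and its put/close methods are made up for illustration; substitute the real client API of whatever caching software you use:

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.OutputFormat;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Progressable;

    public class DistributedCacheOutputFormat<K extends Writable, V extends Writable>
        implements OutputFormat<K, V> {

      @Override
      public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
          String name, Progressable progress) throws IOException {
        // Hypothetical client for your caching software; replace with the real one.
        final CacheClient cache = new CacheClient(job.get("cache.hosts"));
        return new RecordWriter<K, V>() {
          public void write(K key, V value) throws IOException {
            // Serialize each Writable pair and push it into the cache.
            cache.put(serialize(key), serialize(value));
          }
          public void close(Reporter reporter) throws IOException {
            cache.close();
          }
        };
      }

      @Override
      public void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException {
        // Nothing to check: the output goes to the cache, not to HDFS.
      }

      private static byte[] serialize(Writable w) throws IOException {
        DataOutputBuffer out = new DataOutputBuffer();
        w.write(out);
        return Arrays.copyOf(out.getData(), out.getLength());
      }
    }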