Hi!

I keep getting "Exceeded MAX_FAILED_UNIQUE_FETCHES;" in the reduce phase even though I have tried all the solutions I could find online. Please help me; I have a project presentation in three hours and my solution doesn't scale.

I have one master acting as both NameNode and JobTracker (172.16.8.3) and three workers (172.16.8.{11, 12, 13}).
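
For context, a conf/slaves file consistent with this topology would look like the sketch below (assuming the standard Hadoop 1.x layout, where start-all.sh reads conf/slaves on the master to launch the DataNodes and TaskTrackers); it is reconstructed from the IPs above, not copied from the cluster.

// conf/slaves (on 172.16.8.3) -- reconstructed sketch, not the actual file
    172.16.8.11
    172.16.8.12
    172.16.8.13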

Here are the corresponding configuration files:


//////// 172.16.8.3 ////////////////////

// core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
        <description>Hadoop Data Store</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.8.3:54310/</value>
    </property>
</configuration>

// mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.8.3:54311</value>
    </property>
</configuration>


//////// 172.16.8.11 ////////////////

// core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
        <description>Hadoop Data Store</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.8.3:54310/</value>
    </property>
</configuration>

// mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.8.3:54311</value>
    </property>
</configuration>


/////// 172.16.8.12 //////////////

// core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
        <description>Hadoop Data Store</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.8.3:54310/</value>
    </property>
</configuration>

// mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.8.3:54311</value>
    </property>
</configuration>


///////// 172.16.8.13 ////////

// core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
        <description>Hadoop Data Store</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://172.16.8.3:54310/</value>
    </property>
</configuration>

// mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>172.16.8.3:54311</value>
    </property>
</configuration>


A: 

This link seems to indicate that it could be a network configuration problem.

The reducers are failing to fetch the map outputs (after too many failed attempts), which could be caused by a networking or configuration problem that prevents them from reaching the Jetty server on each TaskTracker.
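
The first thing I would check is that every node can resolve every other node consistently (identical /etc/hosts entries or working DNS, and no firewall blocking the TaskTracker HTTP port, 50060 by default). If the network checks out, one knob that is often suggested for shuffle-fetch failures is the number of Jetty threads serving map output; the snippet below is only a sketch, assuming Hadoop 1.x property names, not something I have verified:

// mapred-site.xml (on every node) -- hypothetical tuning, assumed Hadoop 1.x property name

<configuration>
    <!-- Jetty threads that serve map output to the reducers; the 1.x default is 40. -->
    <property>
        <name>tasktracker.http.threads</name>
        <value>80</value>
    </property>
</configuration>

The TaskTrackers have to be restarted for a change like this to take effect.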

Disclaimer: I have not used Hadoop professionally

Syntax
I have already checked all the networking and nothing seems to be wrong. I also checked for the problem that occurs when you run out of file descriptors. I still can't get past this bug. PLEASE HELP! I have to present my thesis in less than three hours.
Marius
I just discovered that running the wordcount example on the cluster with more than one input file gives the same error. What is the connection?
Marius
How many nodes are in your cluster? Is it possible that you are somehow registering an extra node, and that when more than one file is submitted the second one is given to that extra node? The wordcount example suggests to me that lines are handed to the mappers; are there multiple lines in each text file, meaning that the content of one file would be distributed across multiple hosts for processing?
Syntax
In the wordcount example I give the system some huge texts with thousands of lines. If the input folder contains just one file, everything works out fine, but if I put a copy of the previous text in the same folder and rerun, it stops working in the reduce phase with the Exceeded MAX_FAILED_UNIQUE_FETCHES; error. My system has 4 nodes: 1 master and 3 slaves. My project takes advantage of this kind of processing (lines run on different machines).
Marius
Are you able to test whether it works with multiple files using only the master? Then start re-adding the slaves. This may help identify whether, and which, slave is causing problems (if it is a networking/comms issue).
Syntax
I have already tested each node individually at installation time. I tried taking all the nodes out and adding them back in afterwards, but I keep getting errors about replicas (even though in hdfs-site.xml the number of replicas is set to 1).
Marius
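
For reference, a replication factor of 1 is what a minimal hdfs-site.xml like the one below would give; this is a sketch assuming the standard dfs.replication property, not the actual file from the cluster.

// hdfs-site.xml

<configuration>
    <!-- Each HDFS block is stored on a single DataNode; the default is 3. -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>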