Hi everyone,

I'm writing a simple program that enumerates triangles in directed graphs for my project. First, for each input arc (e.g. a b, b c, c a; note that a tab character serves as the delimiter) I want my map function to output the following pairs ([a, to_b], [b, from_a], [a_b, -1]):

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {

        String line = value.toString();
        String[] tokens = line.split("    ");

        output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
        output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
        output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
    }
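To make the intended map output concrete, here is a minimal plain-Java sketch of the same pair generation with no Hadoop classes involved. The class name `ArcPairsSketch` and the method `pairsForArc` are hypothetical names used only for illustration, and the sketch assumes the delimiter really is a single tab character.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the map step; names here are hypothetical.
class ArcPairsSketch {

    // Returns the three (key, value) pairs for one "src<TAB>dst" arc.
    static List<String[]> pairsForArc(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length != 2) {
            // Fail loudly instead of hitting tokens[1] on a badly split line.
            throw new IllegalArgumentException("expected two tab-separated fields: " + line);
        }
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {tokens[0], "to_" + tokens[1]});
        pairs.add(new String[] {tokens[1], "from_" + tokens[0]});
        pairs.add(new String[] {tokens[0] + "_" + tokens[1], "-1"});
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] p : pairsForArc("a\tb")) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

Running the same logic outside the cluster like this is one way to check that the split and the string concatenation behave as expected before looking at the job itself.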

Now my reduce function is supposed to cross join all pairs that have both to_'s and from_'s and to simply emit any other pairs whose keys contain "_".

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output,
                       Reporter reporter) throws IOException {

        String key_s = key.toString();

        if (key_s.indexOf("_") > 0)
            output.collect(key, new Text("completed"));

        else {

            HashMap<String, ArrayList<String>> lists = new HashMap<String, ArrayList<String>>();

            while (values.hasNext()) {

                String line = values.next().toString();

                String[] tokens = line.split("_");
                if (!lists.containsKey(tokens[0])) {
                    lists.put(tokens[0], new ArrayList<String>());
                }
                lists.get(tokens[0]).add(tokens[1]);
            }

            for (String t : lists.get("to"))
                for (String f : lists.get("from"))
                    output.collect(new Text(t + "_" + f), key);

        }

    }
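For comparison, here is a hedged plain-Java sketch of the cross join on its own, so the grouping logic can be tested without Hadoop. `CrossJoinSketch` and `crossJoin` are hypothetical names. The sketch also guards against two cases the reducer above does not handle: a value without an underscore, and a key that has only `to_` values or only `from_` values (where `lists.get(...)` would return null).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the reduce step; names here are hypothetical.
class CrossJoinSketch {

    // Groups values such as "to_b" / "from_c" by prefix and returns
    // "t_f" for every (to, from) combination.
    static List<String> crossJoin(Iterable<String> values) {
        Map<String, List<String>> lists = new HashMap<>();
        for (String value : values) {
            String[] tokens = value.split("_", 2);
            if (tokens.length < 2) {
                // Report the offending value instead of failing on tokens[1].
                throw new IllegalArgumentException("unexpected value: [" + value + "]");
            }
            lists.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
        }
        List<String> joined = new ArrayList<>();
        // getOrDefault avoids a NullPointerException when one side is missing.
        for (String t : lists.getOrDefault("to", new ArrayList<>())) {
            for (String f : lists.getOrDefault("from", new ArrayList<>())) {
                joined.add(t + "_" + f);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(crossJoin(Arrays.asList("to_b", "from_c", "to_d")));
    }
}
```

If this behaves correctly on the values the mapper is expected to produce, the remaining question is what the reducer actually receives in the running job.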

And this is where the most exciting stuff happens. tokens[1] throws an ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this point the iterator should give values like "to_a", "from_b", "to_b", etc. When I just output these values, everything looks OK and I have "to_a", "from_b". But split() doesn't work at all; moreover, line.length() is always 1 and indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY for keys, where we have pairs whose keys contain "_" and look like "a_b", "b_c".
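One way to narrow this down is to log exactly what each value contains, character by character, rather than printing it as-is. The helper below is a minimal sketch; `ValueInspectorSketch` and `describe` are hypothetical names. It only reports what is in the string and does not by itself explain why the length comes out as 1.

```java
// Diagnostic helper; names here are hypothetical.
class ValueInspectorSketch {

    // Renders a string in brackets with its length and the numeric
    // value of every char, so invisible or unexpected characters show up.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append("[").append(s).append("] length=").append(s.length()).append(" chars=");
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("to_b"));
    }
}
```

Inside the reducer, something like `System.err.println(ValueInspectorSketch.describe(values.next().toString()))` would write this to the task's stderr, assuming the helper class is packaged with the job.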

I'm really puzzled by all this. MapReduce is supposed to save lives by making everything simple. Instead I spent several hours just localizing this...

I'd really appreciate your help, guys!!! Thanks in advance!

A: 

Not sure if that's the problem, but try changing this:

  String [] tokens = line.split("    ");

to this:

  String [] tokens = line.split("\t");
Alex N.
thanks, just tried it... Unfortunately, it didn't resolve my problem((( But the line looks more professional now!)
Krovatkin
OK, I am a bit confused: is this job failing in the mapper or in the reducer?
Alex N.
In the reducer... For example, if I strip everything out of my reducer and make it just pass along each value it gets from the mapper, it works perfectly: for "a" as a key it outputs two pairs, "a, to_b" and "a, from_c". However, if I want it to split each "to_something" and "from_something", I get this weird situation where indexOf("_") returns -1 for "to_b".
Krovatkin