views:

65

answers:

1

I'm trying to combine Hadoop, Pig and Cassandra to be able to work on data stored in Cassandra by means of simple Pig queries. Problem is I can't get Pig to create Map/Reduce jobs that actually work with the CassandraStorage.

What I did is I copied the storage-conf.xml file from one of my cluster machines on top of the one in contrib/pig (source distro of Cassandra), and then compiled the stuff into the cassandra_loadfun.jar file.

Next I adapted the example-script.pig to include all the jars:

register /opt/pig/pig-0.7.0-core.jar;
register /tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar;
REGISTER /tmp/apache-cassandra-0.6.3-src/contrib/pig/build/cassandra_loadfunc.jar;
rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
cols = FOREACH rows GENERATE flatten($1);
colnames = FOREACH cols GENERATE $0;
namegroups = GROUP colnames BY $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;
dump topnames;

So if I'm not mistaken the jars should be bundled into the job that is submitted to hadoop. But when running the job it just throws an exception at me:

2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job.
2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
    at org.apache.pig.PigServer.openIterator(PigServer.java:521)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
    at org.apache.pig.PigServer.store(PigServer.java:577)
    at org.apache.pig.PigServer.openIterator(PigServer.java:504)
    ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.store(PigServer.java:569)
    ... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

Which I don't understand since the thrift library is explicitly listed, and should be bundled, isn't it?

A: 

The exception clearly says that it is not able to find TBase class

java.lang.NoClassDefFoundError: org/apache/thrift/TBase

Explode the bundled jar and check if thrift lib jar actually present at the right location. The thrift jar may have been bundled at the different location.

You can also try to put jars in lib folder of bundled jar. Another option would be add jar to classpath explicitly.

Harsha Hulageri
The classes are all present in the generated jar files, so that's not really the problem.
cdecker
Either there are multiple jars with same class org/apache/thrift/TBase resulting into conflict or jar is not registered properly. That would be the only reason I can think of based on exception
Harsha Hulageri