Hi,
I have some experience with Lucene, and I'm trying to understand how data is actually stored on a slave server in the Hadoop framework.
Do we create an index on the slave server with a set of attributes describing the document we are storing? How does it work in reality?
Thanks
R
...
I have an Oracle database (roughly 1.2 billion records) with a web application sitting on top of it that generates queries (it generates SQL code and returns counts). Basically, you build SQL queries graphically through an AJAX UI, and it performs pretty well.
This is roughly a 400 GB database. I've been looking at...
Grep does not seem to work with Hadoop streaming
For:
hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false
I get:
java.lang.RuntimeException: PipeMapRed.wait...
If I understand the Hadoop ecosystem correctly, I can run my MapReduce jobs sourcing data from either HDFS or HBase. Assuming that is correct, why would I choose one over the other? Is there a benefit in performance, reliability, cost, or ease of use to using HBase as an MR source?
The best I've been able to find is th...
I have run into a complex problem with MapReduce. I am trying to match up 2 unique values that are not always present together on the same line. Once I map those out, I need to count the total number of unique events for that mapping.
The log files I am crunching are 100GB+ uncompressed and have data broken into 2 parts that I need to ...
I am using Avro 1.4.0 to read some data out of S3 via the Python avro bindings and the boto S3 library. When I open an avro.datafile.DataFileReader on the file-like objects returned by boto, it fails immediately when it tries to seek(). For now I am working around this by reading the S3 objects into temporary files.
I would like to be a...
I have set up Hadoop on my laptop and successfully ran the example program given in the installation guide. However, I am not able to run my own program.
rohit@renaissance1:~/hadoop/ch2$ hadoop MaxTemperature input/ncdc/sample.txt output
Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: java.lang.ClassNotFoun...
I have a Pig script--currently running in local mode--that processes a huge file containing a list of categories:
/root/level1/level2/level3
/root/level1/level2/level3/level4
...
I need to insert each of these into an existing database by calling a stored procedure. Because I'm new to Pig and the UDF interface is a little daunting, I'...
I just started with Hadoop. I wrote a sample Hadoop program as it was written in the book, but exceptions still arise at execution time. A snippet of what I get:
[harsh@geek hadoop-0.20.2]$ hadoop MaxTemperature input/ncdc/sample.txt output
Exception in thread "main" java.lang.NoClassDefFoundError: MaxTemperature
Caused by: jav...
I have inherited a MapReduce codebase which mainly calculates the number of unique user IDs seen over time for different ads. To me it doesn't look like this is being done very efficiently, and I would like to know if anyone has tips or suggestions on how to do this kind of calculation as efficiently as possible in MapReduce.
We use H...
I have a four-machine Hadoop cluster that I've verified works correctly using the bundled WordCount example, run locally from the NameNode machine. I'm now starting to write my own MapReduce classes in Java, which I've bundled into a JAR with the necessary driver class that extends Configured and implements Tool.
I'm trying to r...
Hi,
I've been following Hadoop for a while, and it seems like a great technology. Map/Reduce, clustering: it's all good stuff. But I haven't found any article about using Hadoop with SQL Server.
Let's say I have a huge claims table (600 million rows) and I want to take advantage of Hadoop. I was thinking, but correct me if I'm ...
Hi,
I have a very large string, and when I read it in Java I get an out-of-memory error. I need to read the whole string into memory, split it into individual strings, and sort them by value. What is the best way to do this?
Thanks
...
Hi
Text manipulation in the Reduce phase does not seem to be working correctly.
I suspect the problem is in my code rather than in Hadoop itself, but you never know...
If you can spot any gotchas, let me know.
I wasted a day trying to figure out what's wrong with this code.
My sample input file is called simple.psv:
12345 [email protected]|m|1975
12346 bbc@...
We have a box that has terabytes of data (10-20 TB) each day, where each file on the drive is anywhere from megabytes to gigabytes in size.
We want to send all these files to a set of 'pizza boxes', which will consume and process the files.
I can't seem to find anything that is built to handle this amount of data besides distcp (Hadoop). R...
Hi folks,
I've just finished installing Hadoop 0.20.2 under Cygwin on Windows 7 with Eclipse Helios (3.6). Hadoop is now fully started, and I'm trying to run a test application within a newly created MapReduce test project in Eclipse. I'm using the Hadoop 0.20.2 plugin from the Hadoop download.
The Map/Reduce Location perspective opera...
The Hadoop API documentation says:
setJarByClass
public void setJarByClass(Class cls)
Set the Jar by finding where a given class came from.
What exactly does this explanation mean? Does it create a JAR file from the class argument passed to the method above? And is that JAR file executed for the MapRe...
Hi,
I have five MapReduce jobs that I currently run separately. I want to pipeline them all together, so that the output of one job goes to the next job. Currently, I use a shell script to execute them all. Is there a way to write this in Java? Please provide an example.
Thanks
...
I need to design an exercise for my students in programming language design. My idea is to help them learn ideas from Lisp, ML, and other functional languages by having them implement a MapReduce exercise with Hadoop.
Are there any suggestions that would help me flesh out this idea?
...
Hadoop-0.20.2 Single Node Setup FAIL!!!! The jobtracker and namenode do not start :(
Any suggestions would be welcome.
As far as I know, I have set core-site.xml, hdfs-site.xml and mapred-site.xml correctly.
...