I'm interested in learning techniques for distributed computing. As a Java developer, I'm probably willing to start with Hadoop. Could you please recommend some books/tutorials/articles to begin with?
...
Hi,
In the "syslog" for a MapReduce job flow step, I see the following:
Job Counters
Launched reduce tasks=4
Launched map tasks=39
Does the number of launched map tasks include failed tasks?
I am using the NLineInputFormat class as the input format to control the number of map tasks.
However, I get slightly different numbers for the exact sa...
I launched a Hadoop cluster and submitted a job to the master. The jar file exists only on the master. Does Hadoop ship the jar to all the slave machines at the start of the job? Is there a possibility that a slave machine will run a previous version of the code, shipped during the last run?
Thank you
Bala
...
I just finished reading chapter 23 of the excellent 'Beautiful Code' (http://oreilly.com/catalog/9780596510046),
on Distributed Programming with MapReduce. I understand that MapReduce is a programming system designed for large-scale data processing problems, but I have a hard time getting my head around the basic examples given and how I might app...
Hi,
This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. For the sake of simplicity, let's assume that each line is of the form <k,v>, where k is the offset of the line from the beginning of the file and v is the content of the line.
Now, when we say that we want to run N map tasks, doe...
Hi,
I tried printing out values using System.out.println(), but they don't appear on the console. How do I print out values in a map/reduce application for debugging purposes using Hadoop?
Thanks,
Deepak.
...
Hi,
Can someone walk me through the basic workflow of reading and writing data with classes generated from DDL?
I have defined some struct-like records using DDL. For example:
class Customer {
    ustring FirstName;
    ustring LastName;
    ustring CardNo;
    long LastPurchase;
}
I've compiled this to get a Customer class ...
Do you know of any MapReduce-ready clustering libraries for Python?
I have found some good libraries in Java (http://lucene.apache.org/mahout/), but I'd prefer to use Python.
http://wiki.github.com/klbostee/dumbo/ (Python MapReduce API)
Edit ---
I'm looking for MapReduce-ready algorithms: Canopy, K-means, Mean-shift, etc.
...
What can I do with MapReduce? Dictionaries? Lists? What do I use it for? Please give a really easy example.
...
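One possible "really easy" example is counting words. The following is only a sketch in plain Java, not Hadoop code: the class and method names (WordCountSketch, map, reduce, run) are made up for illustration, and the grouping done in run() stands in for the shuffle that a framework such as Hadoop would perform across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-machine illustration of the map/reduce idea.
public class WordCountSketch {

    // "Map" step: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum all the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(run(Arrays.asList("the quick brown fox", "the lazy dog", "the fox")));
    }
}
```

The point of the split is that map() looks at one line at a time and reduce() looks at one key at a time, so both steps can be spread over many machines.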
Hey all, I'm just getting started on Hadoop and am curious what the best way in MapReduce would be to count unique visitors if your log files looked like this...
DATE siteID action username
05-05-2010 siteA pageview jim
05-05-2010 siteB pageview tom
05-05-2010 siteA pageview jim
05-05-2010 siteB pageview bob
05-05-2010 siteA ...
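A minimal sketch of one way to think about this, in plain Java rather than Hadoop code. It assumes "unique visitors" means distinct usernames per siteID across all dates, which the question does not confirm; the class and method names are hypothetical. In a real job, the mapper would emit (siteID, username) and the reducer would count the distinct usernames it receives for each siteID.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, single-machine illustration; not tied to any Hadoop API.
public class UniqueVisitorsSketch {

    // Assumption: fields are whitespace-separated as DATE siteID action username.
    static Map<String, Integer> uniqueVisitorsPerSite(List<String> logLines) {
        // "Map" + "shuffle": collect usernames under their siteID key.
        Map<String, Set<String>> usersBySite = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            // Skip the header row and any malformed or truncated line.
            if (fields.length < 4 || fields[0].equals("DATE")) {
                continue;
            }
            usersBySite.computeIfAbsent(fields[1], k -> new HashSet<>()).add(fields[3]);
        }
        // "Reduce": the number of distinct usernames per site.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : usersBySite.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "DATE siteID action username",
            "05-05-2010 siteA pageview jim",
            "05-05-2010 siteB pageview tom",
            "05-05-2010 siteA pageview jim",
            "05-05-2010 siteB pageview bob");
        // prints {siteA=1, siteB=2}
        System.out.println(uniqueVisitorsPerSite(sample));
    }
}
```

Holding a set of usernames per site in memory is fine for a sketch; with very large user counts per site, a different design (for example a second pass over de-duplicated pairs) may be needed.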
Hi,
My program follows an iterative map/reduce approach, and it needs to stop if certain conditions are met. Is there any way I can set a global variable that is distributed across all map/reduce tasks, and check whether that variable has reached the condition for completion?
Something like this.
While(Condition != true){
C...
The output from MongoDB's map/reduce includes something like 'counts': {'input': I, 'emit': E, 'output': O}. I thought I clearly understood what those mean, until I hit a weird case that I can't explain.
According to my understanding, counts.input is the number of rows that match the condition (as specified in query). If so, how is it ...
I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I n...
Hi,
I have a code fragment in which I am using a static code block to initialize a variable.
public static class JoinMap extends
Mapper<IntWritable, MbrWritable, LongWritable, IntWritable> {
.......
public static RTree rt = null;
static {
String rtreeFileName = "R.rtree";
rt...
All of the MongoDB MapReduce examples I have seen have dealt with counting/adding numbers. I need to combine strings, and it looks like MapReduce is the best tool for the job. I have a large MongoDB collection in this format:
{name: userone, type: typeone}
{name: usertwo, type: typetwo}
{name: userthree, type: typeone}
Each name only ...
Hi,
I have a database in which a single table holds billions of rows per month, and I have data for the past 5 years. I tried to optimize the data in every way I could, but the latency is not decreasing. I know there are some solutions, like horizontal sharding and vertical sharding. But I am not sure about any op...
Except for Amazon MapReduce, what other options do I have to process a large amount of data?
Thank you!
...
Hi,
I want to implement the Fast Fourier Transform algorithm with Hadoop. I know the recursive FFT algorithm, but I need guidance on implementing it with a Map/Reduce approach. Any suggestions?
Thanks.
...
Hi all!
I have an algorithm that goes through a large data set, reads some text files, and searches for specific terms in their lines. I have it implemented in Java, but I didn't want to post the code so that it doesn't look like I am asking someone to implement it for me, although it is true that I really need a lot of help! This was not planned for my...
I have started a Maven project to implement the MapReduce algorithm in Java 1.5.0_14. I have chosen version 0.20.2 of the Hadoop API, so in the pom.xml I'm using the following dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.20.2</version>
</depend...