I've written a MapReduce job in MongoDB and would like to use a global variable as a cache to write to and read from. I know it is not possible to have global variables shared across map function instances - I just want a global variable within each function instance. This type of functionality exists in Hadoop's MapReduce, so I was expecting it to be t...
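If I recall correctly, MongoDB's mapReduce command also accepts a scope option whose members become visible inside the map and reduce functions, which is one way to get instance-local state. The idea of a per-instance cache itself can be sketched in plain Python (a simulation, not the MongoDB API; all names here are invented for illustration):

```python
# Hypothetical sketch: each mapper *instance* owns a private cache,
# shared across calls to map() but not across instances.
class Mapper:
    def __init__(self):
        self.cache = {}  # the per-instance "global"

    def map(self, doc):
        key = doc["category"]
        if key not in self.cache:
            self.cache[key] = expensive_lookup(key)  # computed once per key
        return (self.cache[key], 1)

def expensive_lookup(key):
    # stand-in for a costly computation the cache avoids repeating
    return key.upper()

m = Mapper()
pairs = [m.map({"category": c}) for c in ["a", "b", "a"]]
print(pairs)         # [('A', 1), ('B', 1), ('A', 1)]
print(len(m.cache))  # 2 -- the second "a" hit the cache
```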
Update: follow-up to "MongoDB Get names of all keys in collection".
As pointed out by Kristina, one can use MongoDB's map/reduce to list the keys in a collection:
db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type : [] });
db.things.insert( { hello : [] } );
mr = db.runComm...
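For readers following along without a mongod at hand, the key-listing trick can be re-enacted in plain Python (a simulation of the map/reduce logic, not the MongoDB API): map emits each top-level key, reduce collapses duplicates.

```python
# The four sample documents from the snippet above.
docs = [
    {"type": ["dog", "cat"]},
    {"egg": ["cat"]},
    {"type": []},
    {"hello": []},
]

def map_keys(doc):
    # emit (key, null) for every top-level field
    for k in doc:
        yield (k, None)

def reduce_keys(key, values):
    return None  # the value is irrelevant; only distinct keys matter

emitted = [pair for doc in docs for pair in map_keys(doc)]
keys = sorted({k for k, _ in emitted})
print(keys)  # ['egg', 'hello', 'type']
```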
Hello guys,
First-time Map/Reduce user here, using MongoDB. I have a lot of page-visit data which I'd like to make some sense of using Map/Reduce. Below is basically what I want to do, but as a total beginner at Map/Reduce, I think this is beyond my knowledge!
Go through all the pages with visits in the last 30 days, and where ex...
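Since the question is truncated, the following is only a guess at the shape: a minimal Python simulation of "visits per page in the last 30 days", with field names ("page", "ts") invented for illustration. The map step filters to the window and emits (page, 1); the reduce step sums.

```python
from collections import defaultdict
from datetime import datetime, timedelta

now = datetime(2010, 6, 15)
visits = [
    {"page": "/home", "ts": now - timedelta(days=2)},
    {"page": "/home", "ts": now - timedelta(days=40)},  # outside the window
    {"page": "/about", "ts": now - timedelta(days=10)},
]

cutoff = now - timedelta(days=30)

def map_visit(v):
    # emit (page, 1) only for visits inside the 30-day window
    if v["ts"] >= cutoff:
        yield (v["page"], 1)

def reduce_counts(key, values):
    return sum(values)

groups = defaultdict(list)
for v in visits:
    for k, val in map_visit(v):
        groups[k].append(val)

counts = {k: reduce_counts(k, vs) for k, vs in groups.items()}
print(counts)  # {'/home': 1, '/about': 1}
```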
I'm using Hadoop 0.18.3 with Java 5, and I'm trying to run the WordCount v1.0 example from http://hadoop.apache.org/common/docs/r0.20.1/mapred_tutorial.html.
But I get the following error:
0/06/10 15:28:10 WARN fs.FileSystem: uri=file:///
javax.security.auth.login.LoginException: Login failed: CreateP...
I'm trying to run a hadoop job (version 18.3) on my windows machine but I get the following error:
Caused by: javax.security.auth.login.LoginException: Login failed: CreateProcess: bash -c groups error=2
at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
at org.apache.hadoop.s...
Hi all,
This is one of my first tries with Map Reduce on AWS in its Management Console.
I have uploaded to AWS S3 my runnable jar, developed on Hadoop 0.18, and it works on my local machine.
As described on documentation, I have passed the S3 paths for input and output as argument of the jar: all right, but the problem is the third argume...
There are CouchDB documents that are list elements:
{ "type" : "el", "id" : "1", "content" : "first" }
{ "type" : "el", "id" : "2", "content" : "second" }
{ "type" : "el", "id" : "3", "content" : "third" }
There is one document that defines the list:
{ "type" : "list", "elements" : ["2","1"] , "id" : "abc123" }
As you can see th...
I have a Hadoop job with tasks that are expected to run for a significant length of time (a few minutes). However, Hadoop starts speculative execution too soon. I do not want to turn speculative execution off completely, but I want to increase the duration of time Hadoop waits before considering a job for speculative execution. Is there a config option...
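For what it's worth, I'm not aware of a tunable delay in the 0.18 line; the documented knobs are on/off switches, so disabling speculation per-job is the usual workaround. A hedged mapred-site.xml sketch (depending on the exact release, the older single mapred.speculative.execution switch may apply instead of the split pair):

```xml
<!-- mapred-site.xml (or the equivalent JobConf.setBoolean calls).
     Only on/off switches are exposed; the speculative lag itself is
     not a documented configuration option in this release. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```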
Hi there,
I have a generic check that needs to be run on ca. 1000 objects. The check takes about 3 seconds per object. We have a server with 4 processors (and we also have other multi-processor servers in our network), so we would like to create an exe/dll to do the checking and return the results to the "master".
Does anyone know of a framework...
Hi,
So, using the regular MongoDB library in Ruby, I have the following code to find the average filesize across a set of 5001 documents:
avg = 0
total = collection.count()
Rails.logger.info "#{total} asset creation stats in the system"
collection.find().each {|row| avg += (row["filesize"] * (1/total.to_f)) if row["filesize"]}
...
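Pulling every document to the client just to average one field is the slow part of the Ruby loop above. The same average phrased as a map/reduce emits (sum, count) pairs and combines them; this Python sketch mirrors what a server-side reduce would do (a simulation for illustration, not the Ruby driver API):

```python
docs = [{"filesize": 100}, {"filesize": 300}, {"other": 1}, {"filesize": 200}]

def map_size(doc):
    # emit a (sum, count) contribution for documents that have the field
    if "filesize" in doc:
        yield ("avg", (doc["filesize"], 1))

def reduce_sizes(key, values):
    # (sum, count) pairs combine associatively, so this is safe to re-reduce
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return (total, count)

emitted = [v for d in docs for _, v in map_size(d)]
total, count = reduce_sizes("avg", emitted)
avg = total / count
print(avg)  # 200.0
```

The finalize step (dividing sum by count) happens once at the end, which is why the reduce emits the pair rather than a running average.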
Hi all.
I need to split my Map Reduce jar file into two jobs in order to get two different output files, one from each job's reducer.
I mean that the first job has to produce an output file that will be the input for the second job in the chain.
I read something about ChainMapper and ChainReducer in Hadoop version 0.20 (currently ...
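As far as I can tell, ChainMapper/ChainReducer compose mappers within a single job; producing two separate output files usually means running two jobs back to back, with job 2's input path set to job 1's output path. A toy Python analogue of that two-stage chaining (the run_job helper is invented for illustration):

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    # a miniature map -> shuffle -> reduce pass over in-memory records
    groups = defaultdict(list)
    for r in records:
        for k, v in mapper(r):
            groups[k].append(v)
    return [(k, reducer(k, vs)) for k, vs in sorted(groups.items())]

# Job 1: word count
job1_out = run_job(
    ["a b", "b c"],
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda k, vs: sum(vs),
)

# Job 2 consumes job 1's output: group words by their count
job2_out = run_job(
    job1_out,
    mapper=lambda kv: [(kv[1], kv[0])],
    reducer=lambda k, vs: sorted(vs),
)
print(job1_out)  # [('a', 1), ('b', 2), ('c', 1)]
print(job2_out)  # [(1, ['a', 'c']), (2, ['b'])]
```

In real Hadoop the hand-off is a directory on HDFS rather than an in-memory list, but the dependency structure is the same.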
I have a requirement to parse both Apache access logs and Tomcat logs, one after another, using map reduce. A few fields are extracted from the Tomcat log and the rest from the Apache log. I need to merge/map the extracted fields based on the timestamp and export these mapped fields into a traditional relational DB (e.g. MySQL).
I can parse and e...
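The merge step itself is a join on timestamp. A Python sketch of the idea, with invented field names standing in for whatever the extract phase actually produces:

```python
from collections import defaultdict

# One parsed record per source; field names are illustrative only.
apache = [{"ts": "2010-06-10 15:00", "url": "/a", "status": 200}]
tomcat = [{"ts": "2010-06-10 15:00", "session": "xyz"}]

# Group by timestamp and merge the non-key fields from both sources.
groups = defaultdict(dict)
for rec in apache:
    groups[rec["ts"]].update({k: v for k, v in rec.items() if k != "ts"})
for rec in tomcat:
    groups[rec["ts"]].update({k: v for k, v in rec.items() if k != "ts"})

rows = [{"ts": ts, **fields} for ts, fields in groups.items()]
print(rows)
# [{'ts': '2010-06-10 15:00', 'url': '/a', 'status': 200, 'session': 'xyz'}]
# Each merged row is now a flat record ready for an INSERT into MySQL.
```

In a real reduce-side join the timestamp would be the shuffle key, so each reducer call sees exactly the records sharing one timestamp.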
Can we use a LotusScript function as a document selection routine inside a view selection formula?
Here is my LotusScript function, which determines the selection criteria:
Function MyFilter(doc As NotesDocument) As Boolean
'very complex filtering function
'........
End Function
and here is the view selection formula that i want to incorpo...
I'm beginning to learn some Hadoop/MapReduce, coming mostly from a PHP background, with a little bit of Java and Python.
But it seems like most implementations of MapReduce out there are in Java, Ruby, C++ or Python.
I've looked, and it seems there are some Hadoop/MapReduce implementations in PHP, but the overwhelming body of the literature se...
I'm trying to implement the following graph reduction algorithm:
- The graph is an undirected weighted graph.
- I want to strip away all nodes with only two neighbors and update the weights.
Have a look at the following illustration:
The algorithm shall transform the upper graph into the lower one: eliminate node 2 and update the weig...
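Before distributing it, the reduction itself can be pinned down on one machine. A Python sketch, assuming a simple undirected graph stored as a dict from frozenset edge to weight (a representation invented for illustration): any node with exactly two neighbors is spliced out, and its two edges are replaced by one edge carrying their summed weight.

```python
def reduce_graph(edges):
    # edges: {frozenset({u, v}): weight} for an undirected weighted graph
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for n in list(adj):
            if len(adj.get(n, ())) == 2:
                a, b = adj[n]
                # splice n out: merge its two edges into one a-b edge
                w = edges.pop(frozenset({n, a})) + edges.pop(frozenset({n, b}))
                edges[frozenset({a, b})] = edges.get(frozenset({a, b}), 0) + w
                adj[a].discard(n); adj[a].add(b)
                adj[b].discard(n); adj[b].add(a)
                del adj[n]
                changed = True
                break
    return edges

# Path 1 -(2)- 2 -(3)- 3: node 2 has exactly two neighbors, so it goes
g = {frozenset({1, 2}): 2, frozenset({2, 3}): 3}
print(reduce_graph(g))  # {frozenset({1, 3}): 5}
```

Turning this into MapReduce is the interesting part, since each elimination is a local rewrite; iterative jobs that eliminate an independent set of degree-2 nodes per pass are one common approach.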
I have a job in Hadoop 0.20 that needs to operate on large files, one at a time. (It's a pre-processing step to get file-oriented data into a cleaner, line-based format more suitable for MapReduce.)
I don't mind how many output files I have, but each Map's output can be in at most one output file, and each output file must be sorted.
...
I'm trying to count the number of unique users per day on my Java App Engine app. I have decided to use the mapreduce framework (mapreduce.appspot.com) for Java App Engine to do this calculation offline. I've managed to create a map reduce job that goes through all of my entities, which represent a single user's session event. I can use a si...
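Conceptually the offline job can be phrased as: map each session-event entity to a (day, user) pair, then reduce each day's group to its count of distinct users. A plain-Python sketch with invented field names (the real entities would carry a timestamp and a user key):

```python
from collections import defaultdict

events = [
    {"day": "2010-06-10", "user": "u1"},
    {"day": "2010-06-10", "user": "u1"},  # same user, same day: counted once
    {"day": "2010-06-10", "user": "u2"},
    {"day": "2010-06-11", "user": "u1"},
]

# map phase: emit (day, user) and group by day
groups = defaultdict(list)
for e in events:
    groups[e["day"]].append(e["user"])

# reduce phase: count distinct users per day
uniques = {day: len(set(users)) for day, users in groups.items()}
print(uniques)  # {'2010-06-10': 2, '2010-06-11': 1}
```

The deduplication happens in the reducer via the set, which is why a simple per-event counter is not enough.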
I have a massive amount of input data (that's why I use Hadoop), and there are multiple tasks that can be solved with various MapReduce steps, where the first mapper of each needs all the data as input.
My goal: Compute these different tasks as fast as possible.
I currently let them run sequentially, each reading in all the data. I assume it ...
I'm trying to use Dumbo/Hadoop to calculate TF-IDF for a bunch of small text
files using this example http://dumbotics.com/2009/05/17/tf-idf-revisited/
To improve efficiency, I've packaged the text files into a sequence
file using Stuart Sierra's tool -- http://stuartsierra.com/2008/04/24/a-million-little-files
The sequence file uses m...
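As a sanity check on the distributed output, the TF-IDF the Dumbo pipeline computes can be spot-checked on a tiny corpus. This sketch assumes the common tf * ln(N/df) weighting, which may differ in detail from the blog post's exact formula; document names are arbitrary.

```python
import math
from collections import Counter

corpus = {
    "d1": "the cat sat",
    "d2": "the dog sat",
}

N = len(corpus)                                            # number of documents
tf = {d: Counter(text.split()) for d, text in corpus.items()}  # term frequencies
df = Counter(w for counts in tf.values() for w in counts)      # document frequencies

def tfidf(doc, word):
    # raw term frequency times log inverse document frequency
    return tf[doc][word] * math.log(N / df[word])

print(round(tfidf("d1", "cat"), 4))  # ln(2): "cat" appears in 1 of 2 docs
print(tfidf("d1", "the"))            # 0.0: "the" is in every doc
```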
Hi everyone,
Can anyone give an illustration of how to write a program for a reduce-side join?
The reduce-side join provided by Hadoop is a sort-merge join. How can I write a hash-join algorithm for a reduce-side join?
Best
...
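One possible shape for the hash-join variant, sketched in Python as a reducer function: tag each record with its source table so the build side can be buffered in a hash structure and the probe side streamed against it. The tags, record formats, and the assumption that build-side records arrive first (normally arranged via a secondary sort on the tag) are all illustrative.

```python
def reduce_hash_join(key, tagged_values):
    # values for one join key arrive as (tag, record) pairs
    small, out = [], []
    for tag, rec in tagged_values:
        if tag == "S":            # build side: buffer in memory
            small.append(rec)
        else:                     # probe side: stream and match
            for s in small:
                out.append((key, s, rec))
    return out

# Secondary sort would normally guarantee "S" records come first;
# here we order them by hand for the demonstration.
values = [("S", "dept=Sales"), ("B", "emp=Ann"), ("B", "emp=Bob")]
print(reduce_hash_join(10, values))
# [(10, 'dept=Sales', 'emp=Ann'), (10, 'dept=Sales', 'emp=Bob')]
```

The memory cost is proportional to the buffered side per key, which is why the smaller table should carry the build tag.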