I have a collection where each document looks like this
{access_key:'xxxxxxxxx', keyword: "banana", count:12, request_hour:"Thu Sep 30 2010 12:00:00 GMT+0000 (UTC)"}
{access_key:'yyyyyyyyy', keyword: "apple", count:25, request_hour:"Thu Sep 30 2010 12:00:00 GMT+0000 (UTC)", }
.....
To achieve this:
SELECT keyword, sum(count) FROM ke...
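For reference, a per-keyword sum like the one in the query above can be expressed as a map step and a reduce step. The following is only a plain-Python sketch of that idea, not MongoDB's API: it assumes the truncated query groups by `keyword`, and the names `map_doc`, `reduce_counts` and `map_reduce` are hypothetical.

```python
from collections import defaultdict

# Sample documents taken from the question (request_hour omitted for brevity).
docs = [
    {"access_key": "xxxxxxxxx", "keyword": "banana", "count": 12},
    {"access_key": "yyyyyyyyy", "keyword": "apple", "count": 25},
]

def map_doc(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["keyword"], doc["count"]

def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one keyword."""
    return sum(values)

def map_reduce(documents):
    """Group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            groups[key].append(value)
    return {key: reduce_counts(key, values) for key, values in groups.items()}

totals = map_reduce(docs)
```

Here `totals` maps each keyword to the sum of its `count` values, which is the same shape of result a grouped SQL sum would give.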
Hello,
I have a large CSV file containing a list of stores, in which one of the fields is ZipCode.
I have a separate MongoDB database called ZipCodes, which stores the latitude and longitude for any given zip code.
In SQL Server, I would execute a stored procedure called InsertStore which would do a lookup on the ZipCodes table to get ...
Hi,
I have an M/R function, and I get NaN as a value for some of the results. I don't have any experience with JS. I'm escaping the JS using the Java driver.
String map = "function(){" + " emit({"
+ "country: this.info.location.country, "
+ "industry: this.info.industry}, {count : 1}); }";
String reduce = "function(key, ...
I have a MongoDB collection which has a created_at stored in each document. These are stored as a MongoDB date object e.g.
{ "_id" : "4cacda7eed607e095201df00", "created_at" : "Wed Oct 06 2010 21:22:23 GMT+0100 (BST)", text: "something" }
{ "_id" : "4cacdf31ed607e0952031b70", "created_at" : "Wed Oct 06 2010 21:23:42 GMT+0100 (BST)",...
I'm currently delving into CouchDB, and I am puzzled by the distribution of Map-Reduce computations in views. I see a lot of resources mentioning that Map-Reduce is inherently distributed, because you can process one half of your data on server A, the other half on server B, and then reduce both results. One example would be slide 16 of ...
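The "process one half on server A, the other half on server B, then reduce both results" argument relies on the reduce function being associative and commutative. A minimal Python sketch of that property follows. It is not CouchDB code: the documents, `map_fn` and `reduce_fn` are hypothetical, and the two "servers" are just two slices of one list.

```python
from functools import reduce

def map_fn(doc):
    # Hypothetical map: emit the numeric value of each document.
    return doc["value"]

def reduce_fn(a, b):
    # Sum is associative and commutative, so partial results can be merged.
    return a + b

docs = [{"value": v} for v in range(1, 11)]

# "Server A" and "server B" each process one half of the data.
half_a, half_b = docs[:5], docs[5:]
partial_a = reduce(reduce_fn, map(map_fn, half_a))
partial_b = reduce(reduce_fn, map(map_fn, half_b))

# A final reduce merges the two partial results.
distributed = reduce_fn(partial_a, partial_b)
single_node = reduce(reduce_fn, map(map_fn, docs))
```

Both `distributed` and `single_node` hold the same total, which is what makes splitting the work across machines safe for this kind of reduce.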
Hi,
I have five MapReduce jobs that I currently run separately. I want to pipeline them together, so that the output of one job goes to the next job. Currently, I have a shell script that executes them all. Is there a way to write this in Java? Please provide an example.
Thanks
...
Hi all, I have a long history with relational databases, but I'm new to MongoDB and MapReduce, so I'm almost positive I must be doing something wrong. I'll jump right into the question. Sorry if it's long.
I have a database table in MySQL that tracks the number of member profile views for each day. For testing it has 10,000,000 rows.
C...
I need to design an exercise for my students in programming language design. My idea is to help them learn ideas from Lisp, ML and other functional languages by having them implement a MapReduce exercise with Hadoop.
Are there any suggestions that would help me flesh out this idea?
...
Hi, since Amazon Web Services is a paid service, I'd like to ask people who have worked with it before I jump in, and confirm my understanding of it.
Question one:
The Amazon Auto Scaling service says it can scale instances up and down.
What does this mean?
Does it mean changing the type of instance, or can it start/stop more/fewer instances bas...
Hi,
I am doing some text processing using Hadoop map-reduce jobs. My job is 99.2% complete and stuck on the last map job.
The last few lines of the map output are shown below. The last time this problem occurred, I tried printing the key values emitted from the map and noticed that one of the keys has a large number of values associated...
I am trying to use AppEngine-MapReduce. I understand how to perform an operation over all entities of some entity_kind, but what is the easiest way to operate only on entities within a date range when the entity has a date attribute? Is there a simple way to pass parameters to the mapper?
For example, what if I only wanted to delete entit...
Hi All,
I am trying to build a collaborative-filtering-based recommendation system as part of an academic project. I think the Mahout project has a lot of potential and I want to use it.
I installed Mahout, Hadoop and Java on my Ubuntu 10.1. Hadoop and Java have been checked to be working fine together. (Ran the Hadoop word count example ...
I am using Google App Engine mapreduce to analyze some data. I am generating a few counters that I would like to create a simple Google chart from in my done_callback. How do I access the resulting counters from the callback?
# The map method
def count_created_since(entity):
    now = datetime.datetime.now()
    delta = now - entity.created
...
I have a lot of trivially parallelizable computations and a lot (100s) of cores distributed over an SSH + NFS network.
What is the simplest way to parallelize this?
The problem is that I don't know how long each task will take so I need some kind of queue.
Is there something that is very easy to use?
...
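The "some kind of queue" idea above can be illustrated with a shared work queue: each worker pulls the next task as soon as it is free, so tasks of unknown length balance out on their own. This is only a sketch under loud assumptions: local threads stand in for the remote cores, dispatch over SSH is not shown, and `run_tasks` is a hypothetical helper.

```python
import queue
import threading

def run_tasks(tasks, worker_count=4):
    """Run callables from a shared queue; idle workers pull the next task."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            value = task()
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return results in the order the tasks were submitted.
    return [results[i] for i in range(len(tasks))]

squares = run_tasks([lambda n=n: n * n for n in range(8)])
```

Because workers take tasks one at a time instead of receiving a fixed share up front, a single slow task delays only the worker running it.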
Hello
I'm trying to use map/reduce to process large amounts of binary data. The application is characterized by the following: the number of records is potentially large, such that I don't really want to store each record as a separate file in HDFS (I was planning to concatenate them all into a single binary sequence file), and each rec...
I'm using Map Reduce with MongoDB. Simplified scenario: There are users, items and things. Items include any number of things. Each user can rate things. Map reduce is used to calculate the aggregate rating for each user on each item. It's a complex formula using the ratings for each thing in the item and the time of day - it's not ...
I have written a MapReduce application for Hadoop and tested it at the command line on a single machine. My application uses two steps: Map1 -> Reduce1 -> Map2 -> Reduce2
To run this job on AWS MapReduce, I am following this link: http://aws.amazon.com/articles/2294. But I am not clear on how to use the Ruby CLI client provided by Amazon to do al...