mapreduce

Does Mongoid have Map/Reduce?

I am using Ruby code to calculate a sum from the array returned by Mongoid, but maybe using Map/Reduce would be faster. However, I don't see any docs for Map/Reduce on mongoid.org, and Googling for map reduce site:mongoid.org (or for MapReduce or Map/Reduce) doesn't give any results either. There are docs for map reduce on MongoDB's site...
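
Mongoid itself does not wrap map/reduce, but the underlying driver collection it exposes does. A minimal sketch, assuming a Mongoid model named Product with a numeric price field (both names are hypothetical):

    # Map/reduce bodies are JavaScript strings handed to the server.
    map = "function() { emit('total', this.price); }"
    reduce = <<-JS
      function(key, values) {
        var sum = 0;
        values.forEach(function(v) { sum += v; });
        return sum;
      }
    JS

    # The server runs the job; the driver returns the result collection.
    result = Product.collection.map_reduce(map, reduce).find.to_a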

Can HBase, MapReduce, and HDFS work on a single machine with Hadoop installed and running on it?

I am working on a search engine design, which is to be run on the cloud. We have just started, and don't know much about Hadoop yet. Can anyone tell me if HBase, MapReduce, and HDFS can work on a single machine that has Hadoop installed and running on it? ...

Does MongoDB's Map/Reduce always return results in floats?

I am using Mongoid, which sits on top of the Ruby MongoDB driver. Even though my map's emit gives out parseInt(num) and the reduce's return also gives back parseInt(num), the final results are still floats. Is that particular to MongoDB? Is there any way to make it an integer instead? ...
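
A hedged workaround: JavaScript numbers are IEEE-754 doubles, so map/reduce output generally comes back as floats regardless of parseInt, and casting on the Ruby side afterwards is one way around it.

    # Cast the reduced values after the fact; 'Analytic', map, and reduce
    # follow the conventions of the other snippets on this page.
    results = Analytic.collection.map_reduce(map, reduce).find.to_a
    results.each { |doc| doc['value'] = doc['value'].to_i }   # 5.0 -> 5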

CouchDB Directed Acyclic Graph (DAG)

If my structure looks like this: [{Name: 'A', Depends: []}, {Name: 'B', Depends: ['A']}, {Name: 'C', Depends: ['A']}, {Name: 'D', Depends: ['C']}, {Name: 'E', Depends: ['D','B']}] How would I write the map and reduce functions such that my output is: [{Key: 'A', Value: []}, {Key: 'B', Value: ['A']}, {Key: 'C', Value: ['A']}, {Key: 'D...
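
For the output shown, a map function that simply emits each document's Name as the key and its Depends array as the value would suffice, with no reduce step at all. A sketch of the view as a Ruby hash (the map body is JavaScript in a string; the design-document name is hypothetical):

    design_doc = {
      '_id'   => '_design/dag',
      'views' => {
        'depends' => {
          # One row per document: key is the node name, value its parents.
          'map' => "function(doc) { emit(doc.Name, doc.Depends); }"
        }
      }
    }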

When do I need map reduce for database queries?

In CouchDB you always have to use map reduce to query results. In MongoDB you can use their query methods for retrieving data, but they also let you do map-reduce. I wonder, when do I actually need map-reduce? Are those query methods different from map-reduce, or are they just wrappers for map-reduce functions? ...
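
A rough contrast, in the Ruby-driver style used elsewhere on this page: a plain query only filters and returns documents, while map/reduce computes an aggregate server-side. The names here are made up for illustration.

    # Plain query: just retrieves matching documents.
    pages = Analytic.collection.find('page_type' => 'products').to_a

    # Map/reduce: computes something across documents (total pageviews
    # per page type) that a plain find cannot express.
    map    = "function() { emit(this.page_type, this.pageviews); }"
    reduce = "function(key, values) {
                var s = 0;
                values.forEach(function(v) { s += v; });
                return s;
              }"
    totals = Analytic.collection.map_reduce(map, reduce).find.to_a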

Scala analogues of QtConcurrent

Hello! What are the analogues of QtConcurrent for Scala (or Java)? I.e., a simplified implementation of MapReduce: the parallel map and foldl. Thank you ...

Hadoop Data Persistence: in which format?

Hi, I have some experience with Lucene, and I'm trying to understand how the data is actually stored on a slave server in the Hadoop framework. Do we create an index on the slave server with a set of attributes to describe the document we are storing? How does it work in reality? Thanks, R ...

Is there a way to pass through "find" before map_reduce for MongoDB?

The following line works: Analytic.collection.map_reduce(map, reduce).find but is there a way to do Analytic.collection.find('page_type' => 'products').map_reduce(map, reduce).find and even filter a date range such as date >= "2010-08-01" and date <= "2010-08-31"? ...
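
The driver's map_reduce takes a :query option that plays exactly this role, filtering documents before the map phase (a later question on this page uses the same option). A sketch, assuming date is stored in a string-comparable form:

    Analytic.collection.map_reduce(map, reduce,
      :query => {
        'page_type' => 'products',
        'date'      => { '$gte' => '2010-08-01', '$lte' => '2010-08-31' }
      }
    ).find.to_a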

Does MongoDB's Map/Reduce sort work?

If the following is used Analytic.collection.map_reduce(map, reduce, :query => {:page => subclass_name}, :sort => [[:pageviews, Mongo::DESCENDING]]).find.to_a it won't sort by pageviews. Alternatively, if it is array of hash: Analytic.collection.map_reduce(map, reduce, :query => {:page => subclass_name}, :sort => [{:page...
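
One likely explanation: mapReduce's sort option orders the input documents before the map phase (useful together with limit); it does not sort the output. A hedged fallback is to sort the results client-side:

    docs = Analytic.collection.map_reduce(map, reduce,
             :query => { :page => subclass_name }).find.to_a
    # Assuming the reduced 'value' is the pageview count; negate it for
    # descending order.
    docs = docs.sort_by { |d| -d['value'] }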

MapReduce on more than one datastore kind in Google App Engine

I just watched the Batch data processing with App Engine session of Google I/O 2010, read some parts of Google Research's MapReduce article, and now I am thinking of using MapReduce on Google App Engine to implement a recommender system in Python. I prefer using appengine-mapreduce instead of the Task Queue API because the former offers easy it...

Is MongoDB's query or Mongoid's API good for filtering 2000 items from a total of 80,000?

The idea is to do analytics on 30 or 2000 products out of a collection of 80,000 products. Say there are 80,000 products, and we want the top products with the highest number of pageviews in a category, which can include only 30 or up to 2000 products. We can either filter out all those products first, and then use map/reduce to find ...
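
One option along the first line of thought: collect the 30-2000 product ids, then hand them to map/reduce's :query option so the job only ever scans that subset. A sketch; the model and field names are hypothetical:

    product_ids = Product.where(:category_id => category_id).map(&:_id)

    Analytic.collection.map_reduce(map, reduce,
      :query => { 'product_id' => { '$in' => product_ids } }
    ).find.to_a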

Using MongoDB, any easy way to re-use Map/Reduce results?

For example, when doing analytics, there can be a map/reduce run that takes 10 seconds. After it is run, if other webpages can make use of that result, then it will save 10 seconds per page. It would be good to have the map/reduce result cached somehow. It is possible to record a successful map/reduce run as map_reduce_result_[time...
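
A sketch of that idea using the :out option (which a later question on this page also uses): write the expensive run into a named collection once, then have other pages read the saved output instead of re-running the job. The collection name is hypothetical.

    # Run once; the results are persisted server-side.
    Analytic.collection.map_reduce(map, reduce,
      :out => 'map_reduce_result_cache')

    # Later requests skip the 10-second job and read the cached output.
    cached = Analytic.collection.db['map_reduce_result_cache'].find.to_a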

Using Mapreduce to map multiple unique values not always present on the same lines

I have run into a complex problem with Mapreduce. I am trying to match up 2 unique values that are not always present together in the same line. Once I map those out, I need to count the total number of unique events for that mapping. The log files I am crunching are 100GB+ uncompressed and have data broken into 2 parts that I need to ...
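
One common shape for this is a reduce-side join: key every record by the shared id so both halves of a pairing meet in the same reduce group. A Hadoop Streaming style sketch in Ruby; the tab-separated layout is an assumption, since the real log format isn't shown, and the mapper is presumed to emit "shared_id<TAB>value" lines.

    # reducer.rb -- streaming input arrives sorted by key, so collect the
    # distinct values per id and count the unique combinations.
    require 'set'

    current, values = nil, Set.new
    STDIN.each_line do |line|
      key, value = line.chomp.split("\t")
      if key != current
        puts "#{current}\t#{values.size}" if current
        current, values = key, Set.new
      end
      values << value
    end
    puts "#{current}\t#{values.size}" if current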

After MongoDB's Map/Reduce results is saved as a permanent collection, how do you sort it internally, and get it back using Mongoid?

Is it true that Map/Reduce results can be stored permanently, but not sorted? For example, coll = Analytic.collection.map_reduce(map, reduce, :out => 'analyticsCachedResult') the above is stored permanently but is not sorted. To sort it on the fly, it can be coll.find({}, :sort => ['value.pa...
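
Since :out makes the result an ordinary collection, it can be queried and sorted like any other. A sketch following the snippet above; the 'value.pageviews' field name is an assumption about the shape of the reduce output:

    coll = Analytic.collection.map_reduce(map, reduce,
             :out => 'analyticsCachedResult')
    top = coll.find({}, :sort => [['value.pageviews', Mongo::DESCENDING]]).to_a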

Code for sorting using map reduce in C or Java

I need code, or help writing code, in C or Java to demonstrate the power of map reduce. ...

Efficient set operations in mapreduce

I have inherited a mapreduce codebase which mainly calculates the number of unique user IDs seen over time for different ads. To me it doesn't look like it is being done very efficiently, and I would like to know if anyone has any tips or suggestions on how to do this kind of calculation as efficiently as possible in mapreduce. We use H...
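
One well-known trick, sketched here as Hadoop Streaming style Ruby: split the work into two passes so that no reducer ever holds a set of user IDs in memory. Pass 1 dedupes (ad, user) pairs; pass 2 then only counts lines per ad. These are two separate reducer scripts, run as two jobs.

    # pass 1 reducer: input sorted on "ad \t user"; emit each pair once.
    prev = nil
    STDIN.each_line do |line|
      pair = line.chomp
      puts pair unless pair == prev
      prev = pair
    end

    # pass 2 reducer: input sorted on ad; count the deduped users per ad.
    current, count = nil, 0
    STDIN.each_line do |line|
      ad, _user = line.chomp.split("\t")
      if ad != current
        puts "#{current}\t#{count}" if current
        current, count = ad, 0
      end
      count += 1
    end
    puts "#{current}\t#{count}" if current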

How to find a distinct URL that is only in set A, not in set B?

There are two sets of URLs, both containing millions of URLs. Now, how can I get a URL from A that is not in B? What are the best methods? Note: you can use any technique, and any tools like databases, mapreduce, hashcode, etc. We should consider memory efficiency and time efficiency. You have to consider that every set (A and B) has millio...
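
The map-reduce formulation of set difference: tag each URL with the set it came from, group by URL, and keep only URLs whose group never saw a B tag. A sketch in Ruby (streaming style; the mappers are presumed to emit "url<TAB>A" or "url<TAB>B" lines):

    # reducer: input arrives sorted by URL.
    current, in_b = nil, false
    STDIN.each_line do |line|
      url, tag = line.chomp.split("\t")
      if url != current
        puts current if current && !in_b   # seen only in A
        current, in_b = url, false
      end
      in_b ||= (tag == 'B')
    end
    puts current if current && !in_b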

Hadoop and MS SQL Server Best Practices

Hi, I've been following Hadoop for a while; it seems like a great technology. The Map/Reduce and clustering are just good stuff. But I haven't found any articles regarding the use of Hadoop with SQL Server. Let's say I have a huge claims table (600 million rows) and I want to take advantage of Hadoop. I was thinking, but correct me if I'm ...

flatten a dictionary of dictionaries of lists in Python

Hi all... I'm trying to wrap my brain around this but it's not flexible enough. In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys. Thus I want to transf...

MongoDB map/reduce over multiple collections?

First, the background. I used to have a collection called logs and used map/reduce to generate various reports. Most of these reports were based on data from within a single day, so I always had a condition d: SOME_DATE. When the logs collection grew extremely big, inserting became extremely slow (slower than the app we were monitoring was gene...
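
Map/reduce runs over a single collection at a time, so with per-day collections one hedged approach is to run the same job against each daily collection and fold every result into one shared output collection, assuming a driver/server pair recent enough to support the out-with-reduce merge mode. The collection names are hypothetical.

    # db is a Mongo::DB handle, e.g. from Connection#db.
    %w[logs_2010_08_01 logs_2010_08_02].each do |name|
      db[name].map_reduce(map, reduce, :out => { :reduce => 'report_all' })
    end
    merged = db['report_all'].find.to_a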