I am using Ruby code to calculate a sum from the array returned by Mongoid.
But maybe using map/reduce could be faster, except that I don't see any docs for map/reduce on mongoid.org, and googling for
map reduce site:mongoid.org
doesn't give any results either (nor does searching for MapReduce or Map/Reduce).
There are docs on MongoDB's site
map reduce site...
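For reference, a minimal sketch of what such a sum might look like through the underlying driver collection (the Analytic model and the amount field are made up here; this is not from any official Mongoid docs):

map = <<-JS
  function() { emit('total', this.amount); }
JS
reduce = <<-JS
  function(key, values) {
    var sum = 0;
    values.forEach(function(v) { sum += v; });
    return sum;
  }
JS
Analytic.collection.map_reduce(map, reduce).find.to_a
# => e.g. [{ '_id' => 'total', 'value' => 123.0 }]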
I am working on a search engine design, which is to be run on the cloud.
We have just started, and don't have much idea about Hadoop.
Can anyone tell me if HBase, MapReduce, and HDFS can work on a single machine that has Hadoop installed and running on it?
...
I am using Mongoid, which sits on top of the Ruby MongoDB driver. Even though my map's emit gives out a parseInt(num), and the reduce's return also gives back a parseInt(num), the final results are still floats.
Is that particular to MongoDB? Is there any way to make it an integer instead?
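Not an authoritative fix, but since JavaScript numbers are all double-precision floats, parseInt still round-trips as a float; one workaround is simply to cast on the Ruby side (model and field names assumed):

results = Analytic.collection.map_reduce(map, reduce).find.to_a
counts  = results.map { |doc| [doc['_id'], doc['value'].to_i] }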
...
If my structure looks like this:
[{Name: 'A', Depends: []},
{Name: 'B', Depends: ['A']},
{Name: 'C', Depends: ['A']},
{Name: 'D', Depends: ['C']},
{Name: 'E', Depends: ['D','B']}]
How would I write the map and reduce functions such that my output is:
[{Key: 'A', Value: []},
{Key: 'B', Value: ['A']},
{Key: 'C', Value: ['A']},
{Key: 'D...
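Assuming the desired output really is just each Name paired with its direct Depends list (as the visible part suggests), one hedged sketch with a made-up Package model; since each Name is unique, reduce is never actually invoked:

map = <<-JS
  function() { emit(this.Name, this.Depends); }
JS
reduce = <<-JS
  // Only called if a Name were ever emitted more than once.
  function(key, values) { return values[0]; }
JS
Package.collection.map_reduce(map, reduce).find.to_a
# => [{ '_id' => 'A', 'value' => [] }, { '_id' => 'B', 'value' => ['A'] }, ...]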
In CouchDB you always have to use map/reduce to query results.
In MongoDB you can use their query methods for retrieving data, but they also let you do map/reduce.
I wonder, when do I actually need map/reduce?
Are those query methods different from map/reduce, or are they just wrappers for map/reduce functions?
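A rough illustration of the difference, with made-up collection and field names: a plain query selects documents, while map/reduce computes an aggregate across them:

# Plain query: returns the matching documents as-is.
products = Analytic.collection.find('page_type' => 'products').to_a

# Map/reduce: total pageviews per page_type.
map    = "function() { emit(this.page_type, this.pageviews); }"
reduce = "function(key, values) {
            var sum = 0;
            values.forEach(function(v) { sum += v; });
            return sum;
          }"
totals = Analytic.collection.map_reduce(map, reduce).find.to_a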
...
Hello!
What are the analogues of QtConcurrent for Scala (or Java)? I.e., a simplified implementation of MapReduce: parallel map and foldl.
Thank you
...
Hi,
I have some experience with Lucene, and I'm trying to understand how the data is actually stored on a slave server in the Hadoop framework.
Do we create an index on the slave server with a set of attributes to describe the document we are storing? How does it work in reality?
Thanks
R
...
The following line works:
Analytic.collection.map_reduce(map, reduce).find
but is there a way to do
Analytic.collection.find('page_type' => 'products').map_reduce(map, reduce).find
and even filter by a date range, such as date >= "2010-08-01" and date <= "2010-08-31"?
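map_reduce can't be chained onto find, but the driver's :query option (which also appears later in this digest) can carry both conditions; a sketch, assuming date is stored as a string as in the question:

Analytic.collection.map_reduce(map, reduce,
  :query => {
    'page_type' => 'products',
    'date' => { '$gte' => '2010-08-01', '$lte' => '2010-08-31' }
  }).find.to_a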
...
If the following is used
Analytic.collection.map_reduce(map, reduce,
:query => {:page => subclass_name},
:sort => [[:pageviews, Mongo::DESCENDING]]).find.to_a
it won't sort by pageviews. Alternatively, if it is an array of hashes:
Analytic.collection.map_reduce(map, reduce,
:query => {:page => subclass_name},
:sort => [{:page...
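One caveat worth sketching here (the output collection name and the value.pageviews field are assumed): the :sort option orders the input documents before map runs, so sorting by an aggregated value like pageviews has to happen on the output collection instead:

out = Analytic.collection.map_reduce(map, reduce,
        :query => { :page => subclass_name },
        :out   => 'pageviews_by_page')  # made-up output name
out.find({}, :sort => [['value.pageviews', Mongo::DESCENDING]]).to_a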
I just watched the "Batch data processing with App Engine" session from Google I/O 2010, read some parts of the MapReduce article from Google Research, and now I am thinking of using MapReduce on Google App Engine to implement a recommender system in Python.
I prefer using appengine-mapreduce instead of the Task Queue API because the former offers easy it...
The idea is to do analytics on 30 or 2000 products out of a collection of 80,000 products.
Say there are 80,000 products, and we want to get the top products with the highest number of pageviews in a category, which can include only 30 or up to 2000 products. So we can either filter out all those products first and then use map/reduce to find ...
For example, when doing analytics, there can be a map/reduce run that takes 10 seconds. After it has run, if other webpages can make use of that result, then it will save 10 seconds per page.
It would be good to have the map/reduce result cached somehow.
It is possible to record a successful map/reduce run as map_reduce_result_[time...
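A sketch of that caching idea, using the naming scheme from the question with everything else assumed: the :out option (used in the next question below) turns the result into a real collection that later pages can read with plain finds:

cache_name = "map_reduce_result_#{Time.now.to_i}"  # naming scheme assumed
Analytic.collection.map_reduce(map, reduce, :out => cache_name)

# Later, from another page, skip the 10-second run entirely:
db = Analytic.collection.db
db[cache_name].find.to_a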
I have run into a complex problem with MapReduce. I am trying to match up 2 unique values that are not always present together on the same line. Once I map those out, I need to count the total number of unique events for that mapping.
The log files I am crunching are 100GB+ uncompressed and have data broken into 2 parts that I need to ...
Is it true that Map/Reduce results can be stored permanently, but not sorted? For example,
coll = Analytic.collection.map_reduce(map, reduce,
:out => 'analyticsCachedResult')
the above is stored permanently but is not sorted.
To sort it on the fly, you can do
coll.find({}, :sort => ['value.pa...
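Since that last line is cut off, here is the same idea spelled out with an assumed value.pageviews field:

coll.find({}, :sort => [['value.pageviews', Mongo::DESCENDING]]).to_a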
I need code, or help writing code, in C or Java to demonstrate the power of map/reduce.
...
I have inherited a MapReduce codebase which mainly calculates the number of unique user IDs seen over time for different ads. To me it doesn't look like it is being done very efficiently, and I would like to know if anyone has any tips or suggestions on how to do this kind of calculation as efficiently as possible in MapReduce.
We use H...
There are two sets of URLs; both contain millions of URLs. Now, how can I get the URLs from A that are not in B? What are the best methods?
Note: you can use any technique and any tools, like a database, MapReduce, hashcodes, etc. We should consider memory efficiency and time efficiency. You have to consider that every set (A and B) has millio...
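One memory-friendly approach, sketched in Ruby for consistency with the rest of this digest (the file names and the Unix sort step are assumptions, not from the question): sort both lists on disk, then stream-merge them, keeping the lines of A that never appear in B:

# External sort keeps memory usage flat even with millions of URLs.
system("sort -u a.txt -o a_sorted.txt")
system("sort -u b.txt -o b_sorted.txt")

File.open('a_only.txt', 'w') do |out|
  File.open('b_sorted.txt') do |b|
    b_line = b.gets
    File.foreach('a_sorted.txt') do |a_line|
      a_line = a_line.chomp
      # Advance B until it catches up with the current A line.
      b_line = b.gets while b_line && b_line.chomp < a_line
      out.puts(a_line) unless b_line && b_line.chomp == a_line
    end
  end
end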
Hi,
I've been following Hadoop for a while; it seems like a great technology. The map/reduce and clustering are just good stuff. But I haven't found any articles regarding the use of Hadoop with SQL Server.
Let's say I have a huge claims table (600 million rows) and I want to take advantage of Hadoop. I was thinking, but correct me if I'm ...
Hi all... I'm trying to wrap my brain around this, but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper, but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
Thus I want to transf...
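The question is about Python, but the shape of the transformation is easy to sketch in Ruby (the language of the other snippets here), with invented sample data:

nested = { 'a' => { 'x' => [1, 2], 'y' => [3] },
           'b' => { 'z' => [4, 5] } }
# Drop the outer keys, then the inner keys, then join the lists.
flat = nested.values.flat_map { |inner| inner.values.flatten }
# => [1, 2, 3, 4, 5]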
First, the background. I used to have a collection logs and used map/reduce to generate various reports. Most of these reports were based on data from within a single day, so I always had a condition d: SOME_DATE. When the logs collection grew extremely big, inserting became extremely slow (slower than the app we were monitoring was gene...