
Hi all,

I'm testing out CouchDB to see how it could handle logging some search results. What I'd like to do is build a view that returns the top queries from the results. At the moment I have something like this:

Example document portion

{
  "query": "+dangerous +dogs",
  "hits": "123"
}

Map function (not exactly what I need/want, but good enough for testing)

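function(doc) {
  if (doc.query) {
    var split = doc.query.split(" ");
    // Emit each term with a count of 1; the reduce sums them per term.
    // An index loop avoids for...in, which also visits inherited properties.
    for (var i = 0; i < split.length; i++) {
      emit(split[i], 1);
    }
  }
}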

Reduce Function

function (key, values, rereduce) {
  return sum(values);
}
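To see why the output comes back ordered by term, here is a rough plain-JavaScript simulation of what this map/reduce pair returns when queried with `group=true`. It is only an illustration under stated assumptions, not how CouchDB works internally: the helper `runGroupedView` is hypothetical, `emit` is passed in as a parameter instead of being a global, and keys are sorted with JavaScript's default string comparison rather than CouchDB's Unicode collation.

```javascript
// Hypothetical helper: runs a map function over docs, groups the emitted
// values by key and reduces each group, roughly like ?group=true.
function runGroupedView(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  // Rows come back ordered by key (simplified here to string order), not by value.
  return [...groups.keys()]
    .sort()
    .map((key) => ({ key, value: reduce(null, groups.get(key), false) }));
}

// The question's map function, with emit taken as a parameter.
function mapQueryTerms(doc, emit) {
  if (doc.query) {
    const split = doc.query.split(" ");
    for (let i = 0; i < split.length; i++) emit(split[i], 1);
  }
}

// The question's reduce: sum the 1s emitted for each term.
function sumReduce(keys, values, rereduce) {
  return values.reduce((total, v) => total + v, 0);
}

const docs = [
  { query: "+dangerous +dogs" },
  { query: "+dangerous +cats" },
  { query: "+friendly +dogs" },
  { query: "+dangerous" },
];
console.log(runGroupedView(docs, mapQueryTerms, sumReduce));
// → +cats 1, +dangerous 3, +dogs 2, +friendly 1 (ordered by key)
```

The most popular term, `+dangerous`, is not first, which is exactly the problem described next.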

Now this gets me results in which each query term is the key and the count for that term is the value, which is great. But I'd like them ordered by the value, not the key. From the sounds of it, this is not yet possible with CouchDB.

So does anyone have any ideas for how I can get a view with the query terms and their counts, ordered by count? I'm very new to CouchDB and I just can't think of how I'd write the functions needed.

A: 

I'm not sure what the 1 you emit as the value is for, but I'm positive this will do the trick:

emit([doc.hits, split[i]], 1);

The sorting (collation) rules for view keys are defined in the CouchDB docs.
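For reference, CouchDB's collation orders keys by type first (null, false, true, numbers, strings, arrays, objects) and compares arrays element by element. The sketch below is a simplified, hypothetical `collate` comparator for the key types in this thread; it uses JavaScript's default string comparison, whereas CouchDB uses the Unicode Collation Algorithm, and it does not order objects. It also shows a catch with the example document: `"hits": "123"` is a string, so such keys would sort after every numeric key, and lexicographically ("25" after "123"), unless the map function converts them with `Number(doc.hits)`.

```javascript
// Simplified, hypothetical comparator approximating CouchDB view collation.
// Type order: null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain objects
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb; // different types: the type order decides
  if (ra === 3) return a - b; // numbers compare numerically
  // Simplification: default JS string order, not the Unicode Collation Algorithm.
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Arrays: compare element by element; a shorter prefix sorts first.
    const len = Math.min(a.length, b.length);
    for (let i = 0; i < len; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // equal null/booleans; objects are not ordered in this sketch
}

// Keys as emitted by emit([doc.hits, split[i]], 1); note the string "123".
const keys = [["123", "+cats"], [123, "+dogs"], [25, "+dangerous"], [25, "+cats"]];
console.log(keys.slice().sort(collate));
// → [25,"+cats"], [25,"+dangerous"], [123,"+dogs"], ["123","+cats"]
```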

Dominykas Blyžė
I shouldn't have thrown the hits parameter in there; it's a bit of a red herring. What I've got at the moment produces rows like Key: +dangerous, Value: 25, where 25 means that 25 people entered a query containing the text "+dangerous". The code came from the CouchDB wiki here: http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags.
Lee Theobald
A:

This came up on the CouchDB-user mailing list, and Chris Anderson, one of the primary developers, wrote:

This is a common request, but not supported directly by CouchDB's views -- to do this you'll need to copy the group-reduce query to another database, and build a view to sort by value.

This is a tradeoff we make in favor of dynamic range queries and incremental indexes.
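A minimal sketch of the second half of Chris's suggestion: a map function for the copy database, keyed on the count. It assumes each copied document keeps the row's shape, e.g. `{ "key": "+dangerous", "value": 25 }`; the view name `by_count` and the helper `queryByCount`, which only imitates `?descending=true&limit=n` locally, are hypothetical. In a design document you would store only `byCountMap`'s function body, where CouchDB supplies `emit`.

```javascript
// Map function for the view in the second database (hypothetically "by_count").
// Assumes each document there is a copied row such as
// { "key": "+dangerous", "value": 25 }, i.e. doc.value holds the count.
function byCountMap(doc) {
  if (typeof doc.value === "number") {
    emit(doc.value, doc.key); // index on the count; the term rides along as the value
  }
}

// Local stand-in for CouchDB, for illustration only: collect the emitted rows
// and order them the way ?descending=true&limit=n would.
function queryByCount(docs, n) {
  const rows = [];
  globalThis.emit = (key, value) => rows.push({ key, value });
  try {
    for (const doc of docs) byCountMap(doc);
  } finally {
    delete globalThis.emit; // do not leave the stub lying around
  }
  rows.sort((a, b) => b.key - a.key); // highest count first
  return rows.slice(0, n);
}

const copiedRows = [
  { key: "+dangerous", value: 25 },
  { key: "+dogs", value: 12 },
  { key: "+cats", value: 40 },
];
console.log(queryByCount(copiedRows, 2));
// → [ { key: 40, value: '+cats' }, { key: 25, value: '+dangerous' } ]
```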

I needed to do this recently as well, and I ended up doing it in my app tier. This is easy to do in JavaScript:

db.view('mydesigndoc', 'myview', {'group':true}, function(err, data) {

    if (err) throw new Error(JSON.stringify(err));

    data.rows.sort(function(a, b) {
        return a.value - b.value;
    });

    data.rows.reverse(); // optional, depending on your needs

    // do something with the data…
});

This example runs in Node.js and uses node-couchdb, but it could easily be adapted to run in a browser or another JavaScript environment. And of course the concept is portable to any programming language/environment.
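If you only need the top few terms, a variant of the same idea sorts in descending order directly (so no `reverse()` is needed) and trims the result. This is a sketch: `topN` is a hypothetical helper, and the tie-break on the key is an added choice to keep the output deterministic.

```javascript
// Hypothetical helper: the n most popular terms from group=true rows.
function topN(rows, n) {
  return rows
    .slice() // copy, so the caller's array is not reordered
    .sort((a, b) => {
      if (b.value !== a.value) return b.value - a.value; // highest count first
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; // ties: by term
    })
    .slice(0, n);
}

const sampleRows = [
  { key: "+cats", value: 1 },
  { key: "+dangerous", value: 3 },
  { key: "+dogs", value: 2 },
  { key: "+friendly", value: 1 },
];
console.log(topN(sampleRows, 3));
// → +dangerous (3), +dogs (2), +cats (1)
```

Inside the callback above, this would replace the sort and reverse with something like `const top = topN(data.rows, 10);`.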

HTH!

Avi Flax
I expanded on sorting via a dedicated DB; however, sorting in the application is likely to work in most situations. Maybe not for the OP, if these are search terms, but still. :)
jhs
Thanks for the great answer Avi. I'll give that a try.
Lee Theobald
A:

It is true that there is no dead-simple answer. There are several patterns, however.

  1. http://wiki.apache.org/couchdb/View_Snippets#Retrieve_the_top_N_tags. I do not personally like this, because the page itself acknowledges that it is a brittle solution, and the code is not relaxing-looking.

  2. Avi's answer, which is to sort in-memory in your application.

  3. couchdb-lucene, which everybody seems to find themselves needing eventually!

  4. What I like is what Chris said in Avi's quote. Relax. In CouchDB, databases are lightweight and excel at giving you a unique perspective on your data. These days, the buzz is all about filtered replication, which is all about slicing out subsets of your data to put in a separate DB.

    Anyway, the basics are simple. You take the .rows from the view output and insert them into a separate DB whose view simply emits keyed on the count. An additional trick is to write a very simple _list function. Lists "render" the raw Couch output into different formats. Your _list function should output:

    { "docs":
        [ {..view row1...},
          {..view row2...},
          {..etc...}
        ]
    }
    

    What that will do is format the view output exactly the way the _bulk_docs API requires it. Now you can pipe curl directly into another curl:

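    curl 'host:5984/db/_design/myapp/_list/bulkdocs_formatter/query_popularity?group=true' \
     | curl -X POST -H 'Content-Type: application/json' --data-binary @- host:5984/popularity_sorter/_bulk_docs

    # then read the terms back, most popular first:
    curl 'host:5984/popularity_sorter/_design/myapp/_view/by_count?descending=true'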
    
  5. In fact, if your list function can handle all the docs, you may just have it sort them itself and return them to the client sorted.
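Points 4 and 5 both rely on `_list` functions. Here is a rough sketch of each, written in ES5 style since older CouchDB JavaScript query servers may not support newer syntax. `start()`, `send()` and `getRow()` are the functions CouchDB provides to list code; the names `bulkdocsFormatter` and `sortedByValue` are hypothetical, and in a real design document you would store each function expression as a string under `lists`.

```javascript
// Point 4: emit the view rows as {"docs": [...]}, the body _bulk_docs expects.
// Assumption: each row's key and value are copied as-is into a new document.
var bulkdocsFormatter = function (head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  send('{"docs":[');
  var row;
  var first = true;
  while ((row = getRow())) {
    // Separate documents with commas, but not before the first one.
    send((first ? "" : ",") + JSON.stringify({ key: row.key, value: row.value }));
    first = false;
  }
  send("]}");
};

// Point 5: buffer every row, sort by value (highest first), return JSON.
// All rows are held in memory, so this suits only modestly sized views.
var sortedByValue = function (head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var rows = [];
  var row;
  while ((row = getRow())) {
    rows.push({ key: row.key, value: row.value });
  }
  rows.sort(function (a, b) {
    return b.value - a.value;
  });
  send(JSON.stringify({ rows: rows }));
};
```

Either one would be requested like the first curl command in point 4, through `_list/<name>/query_popularity?group=true`, where `<name>` is whatever name you register the function under.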

jhs
Great answer, so thanks jhs. Plenty of options for me to look into.
Lee Theobald
You're welcome. If it wasn't clear, my first choice would be to sort in the client, and my second to use an alternative view (both of which Avi identified more succinctly!)
jhs