tags:

views:

781

answers:

2

I would like to use CouchDB to store some data for me and then use RESTful api calls to get the data that I need. My database is called "test" and my documents all have a similar structure and look something like this (where hello_world is the document ID):

"hello_world" : {"id":123, "tags":["hello", "world"], "text":"Hello World"}
"foo_bar" :{"id":124, "tags":["foo", "bar"], "text":"Foo Bar"}

What I'd like to be able to do is have my users send a query such as: "Give me all the documents that contain the words 'hello world', for example. I've been playing around with views but it looks like they will only allow me to move one or more of those values into the "key" portion of the map function. That gives me the ability to do something like this:

http://localhost:5984/test/_design/search/_view/search_view?key="hello"

But this doesn't allow me to let my users specify their query string. For example, what if they searched for "hello world". I'd have to do two queries: one for "hello" and one for "world" then I'd have to write a bunch of javascript to combine the results, remove duplicates, etc (YUCK!). What I really want is to be able to do something like this:

http://localhost:5984/test/_design/search/_view/search_view?term="hello world"

Then use the parameter "hello world" in the views map/reduce functions to find all the documents that contain both "hello" and "world" in the tags array. Is this sort of thing even possible with CouchDB? Is there another way to accomplish this inside a view that I'm not thinking of?

+6  A: 

CouchDB Views do not support facetted search or fulltext search or result intersection. The couchdb-lucene plugin lets you do all these things.

http://github.com/rnewson/couchdb-lucene/tree/master

Jan Lehnardt
Care to elaborate or provide examples?
Mike Farmer
He's one of the developers of the project- "You can't do it, but this project will let you." That's a pretty good answer.
dnolen
+1  A: 

Technically this is possible of you emit for each document each set of the powerset of the tags of the document as the key. The key set element must be ordered and your query whould have to query the tags ordered, too.

function map(doc) {
  function powerset(array) { ... }

  powerset_of_tags = powerset(doc.tags)
  for(i in powerset_of_tags) {
    emit(powerset_of_tags[i], doc);
  }
}

for the doc {"hello_world" : {"id":123, "tags":["hello", "world"], "text":"Hello World"} this would emit:

{ key: [], doc: ... }
{ key: ['hello'], doc: ... }
{ key: ['world'], doc: ... }
{ key: ['hello', 'world'], doc: ... }

Although is this possible I would consider this a rather arkward solution. I don't want to imagine the disk usage of the view for a larger number of tags. I expect the number of emitted keys to grow like 2^n.

ordnungswidrig
this is not recommended. Performance will suffer greatly and as you mentioned the storage for the indexes will grow out of control.couchdb-lucene mentioned above is the correct way to do what he's wanting.
Jeremy Wall