Is there any way to have multiple tag search implemented in CouchDB? I have documents (posts) each with multiple tags. I need to find posts that have been tagged with an arbitrary set of tags. How do I do it? I could of course do it with multiple calls to a view which gives me the documents for a tag and then sort it out in my app but I wanted to know if there was a way to achieve the same in the CouchDB view land.

+1  A: 

In more recent versions of CouchDB, you can POST a JSON document containing a keys array to a view, which allows for multi-key lookup. The structure would look something like this:

{"keys": ["first_tag", "second_tag", "third_tag"]}

This can be POSTed to a view that emits each tag as a key.

This and other querying options are documented here.
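As a rough sketch of how this could look from a client, the Python below builds such a POST request (without sending it) and then narrows the returned rows down to documents that carry every requested tag, since a multi-key lookup returns rows for each key separately. The server URL, the database name "blog", the design document "posts" and the view "by_tag" are all hypothetical, and the rows are made-up sample data shaped like a view response.

```python
import json
import urllib.request


def build_multi_key_request(base_url, db, ddoc, view, tags):
    """Build (but do not send) a POST request for a multi-key view lookup."""
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}"
    body = json.dumps({"keys": list(tags)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ids_with_all_tags(rows, tags):
    """Keep only the document ids that appear under every requested tag."""
    seen = {}
    for row in rows:
        seen.setdefault(row["id"], set()).add(row["key"])
    wanted = set(tags)
    return sorted(doc_id for doc_id, keys in seen.items() if wanted <= keys)


# Hypothetical names: database "blog", design document "posts", view "by_tag".
request = build_multi_key_request(
    "http://localhost:5984", "blog", "posts", "by_tag", ["couchdb", "views"]
)

# Sample rows only, shaped like a view response ("id", "key", "value").
sample_rows = [
    {"id": "post-1", "key": "couchdb", "value": None},
    {"id": "post-2", "key": "couchdb", "value": None},
    {"id": "post-1", "key": "views", "value": None},
]
matching = ids_with_all_tags(sample_rows, ["couchdb", "views"])
```

Here only "post-1" appears under both tags, so it is the only id kept. The intersection step still runs in the application, not in the view.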

Ryan Duffield
I am not sure if this would be the best way. Suppose I have a list of 15 distinct tags that might be applied in any combination and order; then I would have 15^14 key combinations. Generating and indexing all these queries would by itself be a daunting task. PS: Math is not my strongest area. Correct me if I am wrong.
Vagmi Mudumbai
A: 

One way is the approach explained above by Ryan Duffield. Although it handles some queries, it will become unmanageable over time. Another way is to use full-text search, which CouchDB does not currently support natively, but there is an external plugin using Lucene. More here: http://wiki.apache.org/couchdb/Full_text_search

A: 

Actually, tagging seems to be a very relational problem and does not play well with CouchDB's design. So I have decided to keep one small database for tags in MySQL and store the actual documents in CouchDB. This lets me get the best of both worlds. Although this technique has synchronization problems, searching on tags is an efficient operation in SQL, and the tag data is small enough that replication or sharding is not a worry. Thanks for all your answers.

Vagmi Mudumbai
I would disagree with this assertion; tags work quite well when done correctly in CouchDB. I would recommend taking a look at something like Sofa for inspiration: http://github.com/jchris/sofa
Ryan Duffield
A: 

So, as far as I understand, the answer is NO. CouchDB can't query for documents that have multiple tags at once (workarounds with Lucene or MySQL don't count, since that way we lose some features of CouchDB). Sad news :(.

(having multiple tags means having both A and B, not A or B)

UPD: It's possible, but practical only for 2-3 tags.

http://wiki.apache.org/couchdb/EntityRelationship

Querying by multiple keys

Some applications need to view the intersection of entities that have multiple keys. In the example above, this would be a query for the contacts who are in both the "Friends" and the "Colleagues" groups. The most straight-forward way to handle this situation is to query for one of the keys, and then to filter by the rest of the keys on the client-side. If the key frequencies vary greatly, it may also be worthwhile to make an initial call to determine the key with the lowest frequency, and to use that to fetch the initial document list from the database.
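The client-side approach described above can be sketched in Python roughly as follows. This is an in-memory stand-in, not CouchDB code: the dictionary plays the role of a view that emits one row per (key, document id), and the contact ids and group names are made up for illustration.

```python
from collections import defaultdict

# Sample documents; ids and group names are made up for illustration.
contacts = {
    "alice": ["Friends", "Colleagues"],
    "bob": ["Friends"],
    "carol": ["Colleagues", "Family"],
    "dave": ["Friends", "Colleagues", "Family"],
}


def build_index(docs):
    """Stand-in for a view that emits one row per (key, doc id)."""
    index = defaultdict(set)
    for doc_id, keys in docs.items():
        for key in keys:
            index[key].add(doc_id)
    return index


def query_all(index, docs, keys):
    """Fetch by the rarest key, then filter by the remaining keys client-side."""
    if not keys:
        return []
    # The lowest-frequency key gives the smallest initial candidate list.
    rarest = min(keys, key=lambda k: len(index.get(k, ())))
    candidates = index.get(rarest, set())
    wanted = set(keys)
    return sorted(d for d in candidates if wanted <= set(docs[d]))


index = build_index(contacts)
both = query_all(index, contacts, ["Friends", "Colleagues"])
```

With this sample data, `both` contains "alice" and "dave". Against a real database the frequency lookup would be an extra request, so it only pays off when the key frequencies differ a lot.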

If this is not a good option, it is possible to index the combinations of the keys, though the growth of the index for a given document will be exponential in the number of its keys. Still, for small-ish key sets this is an option, since the keys can be ordered, and keys which are prefixes of a larger key can be omitted. For instance, for the key set [1 2 3] the possible key combinations are [1] [2] [3] [1 2] [1 3] [2 3] [1 2 3]. However, the index need only contain the keys [3] [1 3] [2 3] [1 2 3], since (for example) the documents matching the keys [1 2] could be obtained with a query for startkey=[1,2,null] and endkey=[1,2,{}]. The number of index entries will be 2^(n-1), where n is the number of keys.
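To make the combination index concrete, here is a small Python sketch that generates the keys one document would contribute and checks a query against them. It is an illustration under assumptions, not CouchDB code: the function names are made up, and the prefix comparison only stands in for a startkey/endkey range over array keys.

```python
from itertools import combinations


def index_keys(keys):
    """Sorted combinations that include the largest key.

    Every other combination is a prefix of one of these, so it can be omitted.
    """
    ordered = sorted(set(keys))
    if not ordered:
        return []
    *rest, last = ordered
    result = []
    for size in range(len(rest) + 1):
        for combo in combinations(rest, size):
            result.append(list(combo) + [last])
    return result


def matches(index, query):
    """Prefix match, standing in for a startkey/endkey range query."""
    prefix = sorted(set(query))
    return any(key[: len(prefix)] == prefix for key in index)


emitted = index_keys([1, 2, 3])
```

For the key set [1 2 3] this produces the four keys [3], [1 3], [2 3] and [1 2 3], and a query for [1 2] is answered by the prefix of [1 2 3]. A document with 10 keys would already contribute 512 index entries, which is why the approach only suits small key sets.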

A final option is to use a separate index, such as couchdb-lucene, to help with such queries.

Alexey Petrushin