Consider the following documents in a CouchDB:

{
  "name":"Foo1",
  "tags":["tag1", "tag2", "tag3"],
  "otherTags":["otherTag1", "otherTag2"]
}

{
  "name":"Foo2",
  "tags":["tag2", "tag3", "tag4"],
  "otherTags":["otherTag2", "otherTag3"]
}

{
  "name":"Foo3",
  "tags":["tag3", "tag4", "tag5"],
  "otherTags":["otherTag3", "otherTag4"]
}

I'd like to query all documents that contain ALL (not any!) tags given as the key.

For example, if I request using '["tag2", "tag3"]' I'd like to retrieve Foo1 and Foo2.

I'm currently doing this by querying by tag, first for "tag2", then for "tag3", and computing the intersection of the results manually afterwards.

This seems to be awfully inefficient and I assume that there must be a better way.

My second question - but they are quite related, I think - would be:

How would I query for all documents that contain "tag2" AND "tag3" AND "otherTag3"?

I hope a question like this hasn't been asked/answered before. I searched for it and didn't find one.

+1  A: 

Do you have a maximum number of:

  • Tags per document, and
  • Tags allowed in the query

If so, you have an upper bound on the number of tag combinations to be indexed. For example, with a maximum of 5 tags per document, and 5 tags allowed in the AND query, you could simply output every 1-, 2-, 3-, 4-, and 5-tag combination into your index, for a maximum of 1 (five-tag combo) + 5 (four-tag combos) + 10 (three-tag combos) + 10 (two-tag combos) + 5 (one-tag combos) = 31 rows in the view for that document.
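The row count above can be double-checked by summing binomial coefficients. This is only a quick sketch; the function names `binomial` and `rowsPerDocument` are made up for illustration and are not part of CouchDB:

```javascript
// Number of ways to choose k items out of n ("n choose k").
function binomial(n, k) {
  var result = 1;
  for (var i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Rows emitted for one document that has `tagCount` tags, when a query
// may contain at most `maxQueryTags` tags.
function rowsPerDocument(tagCount, maxQueryTags) {
  var rows = 0;
  for (var k = 1; k <= Math.min(tagCount, maxQueryTags); k++) {
    rows += binomial(tagCount, k);
  }
  return rows;
}

console.log(rowsPerDocument(5, 5)); // 5 + 10 + 10 + 5 + 1 = 31
```

When the query limit equals the tag limit, the total is 2^n - 1, so it doubles with every extra tag: 10 tags per document already means 1023 rows.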

That may be acceptable to you, considering that it's quite a powerful query. The disk usage may be acceptable too, especially if you simply emit(tags, {_id: doc._id}) to minimize data in the view; you can use ?include_docs=true to get the full document later. The final thing to remember is to always emit the key array sorted, and always query it sorted the same way, because you are emitting only tag combinations, not permutations.
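A minimal sketch of what such a map function might look like. In a real design document the function would be anonymous and `emit` is provided by CouchDB's view server; the local `emit` stand-in and the `queryKey` helper below are assumptions added only so the snippet runs on its own:

```javascript
// Sketch of a CouchDB map function emitting every sorted tag combination.
function map(doc) {
  if (!Array.isArray(doc.tags)) return;
  // Sort a copy so each combination is emitted in one canonical order.
  var tags = doc.tags.slice().sort();
  // Each bitmask from 1 to 2^n - 1 selects one non-empty subset of tags.
  for (var mask = 1; mask < (1 << tags.length); mask++) {
    var combo = [];
    for (var i = 0; i < tags.length; i++) {
      if (mask & (1 << i)) combo.push(tags[i]);
    }
    emit(combo, { _id: doc._id });
  }
}

// Local stand-in for CouchDB's emit, so the sketch can run outside CouchDB.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

map({ _id: "foo1", name: "Foo1", tags: ["tag3", "tag1", "tag2"] });
console.log(rows.length); // 7 rows for 3 tags (2^3 - 1)

// Hypothetical helper: build the `key` parameter, sorted the same way.
function queryKey(requestedTags) {
  return JSON.stringify(requestedTags.slice().sort());
}
console.log(queryKey(["tag3", "tag2"])); // ["tag2","tag3"]
```

The resulting string would be passed (URL-encoded) as the view's key parameter, together with include_docs=true if the full documents are wanted.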

That can get you quite far; however, it does not scale up indefinitely. For full-blown arbitrary AND queries, you will indeed be required to split into multiple queries, or else look into CouchDB-Lucene.
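If you do split into multiple queries, the client-side step is an intersection of the returned document ids. A rough sketch, assuming each single-tag query has already been reduced to a list of ids (the `intersectIds` name and the id lists are illustrative, based on the sample documents in the question):

```javascript
// Keep only the ids that appear in every list.
function intersectIds(idLists) {
  if (idLists.length === 0) return [];
  return idLists.reduce(function (acc, ids) {
    var seen = new Set(ids);
    return acc.filter(function (id) { return seen.has(id); });
  });
}

// Hypothetical per-key results for the three sample documents.
var idsForTag2 = ["Foo1", "Foo2"];
var idsForTag3 = ["Foo1", "Foo2", "Foo3"];
var idsForOtherTag3 = ["Foo2", "Foo3"];

// "tag2" AND "tag3": Foo1 and Foo2
console.log(intersectIds([idsForTag2, idsForTag3]));
// "tag2" AND "tag3" AND "otherTag3": only Foo2
console.log(intersectIds([idsForTag2, idsForTag3, idsForOtherTag3]));
```

The same function covers the mixed tags/otherTags case, since it does not care which view each id list came from.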

jhs
Well, I don't have a real constraint on the number of allowed tags in either the query or the document... I was thinking about an emit for every tag contained in the document but couldn't find a way to work with the created data, either. Thanks for your input, I'll have a look at CouchDB-Lucene now.
Huxi