StackOverflow lets you search for posts by tag and filter by an intersection of tags, e.g. ruby x mysql x tags. But retrieving such lists from MySQL using multiple joins on the taggings is typically inefficient. What's a more performant way to implement filter-by-multiple-tags queries?

Is there a good NoSQL approach to this problem?

+2  A: 

In a NoSQL or document-oriented scenario, you'd have the actual tags as part of your document, likely stored as a list. Since you've tagged this question with "couchdb", I'll use that as an example.

A "post" document in CouchDB might look like:

{
   "_id": <generated>,
   "question": "Question?",
   "answers": [... list of answers ...],
   "tags": ["mysql", "tagging", "joins", "nosql", "couchdb"]
}

Then, to generate a view keyed by tags:

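{
   "_id": "_design/tags",
   "language": "javascript",
   "views": {
      "all": {
         "map": "function(doc) { if (doc.tags) { doc.tags.forEach(function(tag) { emit(tag, null); }); } }"
      }
   }
}

The map function emits one row per tag, so each tag can be looked up as its own key. It is written on a single line because a JSON string cannot contain raw line breaks.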

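In CouchDB, you can issue an HTTP POST to a view with multiple keys, if you wish. An example is in the documentation. Using that technique, you would be able to search by multiple tags in one request. Note that a multi-key query returns the rows matching any of the keys (a union), so to get the intersection you still need to keep only the documents that come back for every requested tag.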
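As a rough sketch of that approach, the client can POST the tags as keys and then intersect the returned rows itself. This assumes the view emits one row per tag (key = a single tag string) and that a runtime with a global fetch is available (for example Node 18+ or a browser). The function names queryByTags and intersectByTags and the dbUrl parameter are made up for illustration; they are not part of CouchDB.

```javascript
// Keep only the document IDs that appear under every requested tag.
// `rows` is the "rows" array of a CouchDB view response: [{ id, key, value }, ...].
function intersectByTags(rows, tags) {
  const wanted = new Set(tags);
  const matchedById = new Map(); // doc id -> Set of requested tags it matched

  for (const row of rows) {
    if (!wanted.has(row.key)) continue;
    if (!matchedById.has(row.id)) matchedById.set(row.id, new Set());
    matchedById.get(row.id).add(row.key);
  }

  const ids = [];
  for (const [id, matched] of matchedById) {
    if (matched.size === wanted.size) ids.push(id);
  }
  return ids;
}

// Hypothetical helper: POST the tags as keys to the "all" view of the
// "_design/tags" design document, then intersect the rows client-side.
// `dbUrl` is an assumed database URL such as "http://localhost:5984/posts".
async function queryByTags(dbUrl, tags) {
  const response = await fetch(dbUrl + "/_design/tags/_view/all", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: tags }),
  });
  if (!response.ok) {
    throw new Error("View query failed with status " + response.status);
  }
  const result = await response.json();
  return intersectByTags(result.rows, tags);
}
```

The intersection happens on the client, so the response size grows with the number of posts carrying any one of the tags. For very common tags it may be worth querying the rarest tag first and filtering from there.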

Note: Setting the value to null, above, helps keep the views small. Use include_docs=true in your query if you want to see the actual documents as well.

Ryan Duffield