I am trying to figure out how you design data storage in a document storage system like CouchDB or MongoDB.

I don't use JOINs anymore in my queries and just stick to searching for rows with certain indexes that meet my criteria. For example, I might look for recent comments (ORDER BY date) or all active users (WHERE status = 1). In other words, my search logic is all based on indexed int columns stored in RAM.

Moving over to NoSQL, there don't seem to be any indexes, so I'm trying to figure out how these databases filter results without looking through each row manually. Update: somehow I missed this: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677

As for design, the usual example of storing a post with all of its comments as one document doesn't seem logically sound. How would you find recent comments? Or how would you find the comments of a certain user?

Where can I go to learn how to convert schemas (and my way of thinking) so that I can build out apps using these document databases?

Update: I guess I just didn't spend enough time going through the MongoDB site. The documentation seems to cover most of the things needed, like using indexes to filter results just as in SQL. Also, http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart and http://rickosborne.org/download/SQL-to-MongoDB.pdf were just what I needed.

+1  A: 

Actually, it's not so much that there are no indexes as that there are no default indexes. With SQL, if you don't create an index, searching will be slow. With most NoSQL systems, if you don't create an index, searching won't happen at all. The approach used for indexing depends on the specific system you are using - sometimes you use a search engine to index documents, sometimes you store a set of ids for each possible value.
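To make the "sets of ids for each possible value" idea concrete, here is a minimal in-memory sketch in plain Python. It is not tied to any particular NoSQL product; the `ValueIndex` class, the `users` data and the field names are all hypothetical, made up for illustration.

```python
from collections import defaultdict


class ValueIndex:
    """Hypothetical index: maps each field value to the set of ids that hold it."""

    def __init__(self):
        self._ids_by_value = defaultdict(set)

    def add(self, doc_id, value):
        # Must be called on every write, since nothing is indexed by default.
        self._ids_by_value[value].add(doc_id)

    def remove(self, doc_id, value):
        self._ids_by_value[value].discard(doc_id)

    def lookup(self, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._ids_by_value.get(value, ()))


# Made-up user documents keyed by id.
users = {
    "u1": {"name": "alice", "status": 1},
    "u2": {"name": "bob", "status": 0},
    "u3": {"name": "carol", "status": 1},
}

status_index = ValueIndex()
for user_id, doc in users.items():
    status_index.add(user_id, doc["status"])

# Rough equivalent of WHERE status = 1, without scanning every document.
active_ids = status_index.lookup(1)
active_names = sorted(users[i]["name"] for i in active_ids)
```

The point of the sketch is that the application (or the database, once you ask it to) maintains the index on every write, and a query becomes a single lookup instead of a scan.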

I disagree with storing comments in a post document myself - it is more flexible to have comments as separate documents indexed by post id. However, if it happens that you don't care about those other queries (and your platform supports partial updates), a single document is the simplest solution - the right structure depends entirely on what you want to do with it.
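As a rough sketch of the separate-documents layout, assuming each comment carries `post_id`, `author` and `date` fields (these names and the sample data are invented for the example), the queries from the question become simple lookups:

```python
from collections import defaultdict
from datetime import datetime

# Each comment is its own document and points back at its post.
comments = [
    {"_id": "c1", "post_id": "p1", "author": "alice", "date": datetime(2010, 4, 1)},
    {"_id": "c2", "post_id": "p1", "author": "bob", "date": datetime(2010, 4, 2)},
    {"_id": "c3", "post_id": "p2", "author": "alice", "date": datetime(2010, 4, 3)},
    {"_id": "c4", "post_id": "p2", "author": "carol", "date": datetime(2010, 4, 4)},
]


def build_index(docs, field):
    """Group documents by the value of one field (a stand-in for a real index)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


def recent(docs, limit):
    """Newest documents first, capped at `limit`."""
    return sorted(docs, key=lambda d: d["date"], reverse=True)[:limit]


by_post = build_index(comments, "post_id")
by_author = build_index(comments, "author")

comments_for_p1 = [c["_id"] for c in by_post["p1"]]   # all comments on one post
alice_ids = [c["_id"] for c in by_author["alice"]]    # all comments by one user
latest_ids = [c["_id"] for c in recent(comments, 2)]  # recent comments site-wide
```

With comments embedded in the post document, the last two queries would have to open every post; with separate documents, each one is a single index lookup or sort.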

You won't find anything generic to convert schemas, since there is no single right answer. I think the best method is to think about the app rather than the data - if your SQL blog app is reading several tables and creating a post object, you know you should probably have a post document.

You should also try to think of solutions other than running queries, to take advantage of features in your NoSQL platform - for example, with Redis I would implement recent comments as a list which is updated and trimmed each time a comment is added. Effectively it's a very specialized index - much more efficient than checking the date on every comment in the system. Adding more keys or data types is not a problem when you don't need to worry about updating schemas.
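In Redis itself this would typically be an LPUSH followed by an LTRIM on a list key. The sketch below only imitates that behaviour with a bounded `deque` from the Python standard library, so it runs without a server; `MAX_RECENT`, `add_comment` and the comment ids are assumptions made for illustration.

```python
from collections import deque

MAX_RECENT = 3  # assumed cap on how many recent comment ids to keep

# A deque with maxlen drops the oldest entry from the far end once full,
# which mimics "push to the head, then trim the tail".
recent_comments = deque(maxlen=MAX_RECENT)


def add_comment(comment_id):
    """Record a new comment id at the head of the recent list."""
    recent_comments.appendleft(comment_id)


for cid in ["c1", "c2", "c3", "c4", "c5"]:
    add_comment(cid)

# Reading the list is now a plain fetch, with no date comparison at all.
newest_first = list(recent_comments)
```

The list never grows beyond the cap, so the cost of reading recent comments stays constant no matter how many comments exist.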

Tom Clarkson