I have just read a few articles on data modeling and relationships in document-oriented databases (... anti pattern, CouchDB article, MongoDB article ...). There are certain "issues" I would like to discuss, and I hope to find a solution or pattern that stops them from bothering me.

A common saying is that you should stop thinking the way you are used to in the RDBMS world when modeling data in CouchDB or MongoDB. Document-oriented databases offer benefits such as embedded documents, schema-less design and performance advantages, which enable new (and better) solutions and patterns for various well-known scenarios. However, I would argue that you can't simply get rid of relationships, which are well understood in the RDBMS world, even though there they need an ORM abstraction to solve the impedance mismatch. Both the CouchDB and the MongoDB articles fall back on the RDBMS approach to one-to-many (or many-to-many) relationships when embedding documents is not an option.
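A minimal sketch of what that "RDBMS way" looks like in a document store, assuming plain Python dicts as stand-ins for two collections (no real CouchDB or MongoDB client API is used; `posts`, `comments`, `post_id` and the helper functions are hypothetical names): each child document stores a reference to its parent, and the application performs the "join" itself.

```python
# Hypothetical in-memory "collections" standing in for a real database;
# no CouchDB or MongoDB client API is used here.
posts = {
    "post-1": {"_id": "post-1", "title": "Modeling relationships"},
}
comments = {
    "c-1": {"_id": "c-1", "post_id": "post-1", "text": "Nice post"},
    "c-2": {"_id": "c-2", "post_id": "post-1", "text": "What about many-to-many?"},
    "c-3": {"_id": "c-3", "post_id": "post-2", "text": "Comment on another post"},
}


def comments_for_post(post_id):
    """Return the comments that reference post_id (the "many" side)."""
    return [c for c in comments.values() if c["post_id"] == post_id]


def post_with_comments(post_id):
    """Application-level "join": fetch the parent, then the children pointing to it."""
    post = dict(posts[post_id])  # shallow copy so the stored document is not mutated
    post["comments"] = comments_for_post(post_id)
    return post


result = post_with_comments("post-1")
print(result["title"], len(result["comments"]))  # Modeling relationships 2
```

The cost is a second lookup per parent (or a query keyed on `post_id`), which is exactly the join-like step embedding was supposed to avoid.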

Embedding documents inside another document is a pretty cool feature, but only as long as the embedded JSON objects are small, simple and few in number. Documents that embed other documents, which in turn embed others, and so on, or a blog post document with hundreds of embedded comments, would probably have performance consequences. Also, querying these beasts with MapReduce is not always a pretty job.

The impedance mismatch situation is much better with document-oriented databases, and MapReduce functions can serve you exactly what you need. However, this comes at a cost: you need to know in advance what you are looking for.
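To illustrate that cost, here is a toy pure-Python emulation of a map/reduce view. It is not CouchDB's JavaScript view server or MongoDB's `mapReduce` command, and the document shapes and function names are assumptions. The map and reduce functions encode one question up front ("how many comments does each post have?"); asking a different question means writing and building another view.

```python
from collections import defaultdict

# Hypothetical blog documents; the shape is an assumption for illustration.
docs = [
    {"type": "comment", "post_id": "post-1", "author": "ann"},
    {"type": "comment", "post_id": "post-1", "author": "bob"},
    {"type": "comment", "post_id": "post-2", "author": "ann"},
    {"type": "post", "_id": "post-1", "title": "Modeling relationships"},
]


def map_fn(doc):
    """Emit (key, value) pairs: one row per comment, keyed by its post."""
    if doc.get("type") == "comment":
        yield doc["post_id"], 1


def reduce_fn(values):
    """Combine all values emitted for one key: here, count them."""
    return sum(values)


def run_view(documents, mapper, reducer):
    """Group emitted rows by key, then reduce each group (keys sorted)."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in sorted(grouped.items())}


print(run_view(docs, map_fn, reduce_fn))  # {'post-1': 2, 'post-2': 1}
```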

Another common saying is that in document-oriented databases you model your documents more "tightly" around your application scenarios. Since this is software engineering and not building construction, it is not unusual for scenarios or specifications to change. Imagine that you initially created a JSON document that embeds a few other documents in an array. You have everything in one place, and everything works fine without the "explicit" one-to-many relationship you would probably use in the RDBMS world. Then along comes a breaking change (or a request for a new feature) in the specification, and you need (or are forced) to rebuild everything around a one-to-many, join-wannabe solution.
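A minimal sketch of that rebuild, assuming the embedded array lives under a `comments` field and that children get a hypothetical `<parent id>-<field>-<n>` id scheme (an illustrative choice, not a database convention): the array is pulled out of the parent and each element becomes its own document carrying a reference back. Every query that relied on the embedded array would still have to be rewritten, which the sketch does not show.

```python
import copy

# Hypothetical "before" document: comments embedded in the post.
embedded_post = {
    "_id": "post-1",
    "title": "Modeling relationships",
    "comments": [
        {"author": "ann", "text": "Nice post"},
        {"author": "bob", "text": "What about many-to-many?"},
    ],
}


def split_embedded(post, field="comments", parent_key="post_id"):
    """Move an embedded array out into separate documents referencing the parent.

    Returns (parent_without_array, child_documents); the input is not modified.
    """
    parent = copy.deepcopy(post)
    children = parent.pop(field, [])
    child_docs = []
    for n, child in enumerate(children, start=1):
        doc = dict(child)
        doc["_id"] = f"{parent['_id']}-{field}-{n}"  # hypothetical id scheme
        doc[parent_key] = parent["_id"]  # the reference back to the "one" side
        child_docs.append(doc)
    return parent, child_docs


parent, children = split_embedded(embedded_post)
print(parent)  # {'_id': 'post-1', 'title': 'Modeling relationships'}
print([c["_id"] for c in children])  # ['post-1-comments-1', 'post-1-comments-2']
```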

I'm not saying that document database models have to be generic and ready for any change, or that I prefer RDBMSs (because I don't). I just want to point out and clarify my "issues" with document databases and open them up for broader discussion.

The actual question, then, is how to design documents with proper relationships in document-oriented databases. I know the answer is application-specific, but this discussion could at least collect some hints and patterns that lead to better data modeling in this kind of database.