What are the best practices for NoSQL Databases, OODBs or whatever other acronyms may exist for them?

For example, I've often seen a field "type" being used for deciding how the DB document (in couchDB/mongoDB terms) should be interpreted by the client, the application.

Where applicable, use PHP as a reference language. Read: I'm also interested in how such data can be best handled on the client side, not only strictly the DB structure. This means practically that I'm also looking for patterns like "ORM"s for SQL DBs (active record, data mapper, etc).

Don't hesitate to make statements about how such a DB and the new features of PHP 5.3 could best work together.

+2  A: 

"NoSQL" should be more about building the datastore to follow your application requirements, not about building the app to follow a certain structure -- that's more like a traditional SQL approach.

Don't abandon a relational database "just because"; only do it if your app really needs to.

Joel L
I never said I wanted to build something to follow something else. I was looking for the best patterns you have successfully used to make things work well together, ideally with PHP 5.3's features. Also, I am not looking for advice on whether to use relational or document-oriented DBs.
Flavius
+13  A: 

I think the whole idea of NoSQL data stores and document databases is still so new, and so different from the established ideas behind relational storage, that there are currently very few (if any) best practices.

We know at this point that the rules for storing your data within, say, CouchDB (or any other document database) are rather different to those for a relational one. For example, it is pretty much accepted that normalisation and aiming for 3NF are not things one should strive for. One of the common examples would be that of a simple blog.

In a relational store, you'd have a table each for "Posts", "Comments" and "Authors". Each Author would have many Posts, and each Post would have many Comments. This is a model which works well enough, and maps fine onto any relational DB. However, storing the same data within a docDB would most likely be rather different. You'd probably have something like a collection of Post documents, each of which would have its own Author and collection of Comments embedded right in. Of course that's probably not the only way you could do it, and it is something of a compromise: querying for a single post is now fast (you do one operation and get everything back), but the store no longer maintains the relationship between authors and posts for you, since the author simply becomes part of each post document.
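To make that trade-off concrete, here is a minimal sketch in Python (the same shape carries over to PHP arrays). It is only an illustration under stated assumptions: a plain in-memory list stands in for the collection of Post documents, and every field name and helper function is hypothetical rather than taken from CouchDB, MongoDB or any real schema.

```python
# Hypothetical blog data: each post embeds its author and its comments.
# A plain list stands in for a collection in a document store.
posts = [
    {
        "_id": "post-1",
        "title": "Hello, document stores",
        "author": {"name": "alice", "email": "alice@example.com"},
        "comments": [
            {"author": "bob", "text": "Nice post."},
            {"author": "carol", "text": "Agreed."},
        ],
    },
    {
        "_id": "post-2",
        "title": "Second thoughts",
        "author": {"name": "alice", "email": "alice@example.com"},
        "comments": [{"author": "bob", "text": "Still reading."}],
    },
]


def get_post(post_id):
    """One lookup returns the post together with its author and comments."""
    return next(p for p in posts if p["_id"] == post_id)


def comments_by(author_name):
    """Finding one person's comments means walking every post."""
    return [
        (p["_id"], c["text"])
        for p in posts
        for c in p["comments"]
        if c["author"] == author_name
    ]


def rename_author(old, new):
    """The author is copied into each post, so a rename touches them all."""
    changed = 0
    for p in posts:
        if p["author"]["name"] == old:
            p["author"]["name"] = new
            changed += 1
    return changed
```

Reading a whole post is a single lookup, while anything that cuts across posts (one person's comments, renaming an author) has to visit every document. A real store would typically answer such questions with a view or an index rather than a loop, but the duplication of the author data remains.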

I too have seen examples making use of a "type" attribute (in a CouchDB example). Sure, that sounds like a viable approach. Is it the best one? I haven't got a clue. Certainly in MongoDB you'd use separate collections within a database, making the type attribute total nonsense. In CouchDB though... perhaps that is best. The other alternatives? Separate databases for each type of document? This seems a bit loopy, so I'd lean towards the "type" solution myself. But that's just me. Perhaps there's something better.
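For what the "type" approach can look like on the client side, here is a rough data-mapper-style sketch in Python. It assumes documents arrive as plain dictionaries carrying CouchDB's `_id` and `_rev` fields plus a `type` field; the `Post` and `Comment` classes, the `TYPE_REGISTRY` table and the `hydrate`/`dehydrate` helpers are all made up for illustration and are not part of any driver or library.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    body: str


@dataclass
class Comment:
    post_id: str
    text: str


# Maps the value of the "type" field to the class that interprets the document.
TYPE_REGISTRY = {"post": Post, "comment": Comment}


def hydrate(doc):
    """Build an application object from a raw document, based on doc['type']."""
    doc_type = doc.get("type")
    cls = TYPE_REGISTRY.get(doc_type)
    if cls is None:
        raise ValueError(f"unknown document type: {doc_type!r}")
    # Drop store metadata and the discriminator before passing the fields on.
    fields = {k: v for k, v in doc.items() if k not in ("_id", "_rev", "type")}
    return cls(**fields)


def dehydrate(obj):
    """Turn an object back into a document, restoring the 'type' field."""
    for name, cls in TYPE_REGISTRY.items():
        if type(obj) is cls:
            return {"type": name, **vars(obj)}
    raise ValueError(f"unregistered class: {type(obj).__name__}")
```

The application code only ever sees `Post` and `Comment` objects, and the "type" field stays a storage detail handled in one place. Whether this beats separate collections or databases is exactly the open question above.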

I realise I've rambled on quite a bit here and said very little, most likely nothing you didn't already know. My point is this though - I think it's up to us to experiment with the tools we've got and the data we're working with, and over time the good ideas will spread and become the best practices. I just think you're asking a little too early in the game.

Splash
You're right: it's too early, just as I thought too. +1, but I'll be waiting for a while for other opinions too.
Flavius
As I see it so far, it strongly depends on the operations you intend to perform on the data and how often the application would perform them under normal circumstances. For instance, embedding comments into a "post" would not be the best choice if the app had to offer a way to "stalk" a specific person (like "tweets").
Flavius
For sure. I think any best practices which do emerge will be predicated on the way the data is to be used. Different decisions and compromises will have to be made in order to best accommodate those requirements.
Splash