At work, we recently started a project using CouchDB (a document-oriented database). I've been having a hard time un-learning all of my relational db knowledge.

I was wondering how some of you overcame this obstacle? How did you stop thinking relationally and start thinking documentally (I apologise for making up that word)?

Any suggestions? Helpful hints?

Edit: If it makes any difference, we're using Ruby & CouchPotato to connect to the database.

Edit 2: SO was hassling me to accept an answer. I chose the one that helped me learn the most, I think. However, there's no real "correct" answer, I suppose.

+2  A: 

Maybe you should read this: http://books.couchdb.org/relax/getting-started

I only just heard about it myself. It's interesting, but I have no idea how to implement it in a real-world application ;)

nightingale2k1
After reading that article, I found that every piece of data is a document. There are no relationships like master-detail; each document is independent. For example, a blog post has tags, content, an author and comments. In a relational database we would define tables such as tags, posts, comments and authors, each related to the others: posts have many tags, authors have many posts, and so on. In CouchDB you have no separate posts, tags, etc.; it's all in one document. Correct me if I'm wrong.
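As a rough illustration of the "all in one" idea, here is what such a blog post might look like as a single document. This is only a sketch in Python: the field names (`_id`, `type`, `author`, `tags`, `comments`) and the sample values are made up for the example and are not taken from the article.

```python
import json

# Hypothetical blog post stored as one self-contained document.
# All field names and values here are illustrative assumptions.
post = {
    "_id": "post-hello-couch",
    "type": "post",
    "title": "Hello, CouchDB",
    "author": {"name": "alice"},
    "tags": ["couchdb", "nosql"],
    "content": "First post stored as a document.",
    "comments": [
        {"author": "bob", "text": "Nice post"},
        {"author": "carol", "text": "Thanks for sharing"},
    ],
}

# Everything needed to render the post comes from one lookup; no joins
# across separate posts, tags, authors and comments tables.
comment_authors = [c["author"] for c in post["comments"]]

print(json.dumps(post, indent=2))
print(comment_authors)
```

The trade-off is the usual one for denormalized data: reading the post is a single fetch, but anything shared between posts (an author's display name, say) is copied into each document.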
nightingale2k1
A: 

One thing you can try is getting a copy of Firefox and Firebug and playing with the map and reduce functions in JavaScript. They're actually quite cool and fun, and they appear to be the basis of how to get things done in CouchDB.

Here's Joel's little article on the subject: http://www.joelonsoftware.com/items/2006/08/01.html
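To give a feel for the map and reduce idea, here is a minimal sketch in Python rather than the JavaScript that CouchDB views are normally written in. The documents, `map_doc`, `reduce_values` and `run_view` are all hypothetical names for this example; the point is only the shape: a map step emits key/value pairs per document, and a reduce step collapses the values for each key.

```python
from collections import defaultdict

# Hypothetical documents; only the "tags" field matters for this view.
docs = [
    {"_id": "a", "tags": ["couchdb", "ruby"]},
    {"_id": "b", "tags": ["couchdb"]},
    {"_id": "c", "tags": []},
]


def map_doc(doc):
    """Map step: emit one (tag, 1) pair for every tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Reduce step: collapse all values emitted for one key into a count."""
    return sum(values)


def run_view(documents):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}


tag_counts = run_view(docs)
print(tag_counts)
```

Because the map step looks at one document at a time and never at its neighbours, the work can be split across documents in any order, which is the property Joel's article is getting at.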

Breton
I think Joel is talking about closures (in Groovy terms) or blocks (in Ruby). That has nothing to do with CouchDB.
nightingale2k1
Then I think you have a big fat case of TLDR syndrome. The article is about Map/Reduce.
Breton
Which, I think you'll find, is *very* relevant.
Breton
+7  A: 

It's all about the data. If you have data which makes most sense relationally, a document store may not be useful. A typical document-based system is a search server: you have a huge data set and want to find a specific item or document, and the document is static or versioned.

In an archive-type situation, the documents might literally be documents that don't change and have very flexible structures. It doesn't make sense to store their metadata in a relational database, since the documents are all very different and few of them share the same tags. Document-based systems don't store null values.

Non-relational, document-like data makes sense when denormalized: either the data doesn't change much, or you don't care as much about consistency.

If your use case fits a relational model well then it's probably not worth squeezing it into a document model.

Here's a good article about non-relational databases.

Another way of thinking about it is, a document is a row. Everything about a document is in that row and it is specific to that document. Rows are easy to split on, so scaling is easier.

Tim
+5  A: 

In CouchDB, like Lotus Notes, you really shouldn't think about a Document as being analogous to a row.

Instead, a Document is a relation (table).

Each document has a number of rows--the field values:

ValueID (PK)  Document ID (FK)  Field Name       Field Value
============================================================
92834756293   MyDocument        First Name       Richard
92834756294   MyDocument        States Lived In  TX
92834756295   MyDocument        States Lived In  KY

Each View is a cross-tab query that selects across a massive UNION ALL of every Document.
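One way to picture the mapping between those field-value rows and a document is to pivot the rows back into a nested structure. The sketch below is Python and purely illustrative: `pivot` is a made-up helper, not anything CouchDB or Lotus Notes provides, and it assumes every field may hold several values.

```python
from collections import defaultdict

# The rows from the table above: (ValueID, Document ID, Field Name, Field Value).
rows = [
    (92834756293, "MyDocument", "First Name", "Richard"),
    (92834756294, "MyDocument", "States Lived In", "TX"),
    (92834756295, "MyDocument", "States Lived In", "KY"),
]


def pivot(field_rows):
    """Group field-value rows by document, then by field name.

    Each field maps to a list so that multi-valued fields such as
    "States Lived In" keep all of their values.
    """
    docs = defaultdict(lambda: defaultdict(list))
    for _value_id, doc_id, field, value in field_rows:
        docs[doc_id][field].append(value)
    return {doc_id: dict(fields) for doc_id, fields in docs.items()}


documents = pivot(rows)
print(documents)
```

Read in the other direction, flattening each document into such rows and stacking them is the "UNION ALL of every Document" that a view would select across.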

So, it's still relational, but not in the most intuitive sense, and not in the sense that matters most: good data management practices.

richardtallent
+10  A: 

I think, after perusing a couple of pages on this subject, that it all depends on the types of data you are dealing with.

RDBMSes represent a top-down approach, where you, the database designer, assert the structure of all data that will exist in the database. You define that a Person has a First, Last and Middle Name, a Home Address, etc., and you can enforce this using an RDBMS. If you don't have a column for a Person's HomePlanet, tough luck for the wanna-be Person with a HomePlanet other than Earth; you'll have to add a column at a later date or the data can't be stored in the RDBMS. Most programmers make assumptions like this in their apps anyway, so this isn't a dumb thing to assume and enforce. Defining things can be good. But if you need to log additional attributes in the future, you'll have to add them in. The relational model assumes that your data attributes won't change much.

"Cloud" type databases using something like MapReduce, in your case CouchDB, do not make the above assumption, and instead look at data from the bottom up. Data is input in documents, which can have any number of varying attributes. The model assumes that your data, by its very definition, is diverse in the types of attributes it could have. It says, "I just know that I have this document in database Person that has a HomePlanet attribute of 'Eternium' and a FirstName of 'Lord Nibbler' but no LastName." This model fits webpages: every webpage is a document, but the actual contents/tags/keys of the documents vary so widely that you can't fit them into the rigid structure that the RDBMS pontificates from on high. This is why Google thinks the MapReduce model roxors soxors: Google's data set is so diverse that it needs to build in ambiguity from the get-go and, because the data sets are massive, it needs parallel processing (which MapReduce makes trivial). The document-database model assumes that your data's attributes may or will change a lot, or be very diverse, with the "gaps" and sparsely populated columns you would find if the data were stored in a relational database. While you could use an RDBMS to store data like this, it would get ugly really fast.

To answer your question then: you can't think "relationally" at all when looking at a database that uses the MapReduce paradigm, because it doesn't actually have an enforced relation. It's a conceptual hump you'll just have to get over.
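To make the bottom-up picture a bit more concrete, here is a small Python sketch of Person documents that don't share a fixed set of attributes. Apart from Lord Nibbler, the people are invented for the example, and `by_home_planet` is a hypothetical helper: it simply skips documents that lack the attribute instead of requiring a NULL column.

```python
# Person documents with differing attributes; nothing enforces a schema.
# Only the Lord Nibbler document comes from the text above; the others
# are made-up sample data.
people = [
    {"FirstName": "Lord Nibbler", "HomePlanet": "Eternium"},
    {"FirstName": "Alice", "LastName": "Example", "HomePlanet": "Earth"},
    {"FirstName": "Bob", "LastName": "Sample"},  # no HomePlanet at all
]


def by_home_planet(docs):
    """Index first names by HomePlanet, ignoring documents without one."""
    index = {}
    for doc in docs:
        planet = doc.get("HomePlanet")
        if planet is None:
            continue  # the attribute is simply absent; nothing to store
        index.setdefault(planet, []).append(doc["FirstName"])
    return index


planet_index = by_home_planet(people)
print(planet_index)
```

The code that reads the data decides what to do about missing attributes; the database itself doesn't care that Bob has no HomePlanet or that Lord Nibbler has no LastName.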


A good article I ran into that compares and contrasts the two kinds of database pretty well is MapReduce: A Major Step Backwards, which argues that MapReduce-paradigm databases are a technological step backwards and are inferior to RDBMSes. I disagree with the author's thesis and would submit that the database designer simply has to select the right one for his/her situation.

sheepsimulator
A lot of the criticisms that article directs toward MapReduce-based databases seem to be addressed in CouchDB. CouchDB uses B-tree indexes, supports views (in fact, CouchDB appears to have more of an emphasis on views than MySQL does), allows updates, makes replication easy, etc.
Chuck
@Chuck: It has more emphasis on views because there are no queries, only views.
Matt Grande
+3  A: 

Document-oriented databases do not reject the concept of relations; they just sometimes leave it to applications to dereference the links (CouchDB) or even have direct support for relations between documents (MongoDB). What's more important is that DODBs are schema-less. In table-based storage this property can only be achieved with significant overhead (see the answer by richardtallent), but here it's done more efficiently. What we really should learn when switching from an RDBMS to a DODB is to forget about tables and start thinking about data. That's what sheepsimulator calls the "bottom-up" approach. It's an ever-evolving schema, not a predefined Procrustean bed. Of course, this does not mean that schemata should be abandoned completely. Your application must still interpret the data and somehow constrain its form (by organizing documents into collections, or by building models with validation methods), but this is now the application's job.
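As a minimal sketch of what "the application's job" can look like, here is a validation function in Python. It is an assumption-laden example, not CouchPotato's or CouchDB's API: the `type` field, the `FirstName` rule and the `validate_person` name are all invented for illustration.

```python
def validate_person(doc):
    """Return a list of problems; an empty list means the document is acceptable.

    The rules are hypothetical: a person document must carry
    type == "person" and a non-empty FirstName. Any other attributes
    are allowed and left unchecked.
    """
    errors = []
    if doc.get("type") != "person":
        errors.append("type must be 'person'")
    name = doc.get("FirstName")
    if not isinstance(name, str) or not name.strip():
        errors.append("FirstName must be a non-empty string")
    return errors


ok = validate_person(
    {"type": "person", "FirstName": "Lord Nibbler", "HomePlanet": "Eternium"}
)
bad = validate_person({"type": "person"})

print(ok)
print(bad)
```

The application would run a check like this before saving, so the constraints live in code that can evolve alongside the documents rather than in a fixed table definition.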

Andy Mikhaylenko