I have a question that I've been trying to answer for some time now but can't figure out:

How do you design, or divide up, CouchDB documents?

Take a Blog Post for example.

The semi "relational" way to do it would be to create a few objects:

  • Post
  • User
  • Comment
  • Tag
  • Snippet

This makes a great deal of sense. But I am trying to use CouchDB (for all the reasons that it's great) to model the same thing, and it's been extremely difficult.

Most of the blog posts out there give you an easy example of how to do this. They basically divide it up the same way, but say you can add 'arbitrary' properties to each document, which is definitely nice. So you'd have something like this in CouchDB:

  • Post (with tags and snippets "pseudo" models in the doc)
  • Comment
  • User

Some people would even say you could throw the Comment and User in there, so you'd have this:


post {
    id: 123412804910820
    title: "My Post"
    body: "Lots of Content"
    html: "<p>Lots of Content</p>"
    author: {
        name: "Lance"
        age: "23"
    }
    tags: ["sample", "post"]
    comments {
        comment {
            id: 93930414809
            body: "Interesting Post"
        } 
        comment {
            id: 19018301989
            body: "I agree"
        }
    }
}

That looks very nice and is easy to understand. I also understand how you could write views that extracted just the Comments from all your Post documents, to get them into Comment models, same with Users and Tags.
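As a rough illustration of that kind of view, a map function along the following lines could emit one row per embedded comment. It is only a sketch: it assumes `comments` is stored as an array, that posts carry a hypothetical `type: "post"` field, and it stubs out `emit` (which CouchDB's view server normally provides) so the snippet runs on its own.

```javascript
// Stand-in for the emit() that CouchDB's view server provides; it just collects rows.
const commentRows = [];
function emit(key, value) {
  commentRows.push({ key: key, value: value });
}

// Map function as it might appear in a design document.
// The "type" field is an assumed convention, not something CouchDB requires.
function mapComments(doc) {
  if (doc.type === "post" && Array.isArray(doc.comments)) {
    doc.comments.forEach(function (comment) {
      // Key on [post id, comment id] so one post's comments sort together.
      emit([doc._id, comment.id], comment.body);
    });
  }
}

// Hypothetical post document modeled on the example above.
const post = {
  _id: "123412804910820",
  type: "post",
  title: "My Post",
  comments: [
    { id: "93930414809", body: "Interesting Post" },
    { id: "19018301989", body: "I agree" }
  ]
};

mapComments(post);
console.log(commentRows);
```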

But then I think, "why not just put my whole site into a single document?":


site {
    domain: "www.blog.com"
    owner: "me"
    pages {
        page {
            title: "Blog"
            posts {
                post {
                    id: 123412804910820
                    title: "My Post"
                    body: "Lots of Content"
                    html: "<p>Lots of Content</p>"
                    author: {
                        name: "Lance"
                        age: "23"
                    }
                    tags: ["sample", "post"]
                    comments {
                        comment {
                            id: 93930414809
                            body: "Interesting Post"
                        } 
                        comment {
                            id: 19018301989
                            body: "I agree"
                        }
                    }
                }
                post {
                    id: 18091890192984
                    title: "Second Post"
                    ...
                }
            }
        }
    }
}

You could easily make views to find what you wanted with that.

Then the question I have is, how do you determine when to divide the document into smaller documents, or when to make "RELATIONS" between the documents?

I think it would be much more "Object Oriented", and easier to map to Value Objects, if it were divided like so:


posts {
    post {
        id: 123412804910820
        title: "My Post"
        body: "Lots of Content"
        html: "<p>Lots of Content</p>"
        author_id: "Lance1231"
        tags: ["sample", "post"]
    }
}
authors {
    author {
        id: "Lance1231"
        name: "Lance"
        age: "23"
    }
}
comments {
    comment {
        id: "comment1"
        body: "Interesting Post"
        post_id: 123412804910820
    } 
    comment {
        id: "comment2"
        body: "I agree"
        post_id: 123412804910820
    }
}

... but then it starts looking more like a Relational Database. And I often inherit something that looks like the "whole-site-in-a-document", which is more difficult to model with relations.
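One way the separate-document layout is commonly queried in CouchDB is with a view whose keys sort a post directly ahead of its comments. The sketch below imitates that outside CouchDB; the `type` field, the `emit` stub, and the simplified `compareKeys` sort (standing in for CouchDB's real view collation) are all assumptions made for illustration.

```javascript
// Stand-in for CouchDB's emit(); collects rows so the sketch can run anywhere.
const joinRows = [];
function emit(key, value) {
  joinRows.push({ key: key, value: value });
}

// Posts get key [post id, 0]; comments get [post id, 1, comment id],
// so a post sorts immediately before its own comments.
function mapPostWithComments(doc) {
  if (doc.type === "post") {
    emit([doc._id, 0], doc.title);
  } else if (doc.type === "comment") {
    emit([doc.post_id, 1, doc._id], doc.body);
  }
}

// Simplified element-by-element key comparison (not CouchDB's full collation).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical documents modeled on the divided layout above.
const docs = [
  { _id: "comment1", type: "comment", body: "Interesting Post", post_id: "123412804910820" },
  { _id: "123412804910820", type: "post", title: "My Post", author_id: "Lance1231" },
  { _id: "Lance1231", type: "author", name: "Lance" },
  { _id: "comment2", type: "comment", body: "I agree", post_id: "123412804910820" }
];

docs.forEach(mapPostWithComments);
joinRows.sort(function (a, b) {
  return compareKeys(a.key, b.key);
});
console.log(joinRows.map(function (row) { return row.value; }));
```

A single range query over the keys for one post id would then return the post followed by its comments, without embedding the comments in the post.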

I've read lots of things about how/when to use Relational Databases vs. Document Databases, so that's not the main issue here. I'm mostly wondering what a good rule or principle is to apply when modeling data in CouchDB.

Another example is with XML files/data. Some XML data has nesting 10+ levels deep, and I would like to visualize that using the same client (Ajax on Rails for instance, or Flex) that I would use to render JSON from ActiveRecord, CouchRest, or any other Object Relational Mapper. Sometimes I get huge XML files that are the entire site structure, like the one below, and I'd need to map it to Value Objects to use in my Rails app so I don't have to write another way of serializing/deserializing data:


<pages>
    <page>
        <subPages>
            <subPage>
                <images>
                    <image>
                        <url/>
                    </image>
                </images>
            </subPage>
        </subPages>
    </page>
</pages>

So the general CouchDB questions are:

  1. What rules/principles do you use to divide up your documents (relationships, etc)?
  2. Is it okay to put the entire site into one document?
  3. If so, how do you handle serializing/deserializing documents with arbitrary depth levels (like the large JSON example above, or the XML example)?
  4. Or do you not turn them into VOs at all, and just decide "these ones are too nested to Object-Relational Map, so I'll just access them using raw XML/JSON methods"?
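To make question 3 concrete, here is the sort of generic, recursive mapping I have in mind. It is a sketch only: the `ValueObject` class and `toValueObject` function are hypothetical names, and the rule used (scalars and scalar arrays become fields, nested objects and arrays of objects become children) is just one possible convention.

```javascript
// Hypothetical generic value object: a name, scalar fields, and nested children.
class ValueObject {
  constructor(name) {
    this.name = name;
    this.fields = {};
    this.children = [];
  }
}

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursively convert nested JSON of any depth into ValueObject instances.
// Note: an empty array yields neither a field nor any children.
function toValueObject(name, data) {
  const vo = new ValueObject(name);
  for (const [key, value] of Object.entries(data)) {
    if (Array.isArray(value) && value.every(isPlainObject)) {
      // Array of objects: one child per element, named after the property.
      value.forEach((item) => vo.children.push(toValueObject(key, item)));
    } else if (isPlainObject(value)) {
      vo.children.push(toValueObject(key, value));
    } else {
      // Scalars and arrays of scalars (e.g. tags) stay as plain fields.
      vo.fields[key] = value;
    }
  }
  return vo;
}

// Hypothetical nested input modeled on the whole-site example above.
const nestedSite = {
  domain: "www.blog.com",
  pages: [
    {
      title: "Blog",
      posts: [
        {
          title: "My Post",
          tags: ["sample", "post"],
          comments: [{ body: "Interesting Post" }, { body: "I agree" }]
        }
      ]
    }
  ]
};

const siteVO = toValueObject("site", nestedSite);
console.log(JSON.stringify(siteVO, null, 2));
```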

Thanks a lot for your help. Working out how to divide up data in CouchDB has been hard enough that I still can't say "this is how I should do it from now on". I hope to get there soon.

I have studied the following sites/projects.

  1. Hierarchical Data in CouchDB
  2. CouchDB Wiki
  3. Sofa - CouchDB App
  4. CouchDB The Definitive Guide
  5. PeepCode CouchDB Screencast
  6. CouchRest
  7. CouchDB README

...but they still haven't answered this question.

+4  A: 

The book says, if I recall correctly, to denormalize until "it hurts", while keeping in mind the frequency with which your documents might be updated.

  1. What rules/principles do you use to divide up your documents (relationships, etc)?

As a rule of thumb, I include all data that is needed to display a page regarding the item in question. In other words, everything you would print on a real-world piece of paper that you would hand to somebody. E.g. a stock quote document would include the name of the company, the exchange, the currency, in addition to the numbers; a contract document would include the names and addresses of the counterparties, all information on dates and signatories. But stock quotes from distinct dates would form separate documents, separate contracts would form separate documents.

  2. Is it okay to put the entire site into one document?

No, that would be silly, because:

  • you would have to read and write the whole site (the document) on each update, and that is very inefficient;
  • you would not benefit from any view caching.
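
The first point can be made tangible with a rough size comparison. The sketch below is illustrative only: `buildSite` and the document shapes are made up, and character counts of the serialized JSON stand in for the data sent on each update. It relies on the fact that CouchDB updates replace a document as a whole.

```javascript
// Build a hypothetical "whole site" document with embedded posts and comments.
function buildSite(postCount, commentsPerPost) {
  const posts = [];
  for (let p = 0; p < postCount; p++) {
    const comments = [];
    for (let c = 0; c < commentsPerPost; c++) {
      comments.push({ id: "comment-" + p + "-" + c, body: "Interesting Post" });
    }
    posts.push({ id: "post-" + p, title: "My Post", body: "Lots of Content", comments: comments });
  }
  return { _id: "site", domain: "www.blog.com", posts: posts };
}

const site = buildSite(100, 20);

// Embedded layout: editing one comment means writing the whole document back.
site.posts[0].comments[0].body = "Edited comment";
const wholeSiteSize = JSON.stringify(site).length;

// Separate layout: only the comment document itself is rewritten.
const commentDoc = {
  _id: "comment-0-0",
  type: "comment",
  post_id: "post-0",
  body: "Edited comment"
};
const singleCommentSize = JSON.stringify(commentDoc).length;

console.log("whole site:", wholeSiteSize, "characters");
console.log("one comment:", singleCommentSize, "characters");
```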
Eero
Thanks for getting into it with me a bit. I get the idea of "include all data that is needed to display a page regarding the item in question", but that is still very difficult to implement. A "page" could be a page of Comments, a page of Users, a page of Posts, or a page of Comments and Posts, etc. How would you divide them up then, in principle? You could also have your Contract displayed with Users. I get the 'form-like' documents; it makes sense to keep them separate.
viatropos
A: 

Hi, I have been thinking about this problem for a while and I think it is harder than it first seems.

In my application the data model is a process, and each process contains activities. Both processes and activities have properties. There might be screens that show only the name of the process, but a process is nothing without its activities, so I guess I'll put the activities inside the process document rather than storing each activity as a separate document. The properties of both process and activity might be updated by different users at the same time, so I don't want to put them in the process document, because that would create far too many conflicts. I guess each property will be a separate document, but then I need to ensure the "referential integrity" of the properties myself, for example by deleting them when the activity is deleted.
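
A rough sketch of that cleanup step, under assumed names: each property document carries hypothetical `type: "property"` and `activity_id` fields, and `buildPropertyCleanup` is a made-up helper. The `{ docs: [...] }` shape with `_deleted: true` is what CouchDB's `_bulk_docs` endpoint accepts; the HTTP request itself is left out, and as far as I know such a bulk request is not a transaction, so some deletions can succeed while others fail.

```javascript
// Build a _bulk_docs payload that deletes every property of one activity.
// Field names "type" and "activity_id" are assumptions for this sketch.
function buildPropertyCleanup(activityId, propertyDocs) {
  return {
    docs: propertyDocs
      .filter((doc) => doc.type === "property" && doc.activity_id === activityId)
      // A deletion needs the current _id and _rev plus _deleted: true.
      .map((doc) => ({ _id: doc._id, _rev: doc._rev, _deleted: true }))
  };
}

// Hypothetical property documents; the _rev values are placeholders.
const propertyDocs = [
  { _id: "prop-1", _rev: "1-aaa", type: "property", activity_id: "activity-1", name: "owner" },
  { _id: "prop-2", _rev: "2-bbb", type: "property", activity_id: "activity-1", name: "deadline" },
  { _id: "prop-3", _rev: "1-ccc", type: "property", activity_id: "activity-2", name: "owner" }
];

const cleanupPayload = buildPropertyCleanup("activity-1", propertyDocs);
console.log(JSON.stringify(cleanupPayload, null, 2));
```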

I think drawing up general (or even specific) rules, like those in the books on relational database normalization, would be a good idea. Of course, we would also need to address replication, sharding, and anything else CouchDB supports that I'm not aware of yet.

Thank you, Ido.

Ido Ran
+1  A: 

Riak has a first-class concept called "Links", which essentially allows relation-like functionality in a key-value store.

https://wiki.basho.com/display/RIAK/Links