Given that document databases, such as RavenDB, are non-relational, how do you avoid duplicating data that multiple documents have in common? How do you maintain that data if it's okay to duplicate it?

+2  A: 

There's no one "right" answer to your question IMHO. It truly depends on how mutable the data you're duplicating is.

Take a look at the RavenDB documentation for lots of answers about document DB design vs. relational, but specifically check out the "Associations Management" section of the Document Structure Design Considerations document. In short, document DBs use references by ID when they don't want to embed shared data in a document. These IDs are not like foreign keys: the database does not enforce them, so it is entirely up to the application to maintain their integrity and to resolve them.
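
As a rough illustration of reference by ID, here is a minimal sketch that uses an in-memory dictionary as a stand-in for the database. The Author, Post and ReferenceDemo names are hypothetical and are not part of the RavenDB API. The point is only that nothing stops a Post from pointing at an ID that no longer exists, so the application has to handle that case itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical document types: the post stores only the author's ID.
public class Author
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Post
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string AuthorId { get; set; } // reference by ID, not an enforced foreign key
}

public static class ReferenceDemo
{
    // Stand-in for the database: author documents keyed by ID.
    public static readonly Dictionary<string, Author> Authors =
        new Dictionary<string, Author>();

    // The application resolves the reference and copes with a missing target.
    public static string AuthorNameFor(Post post)
    {
        Author author;
        if (post.AuthorId != null && Authors.TryGetValue(post.AuthorId, out author))
            return author.Name;
        return "(unknown author)";
    }
}
```

With a real document store the dictionary lookup would be a load by ID, but the responsibility stays the same: the calling code decides what a dangling reference means.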

Drew Marsh
+2  A: 

With a document database you have to duplicate your data to some degree. What that degree is will depend on your system and use cases.

For example, if we have simple Blog and User aggregates, we could set them up as:

  public class User
  {
    public string Id { get; set; }
    public string Name { get; set; }
    public string Username { get; set; }
    public string Password { get; set; }
  }

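  public class Blog
  {
    public string Id { get; set; }
    public string Title { get; set; }

    // Denormalized copy of the owning user's data, stored inside the Blog document.
    public BlogUserInfo BlogUser { get; set; }

    // Nested type holding only the user fields the Blog needs.
    // (It cannot share a name with the BlogUser property above.)
    public class BlogUserInfo
    {
      public string Id { get; set; }
      public string Name { get; set; }
    }
  }

In this example I have nested a BlogUserInfo class inside the Blog class and exposed it through a BlogUser property. It carries the Id and Name properties of the User aggregate associated with the Blog. I have included these fields because they are the only ones the Blog class is interested in: it doesn't need to know the user's username or password when the blog is being displayed.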

These nested classes are going to depend on your system's use cases, so you have to design them carefully. The general idea is to design aggregates that can be loaded from the database with a single read and that contain all the data required to display or manipulate them.

This then leads to the question of what happens when the User.Name gets updated.

With most document databases you would have to load every Blog that belongs to the updated User, update its Blog.BlogUser.Name field, and save them all back to the database.

Raven is slightly different, as it supports set functions for updates. You can run a single update against RavenDB that changes the BlogUser.Name property of all the user's blogs, without having to load and update each one individually.
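
To make the idea of a set-based update concrete, here is a minimal in-memory sketch. It is not RavenDB's API: BlogDoc, InMemoryBlogStore, UpdateWhere and SetBasedDemo are hypothetical names chosen for illustration. The caller describes which documents to touch and what to change, and the store applies that change to every match in one operation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened blog document with a denormalized author name.
public class BlogDoc
{
    public string Id { get; set; }
    public string AuthorId { get; set; }
    public string AuthorName { get; set; } // copy of the user's name
}

// Hypothetical in-memory store; UpdateWhere is illustrative, not a RavenDB method.
public class InMemoryBlogStore
{
    private readonly List<BlogDoc> _blogs = new List<BlogDoc>();

    public void Add(BlogDoc blog)
    {
        _blogs.Add(blog);
    }

    // Applies 'patch' to every document matching 'filter' and
    // returns the number of documents that were patched.
    public int UpdateWhere(Func<BlogDoc, bool> filter, Action<BlogDoc> patch)
    {
        int updated = 0;
        foreach (var blog in _blogs)
        {
            if (!filter(blog)) continue;
            patch(blog);
            updated++;
        }
        return updated;
    }
}

public static class SetBasedDemo
{
    // One call expresses "for all blogs of this user, set the author name".
    public static int RenameAuthor(InMemoryBlogStore store, string userId, string newName)
    {
        return store.UpdateWhere(b => b.AuthorId == userId,
                                 b => b.AuthorName = newName);
    }
}
```

In a real database the filter and the patch would be sent to the server and executed there, which is what saves the round trips of loading and saving each document.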

The code for doing the update within RavenDB (the manual way) for all the blogs would be:

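  public void UpdateBlogUser(User user)
  {
    // Load every blog whose embedded user matches the updated user.
    var blogs = session.Query<Blog>("blogsByUserId")
                  .Where(b => b.BlogUser.Id == user.Id)
                  .ToList();

    // Copy the new name into each blog's denormalized user data.
    foreach (var blog in blogs)
       blog.BlogUser.Name = user.Name;

    session.SaveChanges();
  }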

I've added in the SaveChanges just as an example. The RavenDB Client uses the Unit of Work pattern and so this should really happen somewhere outside of this method.

theouteredge
It does indeed support set functions, for updating anyway
Rob Ashton
@theouteredge So is it safe to say that you should replicate any data you would otherwise need to retrieve from another document, and use set functions to update/maintain the duplicated data? What would that update look like (it would make your answer that much better!). Thanks!
John Nelson
@Jon I've added an example of doing this the long way. I haven't looked into Raven's set operations yet, but I'll look into it and add an example.
theouteredge
@Rob thanks, I've updated my answer
theouteredge