As I understand it, you can store any unstructured information in a document-oriented database. Let's imagine a document like this:

{
  name: 'John Blank',
  yearOfBirth: 1960
}

Later, in a new version, this structure is refactored to:

{
  firstname: 'John',
  lastname: 'Blank',
  yearOfBirth: 1960
}

How do you handle this with document-oriented databases? Do you have to prepare migration scripts that alter every entry in the database, or are there better ways to handle changes in the structure?

+3  A: 

Refactoring here implies that there's a deterministic mapping from the old schema to the new schema. So the most effective option is to do the same thing you'd do with a SQL database and actually update the documents.
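As a rough illustration of the "actually update the documents" route, here is a minimal sketch of the mapping applied to plain in-memory objects. The helper names (`splitName`, `migrateDocument`) are made up for this example, and the rule that `name` is always "firstname lastname" is an assumption. How you read and write the documents back depends on your database, so that part is left out.

```javascript
// Hypothetical one-time migration over plain objects (no database API involved).
// ASSUMPTION: `name` is "firstname lastname"; real data may not follow this.
function splitName(name) {
  const parts = name.trim().split(/\s+/);
  return {
    firstname: parts[0],
    lastname: parts.slice(1).join(' '),
  };
}

// Returns a new document in the refactored shape, dropping the old `name` field.
function migrateDocument(doc) {
  if (typeof doc.name !== 'string') {
    return doc; // already migrated, or nothing to split
  }
  const { name, ...rest } = doc;
  return { ...splitName(name), ...rest };
}

const oldDocs = [
  { name: 'John Blank', yearOfBirth: 1960 },
  { firstname: 'Jane', lastname: 'Doe', yearOfBirth: 1975 }, // already new shape
];

const migratedDocs = oldDocs.map(migrateDocument);
console.log(migratedDocs);
```

In a real migration you would run something like `migrateDocument` over every stored document and save the result back, ideally after checking a sample of the output by hand.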

Document-oriented databases do give you one other option, although it depends on which database you use and how you're using it on the front-end. That option is to simply leave the data alone and support the "old" definition in your application as a sort of backward-compatibility layer. In other words, you do these translations on-the-fly, as opposed to a permanent one-time update.
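To make the on-the-fly idea concrete, here is a small sketch of a read-time adapter. The function name `toCurrentShape` is hypothetical, and it again assumes legacy names are in "firstname lastname" order. The stored documents are never modified; only what the application sees is translated.

```javascript
// Hypothetical read-time adapter: translates legacy documents when they are read.
// ASSUMPTION: legacy `name` values are in "firstname lastname" order.
function toCurrentShape(doc) {
  if ('firstname' in doc || 'lastname' in doc) {
    return doc; // already in the new shape
  }
  const parts = String(doc.name || '').trim().split(/\s+/);
  return {
    firstname: parts[0],
    lastname: parts.slice(1).join(' '),
    yearOfBirth: doc.yearOfBirth,
  };
}

// A mix of old and new documents, as they might sit side by side in storage.
const stored = [
  { name: 'John Blank', yearOfBirth: 1960 },
  { firstname: 'Jane', lastname: 'Doe', yearOfBirth: 1975 },
];

const people = stored.map(toCurrentShape);
console.log(people);
```

The rest of the application then only ever deals with `firstname` and `lastname`, at the cost of running the translation on every read.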

This isn't really an option with a SQL database, because you'd have to keep the obsolete column around and possibly indexed. With a document-oriented database, you're not really wasting any storage or index space. You'd still have to weigh the advantages against the disadvantages.

The primary disadvantage is, obviously, the inconsistency, which could grow over time and lead to bugs. Another disadvantage is possibly the computational expense of doing this on-the-fly, or the inability to effectively use the new structure (for example, you might want to index on just the lastname). So, most of the time, I think I would just choose to run a mass update.

There is one clear advantage to preserving the old documents, however: you might not be certain that your refactoring is perfect. For example, the data in your name field might not follow a consistent convention; maybe in some cases it's "lastname, firstname", in other cases it's "firstname lastname", and in others it's a company name. In that situation, doing your conversions on-the-fly without making a permanent update allows you to refine the mapping over time, so you can use the firstname and lastname fields when available but fall back to the name guessing-game for legacy data.
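Here is a sketch of what that fallback could look like. The names (`guessNameParts`, `readPersonName`, `displayName`) and the specific guessing rules are all assumptions made up for illustration; the point is only that the rules live in code and can be refined without touching the stored data.

```javascript
// Hypothetical guessing rules for legacy `name` values; extend as new patterns turn up.
function guessNameParts(name) {
  const trimmed = name.trim();
  if (trimmed.includes(',')) {
    // "Blank, John" style: lastname comes first
    const [lastname, firstname] = trimmed.split(',').map((s) => s.trim());
    return { firstname, lastname };
  }
  const parts = trimmed.split(/\s+/);
  if (parts.length === 2) {
    // "John Blank" style
    return { firstname: parts[0], lastname: parts[1] };
  }
  return null; // could be a company name or something else; do not guess
}

// Prefer the new fields; fall back to guessing only for legacy documents.
function readPersonName(doc) {
  if (doc.firstname !== undefined && doc.lastname !== undefined) {
    return { firstname: doc.firstname, lastname: doc.lastname };
  }
  const guess = typeof doc.name === 'string' ? guessNameParts(doc.name) : null;
  return guess || { displayName: doc.name }; // unresolved: keep the raw value
}

console.log(readPersonName({ name: 'Blank, John', yearOfBirth: 1960 }));
console.log(readPersonName({ name: 'Acme Widgets Ltd' }));
```

Anything the rules can't classify is passed through untouched, so a later, better rule can still pick it up.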

As stated, I'd probably reserve the second option for exceptional cases, where I'm not confident that I'll be able to get the "refactoring" correct for every record/document. Nevertheless, it's an option that is available to you that you don't really have with other types of databases.

Aside from those two, I don't see any other clear alternatives. It's kind of a binary decision; either you permanently update the existing data or you don't.

Aaronaught