I'm looking for a way to automate schema migration for databases like MongoDB or CouchDB.

Preferably, this tool should be written in Python, but any other language is OK.

+3  A: 

One of the supposed benefits of these databases is that they are schemaless, and therefore don't need schema migration tools. Instead, you write your data handling code to deal with the variety of data stored in the db.
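As a rough illustration of data handling code that copes with a variety of stored shapes, here is a minimal Python sketch. The document layout is purely hypothetical (not from any real application): older documents are assumed to keep a single `email` string, newer ones an `emails` list, and `read_user` is an invented helper name.

```python
def read_user(doc):
    """Return a normalized view of a user document, whatever its age.

    Assumed (hypothetical) shapes:
      old: {"name": "...", "email": "..."}
      new: {"name": "...", "emails": ["...", ...]}
    """
    emails = doc.get("emails")
    if emails is None:
        # Old shape: fall back to the single "email" field, if present.
        old_email = doc.get("email")
        emails = [old_email] if old_email else []
    return {
        "name": doc.get("name", ""),  # missing field gets a default
        "emails": list(emails),
    }
```

The stored documents are never rewritten here; only the reading code knows about both shapes.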

Ned Batchelder
It is hard to write code that handles every version of the documents. Code evolves, and the database should evolve too. Such databases are not schemaless, they are schema-free. This means you can have document structures, but there are no strict restrictions on them.
Alexander Artemenko
I think that for NoSQL databases we need "data migration" tools rather than "schema migration" tools. If there isn't one, then I'll write one myself.
Alexander Artemenko
I'm not sure what the distinction is between "schemaless" and "schema free". In any case, one advantage of these databases is that you don't have to update all of the data when the schema changes. You could, for example, update each record/document as it is read and discovered to be in an old format. If you don't find any tools that do what you want, you are either blazing a new trail, or not understanding the NoSQL culture.
Ned Batchelder
OK. To update data to a new version I need a tool anyway. In my opinion, that is more convenient than having code that works with all versions of the documents. Do you really not understand the difference between schemaless and schema-free? :-)
Alexander Artemenko
+1  A: 

If your data are sufficiently big, you will probably find that you cannot EVER migrate the data, or that it is not beneficial to do so. This means that when you do a schema change, the code needs to continue to be backwards compatible with the old formats forever.

Of course if your data "age" and eventually expire anyway, this can do schema migration for you - simply change the format for newly added data, then wait for all data in the old format to expire - you can then retire the backward-compatibility code.

MarkR
Hm, this makes sense. But the question is about ready-made tools that will help me keep my document versions up to date.
Alexander Artemenko
+1  A: 

Since a NoSQL database can contain huge amounts of data, you cannot migrate it in the regular RDBMS sense. Actually, you can't do it for an RDBMS either once your data passes some size threshold. It is impractical to bring your site down for a day to add a field to an existing table, so with an RDBMS you end up doing ugly patches like adding new tables just for the field and doing joins to get to the data. In the NoSQL world you can do several things.

  • As others suggested, you can write your code so that it handles different 'versions' of the possible schema. This is usually simpler than it looks. Many kinds of schema changes are trivial to code around. For example, if you want to add a new field to the schema, you just add it to all new records and it will be empty on all the old records (you will not get "field doesn't exist" errors or anything ;). If you need a 'default' value for the field in the old records, that too is trivially done in code.
  • Another option, and actually the only sane option going forward with non-trivial schema changes like field renames and structural changes, is to store schema_version in EACH record and to have code that migrates data from any version to the next on READ. For example, if your current schema version is 10 and you read a record from the database with version 7, then your db layer should call migrate_8, migrate_9, and migrate_10. This way the data that is accessed will be gradually migrated to the new version. And if it is not accessed, then who cares which version it is ;)
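The migrate-on-read idea from the second bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the database, the two migration steps (adding a `tags` field, renaming `name` to `full_name`) are made up, and documents without a `schema_version` are assumed to be version 1.

```python
CURRENT_VERSION = 3  # hypothetical current schema version


def migrate_2(doc):
    # v1 -> v2 (made-up change): add a "tags" field with a default.
    doc.setdefault("tags", [])
    return doc


def migrate_3(doc):
    # v2 -> v3 (made-up change): rename "name" to "full_name".
    if "name" in doc:
        doc["full_name"] = doc.pop("name")
    return doc


# Maps a target version to the step that upgrades from the previous one.
MIGRATIONS = {2: migrate_2, 3: migrate_3}


def upgrade(doc):
    """Return a copy of doc brought up to CURRENT_VERSION, one step at a time."""
    doc = dict(doc)  # shallow copy so the caller's document is not mutated
    version = doc.get("schema_version", 1)  # assumption: unversioned means v1
    while version < CURRENT_VERSION:
        version += 1
        doc = MIGRATIONS[version](doc)
        doc["schema_version"] = version
    return doc


def load(store, key):
    """Read a document, migrate it if needed, and write the result back."""
    raw = store[key]
    doc = upgrade(raw)
    if doc != raw:
        store[key] = doc  # persist so the next read finds the new version
    return doc
```

With `store = {"a": {"name": "Ann"}}`, calling `load(store, "a")` runs both steps and leaves the upgraded document in `store`; documents that are never read simply stay at their old version.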
Vitaly Kushner
A: 

When a project needs schema migration for a NoSQL database, it makes me think that you are still thinking in a relational database manner while using a NoSQL database.

If anybody is going to start working with NoSQL databases, they need to realize that most of the 'rules' for an RDBMS (e.g. MySQL) need to go out the window too: things like strict schemas, normalization, and many relationships between objects. NoSQL exists to solve problems that don't need all the extra 'features' provided by an RDBMS.

I would urge you to write your code in a manner that doesn't expect or need a hard schema for your NoSQL database. You should support the old schema and convert a document record on the fly when you access it, if you really want more schema fields on that record.

Please keep in mind that NoSQL storage works best when you think and design differently compared to when using an RDBMS.

Redbeard 0x0A
This is not a solution. Thank you for your "interesting" IMHO.
Alexander Artemenko
No, this isn't a 'solution', nor is the accepted answer, as it is basically a 'you cannot do it' if you look at the answers in the same way. All I am trying to do is draw attention to the fact that one should really question whether they *really* need a hard schema on a NoSQL database. Schemas can cause problems at scale, which is one reason that NoSQL is a good scaling solution: it doesn't have hard schemas.
Redbeard 0x0A