views: 253
answers: 4
I'm curious how others have approached the problem of maintaining and synchronizing database changes across many (10+) developers without a DBA. What I mean, basically, is: if someone wants to make a change to the database, what are some strategies for doing that? (e.g., I've created a 'Car' model and now I want to apply the appropriate DDL to the database.)

We're primarily a Python shop and our ORM is SQLAlchemy. Previously, we wrote our models so that the ORM itself created the schema, but we recently ditched this approach because:

  • We couldn't track changes using the ORM
  • The state of the ORM wasn't in sync with the database (e.g. lots of differences primarily related to indexes and unique constraints)
  • There was no way to audit database changes unless the developer documented the database change via email to the team.

Our solution to this problem was basically to have a "gatekeeper": developers who need to make a database change put their requests into a proposed_db_changes.sql file, and the gatekeeper reviews every change and applies all accepted changes to an accepted_db_changes.sql file. We check this file in, and when it's updated, we each apply the changes to the personal database on our development machine. We don't create indexes or constraints on the models; they are applied explicitly on the database.

I would like to know what strategies others use to maintain database schemas, and whether ours seems reasonable.

Thanks!

A: 

Have you tried the SQLAlchemy Migrate tools?

They are specifically designed to auto-migrate your database design changes.
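
For example, a change script in SQLAlchemy Migrate is just a Python file with upgrade/downgrade functions, kept in a versioned repository and applied in order. A rough sketch (the 'car' table and 'color' column are only illustrative, and the script name is hypothetical):

    # 002_add_car_color.py -- a hypothetical change script
    from sqlalchemy import MetaData, Table, Column, String
    from migrate import *  # enables Column.create() / .drop() on live tables

    meta = MetaData()

    def upgrade(migrate_engine):
        # Add a 'color' column to the existing 'car' table.
        meta.bind = migrate_engine
        car = Table('car', meta, autoload=True)
        Column('color', String(32)).create(car)

    def downgrade(migrate_engine):
        # Reverse the change so the schema can be rolled back.
        meta.bind = migrate_engine
        car = Table('car', meta, autoload=True)
        car.c.color.drop()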

WoLpH
+2  A: 

The solution is administrative rather than technical :)

The general rule is easy; there should only be tree-like dependencies in the project:

  • There should always be a single master source of the schema, stored together with the project source code in version control.
  • Everything affected by a change in the master source should be automatically re-generated every time the master source is updated, with no manual intervention ever. If automatic generation does not work, fix either the master source or the generator; don't manually update the generated code.
  • All re-generations should be performed by the same person who updated the master source, and all changes, including the master source change, should be considered a single transaction (a single source-control commit, a single build/deployment for every affected environment, including DB updates).

When enforced, this gives a 100% reliable result.

There are essentially four possible choices of the master source:

  1) DB metadata: sources are generated after a DB update by some tool connecting to the live DB.
  2) Source code: some tool generates the SQL schema from the sources, annotated in a special way, and the SQL is then run on the DB.
  3) DDL: both the SQL schema and the source code are generated by some tool.
  4) Some other description (say, a text file read by a special Perl script that generates both the SQL schema and the source code).

Options 1, 2, and 3 are equally good, provided the tool you need exists and is not overly expensive. Option 4 is a universal approach, but it should be applied from the very beginning of the project and carries the overhead of a couple of thousand lines of code in a strange language to maintain.
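
For example, with the asker's SQLAlchemy stack, option 2 might look like declaring the schema once in the models and generating the DDL from them, so the Python code is the master source. A minimal sketch (the Car model and its columns are illustrative):

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base
    from sqlalchemy.schema import CreateTable

    Base = declarative_base()

    class Car(Base):
        __tablename__ = 'car'
        id = Column(Integer, primary_key=True)
        make = Column(String(64), nullable=False)

    # Generate the DDL from the model instead of hand-writing it; the output
    # can be reviewed and committed alongside the model change.
    engine = create_engine('sqlite://')
    print(CreateTable(Car.__table__).compile(engine))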

bobah
This is probably the most reasonable approach, especially for a startup. Thanks, good to know, at least, that we're on a path similar to this.
Mahmoud Abdelkader
+1  A: 

So am I correct in assuming you are designing your db directly on the physical db? I used to do this many years ago, but the quality of the resultant db was pretty poor. If you use a modelling tool (personally I think Sybase PowerDesigner is still best of breed, but look around), everybody can make changes to the model and just sync their local dbs as required (it will also pick up documentation tasks). So, re bobah's post, the master is the PowerDesigner model rather than a physical db.

Is your accepted_db_changes.sql file one humongous list of change scripts? I'm not sure I like the idea of changing the file name, etc. Given that the difference between two db versions is a sequential list of alter scripts, how about a model like:

Ver1 (folder)
  Change 1-1.sql
  Change 1-2.sql
  Change 1-3.sql
Ver2 (folder)
  Change 2-1.sql
  Change 2-2.sql
  Change 2-3.sql

Where each change (new file) is reviewed before committing.
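
A deployment script can then walk the folders and apply anything new in order. A rough sketch in Python (not a real tool; it assumes the layout above, a DSN in a DATABASE_URL environment variable, and one statement per script), ignoring for brevity the bookkeeping a real version would do to record which scripts have already been applied:

    import glob
    import os
    import re

    from sqlalchemy import create_engine, text

    def numeric_key(path):
        # Sort numerically so 'Change 1-2.sql' runs before 'Change 1-10.sql'.
        return [int(n) for n in re.findall(r'\d+', os.path.basename(path))]

    def apply_changes(root):
        engine = create_engine(os.environ['DATABASE_URL'])
        with engine.begin() as conn:  # one transaction per deployment
            for folder in sorted(glob.glob(os.path.join(root, 'Ver*')),
                                 key=numeric_key):
                for script in sorted(glob.glob(os.path.join(folder, '*.sql')),
                                     key=numeric_key):
                    with open(script) as f:
                        conn.execute(text(f.read()))

    if __name__ == '__main__':
        apply_changes('db_changes')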

A general rule should be to make a conscious effort to automate as much of the db deployment in your dev environments as possible; we have definitely got a respectable ROI on this work. You can use tools like Red Gate to generate your DDL (it has an API, though I'm not sure if it works with SQLAlchemy). IMO, db changes should be trivial; if you find they are blocking, look at what you can automate.

Vman
+1  A: 

You might find the book Refactoring Databases helpful, as it contains general strategies for managing databases, not just how to refactor them.

Its approach expects that every developer will have their own copy of the database, as well as a general test database used before deploying to production. Your situation is one of the easier ones the book describes, as you don't have a number of separate applications using the database (although you do need someone who knows how to describe database migrations). The biggest thing is to be able to build the database from information in source control, and to have changes described by small migrations (see @WoLpH's answer) rather than just making the change in the database. You will also find things easier if you have at least ORM <-> database tests to check that they are still in sync.
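
A minimal sketch of such a sync test, assuming a declarative Base in the project's models and a DSN in a DATABASE_URL environment variable (the import path is hypothetical, and a real check would also compare types, indexes, and constraints):

    import os

    from sqlalchemy import MetaData, create_engine

    from myproject.models import Base  # hypothetical: the project's declarative base

    def test_orm_matches_database():
        engine = create_engine(os.environ['DATABASE_URL'])
        reflected = MetaData()
        reflected.reflect(bind=engine)  # read the live schema
        for name, table in Base.metadata.tables.items():
            assert name in reflected.tables, 'table %r missing from db' % name
            orm_cols = set(table.columns.keys())
            db_cols = set(reflected.tables[name].columns.keys())
            missing = orm_cols - db_cols
            assert not missing, '%s: columns %r in ORM but not in db' % (name, missing)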

Kathy Van Stone
Interesting book, I will take a look. Thanks for the recommendation!
Mahmoud Abdelkader