For a long time now, we've kept our data within the project's repository. Everything lived under data/sql, and each table had its own create_tablename.sql and data_tablename.sql files.
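A minimal sketch of loading the data/sql layout described above. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere, and it assumes every create_*.sql file should run before any data_*.sql file; the function name load_sql_dir is hypothetical.

```python
import sqlite3
from pathlib import Path


def load_sql_dir(conn: sqlite3.Connection, sql_dir: Path) -> list[str]:
    """Apply every create_*.sql file, then every data_*.sql file.

    Structure files run first so the data files always find their
    tables; within each group, files run in sorted filename order.
    Returns the names of the applied files, in order.
    """
    applied = []
    for prefix in ("create_", "data_"):
        for path in sorted(sql_dir.glob(f"{prefix}*.sql")):
            conn.executescript(path.read_text())
            applied.append(path.name)
    conn.commit()
    return applied
```

Loading the whole directory into a fresh database is what makes every dev build reproducible, and also why the files grow with the data.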

We have just deployed our second project onto Scalr, and we've realised the setup is a bit messy.

The way we deploy:

We have a "packageup" collection of scripts which split the project into 3 archives (data, code, static files), which we then store in 3 separate buckets on S3.

Whenever a role starts up, it downloads one of the archives (depending on the role: data, nfs or web), and then an "unpackage" script sets up everything for that role: it loads the data into MySQL, sets up NFS, etc.

We do it like this because we don't want to save server images; we always start from vanilla instances, onto which we install everything from scratch using various in-house scripts. Startup time isn't an issue (we have a ready-to-use farm in 9 minutes).
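The packageup/unpackage split can be sketched roughly as below. The directory-to-archive mapping and the role-to-archive mapping (which role downloads which archive) are hypothetical, since the question doesn't spell them out, and the S3 upload step is only mentioned in a comment.

```python
import tarfile
from pathlib import Path

# Hypothetical: which top-level project directories go into which archive.
ARCHIVES = {
    "data": ["data"],
    "code": ["app", "lib"],
    "static": ["public"],
}

# Hypothetical: which archive each server role downloads when it boots.
ROLE_ARCHIVE = {"data": "data", "nfs": "static", "web": "code"}


def package_project(project: Path, out_dir: Path) -> dict[str, Path]:
    """Write one .tar.gz per archive name and return {name: archive path}.

    Each archive would then be uploaded to its own S3 bucket
    (for example with boto3's upload_file); that step is omitted here.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    result = {}
    for name, subdirs in ARCHIVES.items():
        archive_path = out_dir / f"{name}.tar.gz"
        with tarfile.open(archive_path, "w:gz") as tar:
            for sub in subdirs:
                src = project / sub
                if src.exists():  # skip directories this project doesn't have
                    tar.add(src, arcname=sub)
        result[name] = archive_path
    return result


def archive_for_role(role: str) -> str:
    """Return the archive name a booting instance with this role should fetch."""
    return ROLE_ARCHIVE[role]
```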

The problem is that it's a pain to find the right version of the database whenever we set up a new development build (at any given time, we have about 4 dev builds per project). Also, git starts to choke once we go into production, as the SQL files end up totalling around 500 MB.

The question is:

How is everyone else managing databases? I've been looking for something that makes it easy to pull data from production into dev, and also to migrate data from dev into production, but haven't come across anything.

A: 

Check out Capistrano. It's a tool the Ruby community uses for deploying to different environments, and I find it really useful.

Also, if your deployment is starting to choke, try a tool Twitter built called Murder.

Ken Struys
I'm not as worried about the deployment step itself as I am about deployment combined with the production/development environments. We very often have to share databases (structure + data) among ourselves as well as with the live environment. Also, git is choking on our SQL files.
Andrei Serdeliuc
A:

As I understand it, your main question is about other people's experience migrating SQL data from dev into production.

I use Microsoft SQL Server rather than MySQL, so I'm not sure you can apply my experience directly. Nevertheless, this approach works very well.

I use Visual Studio 2010 Ultimate to compare the data in two databases. The same feature also exists in Visual Studio Team Edition 2008 (or Database Edition). You can read http://msdn.microsoft.com/en-us/library/dd193261.aspx to understand how it works.

You can compare two databases (dev and prod) and generate a SQL script that modifies the data. You can easily exclude some tables or columns from the comparison, and you can examine the results and exclude individual rows from the generated script. So you can easily and flexibly generate scripts for deploying changes to the database. Data comparison is separate from structure comparison (schema compare), so you can refresh the data in dev from prod, or generate scripts that bring the prod database up to the latest version of the dev database.

I recommend looking at these features, and at some of the products from http://www.red-gate.com/ (such as http://www.red-gate.com/products/SQL_Compare/index.htm).
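To show what a data comparison produces, here is a toy version for a single table with a primary key, using sqlite3 connections for dev and prod. It is not how Visual Studio or Red Gate implement it; it simply diffs rows by key and emits INSERT/UPDATE/DELETE statements, with an exclude list mirroring the "exclude columns" option. The names diff_table and quote are hypothetical, the literal quoting is deliberately simplistic, and table/column names are assumed to be trusted.

```python
import sqlite3


def quote(value) -> str:
    """Render a value as a SQL literal (simplified for this sketch)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def diff_table(source, target, table, key, exclude=()):
    """Return SQL statements that make `target`'s rows in `table` match `source`.

    Rows are matched on the primary-key column `key`; columns in `exclude`
    are ignored when deciding whether an existing row has changed.
    """
    cols = [row[1] for row in source.execute(f"PRAGMA table_info({table})")]
    compared = [c for c in cols if c not in exclude]

    def rows(conn):
        cur = conn.execute(f"SELECT {', '.join(cols)} FROM {table}")
        return {r[cols.index(key)]: dict(zip(cols, r)) for r in cur}

    src, dst = rows(source), rows(target)
    statements = []
    for k in sorted(src.keys() - dst.keys()):  # rows only in source
        values = ", ".join(quote(src[k][c]) for c in cols)
        statements.append(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({values});")
    for k in sorted(src.keys() & dst.keys()):  # rows in both: update changed columns
        changed = [c for c in compared if src[k][c] != dst[k][c]]
        if changed:
            sets = ", ".join(f"{c} = {quote(src[k][c])}" for c in changed)
            statements.append(f"UPDATE {table} SET {sets} WHERE {key} = {quote(k)};")
    for k in sorted(dst.keys() - src.keys()):  # rows only in target
        statements.append(f"DELETE FROM {table} WHERE {key} = {quote(k)};")
    return statements
```

Calling diff_table(dev, prod, ...) gives a script that moves prod towards dev; swapping the arguments gives the "refresh dev from prod" direction.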

Oleg
A:

You should seriously take a look at dbdeploy (dbdeploy.com). It has been ported to many languages, the major ones being Java and PHP. It is integrated into build tools like Ant and Phing, and allows easy sharing of so-called delta files.

A delta file always consists of a deploy section, but can also contain an undo section. When you commit your delta file and another developer checks it out, he can just run dbdeploy and all new changes are automatically applied to his database.

I'm using dbdeploy for my open source blog, so you can take a look at how the delta files are organized: http://site.svn.dasprids.de/trunk/sql/deltas/
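The delta workflow described above can be approximated in a few lines. This is a simplified re-implementation for illustration, not dbdeploy's own code. It assumes delta files named like 1_create_posts.sql, a changelog table recording which change numbers have been applied, and a --//@UNDO line separating the deploy part from the undo part (assumed here to match dbdeploy's marker). The function names are hypothetical, and sqlite3 stands in for MySQL.

```python
import re
import sqlite3
from pathlib import Path

UNDO_MARKER = "--//@UNDO"  # assumed deploy/undo separator inside a delta file


def read_deltas(delta_dir: Path) -> dict[int, tuple[str, str]]:
    """Map change number -> (deploy_sql, undo_sql) for files like '3_add_index.sql'."""
    deltas = {}
    for path in delta_dir.glob("*.sql"):
        match = re.match(r"(\d+)_", path.name)
        if match:
            deploy_sql, _, undo_sql = path.read_text().partition(UNDO_MARKER)
            deltas[int(match.group(1))] = (deploy_sql.strip(), undo_sql.strip())
    return deltas


def deploy(conn: sqlite3.Connection, delta_dir: Path) -> list[int]:
    """Apply, in ascending order, every delta not yet recorded in changelog.

    Note: each delta's script and its changelog row are committed one after
    the other, not atomically; a real tool needs better failure handling.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS changelog (change_number INTEGER PRIMARY KEY)")
    conn.commit()
    applied = {n for (n,) in conn.execute("SELECT change_number FROM changelog")}
    deltas = read_deltas(delta_dir)
    new = sorted(set(deltas) - applied)
    for number in new:
        conn.executescript(deltas[number][0])
        conn.execute("INSERT INTO changelog (change_number) VALUES (?)", (number,))
        conn.commit()
    return new
```

Running deploy a second time is a no-op, which is what lets every developer simply run it after pulling new delta files.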

DASPRiD
There is a key point in here: "undo". Any decent DB deploy process must have a one-step roll-back function, otherwise you *will* get caught one day...
Jonathan Day
That looks quite good. I could also split it into two branches (dev, prod), so we can easily track dev changes separately from production changes, as dev changes are more likely to be reverted and don't always make it into prod. I wonder how well it would manage versions in that case.
Andrei Serdeliuc
Well, as a delta file usually covers a specific feature rather than a version, you could just remove that delta file from version control and you should be fine (after applying its undo part first, of course, not before).
DASPRiD
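The one-step roll-back raised in these comments could look like this in a simplified, dbdeploy-style setup. The changelog table and the mapping of change number to (deploy_sql, undo_sql) pairs are assumptions carried over from a hypothetical layout, not dbdeploy's actual internals. Undo sections run newest first, and the function refuses to start if any change it would need to revert has no undo section.

```python
import sqlite3


def rollback(conn: sqlite3.Connection, deltas: dict, target: int) -> list[int]:
    """Undo every applied change numbered above `target`, newest first.

    `deltas` maps change number -> (deploy_sql, undo_sql); applied changes
    are read from a `changelog` table. Returns the reverted change numbers.
    """
    applied = sorted(
        (n for (n,) in conn.execute(
            "SELECT change_number FROM changelog WHERE change_number > ?", (target,))),
        reverse=True,
    )
    # Check everything up front so we never stop half-way through a roll-back.
    missing = [n for n in applied if not deltas[n][1].strip()]
    if missing:
        raise ValueError(f"no undo section for change(s) {missing}")
    for number in applied:
        conn.executescript(deltas[number][1])
        conn.execute("DELETE FROM changelog WHERE change_number = ?", (number,))
        conn.commit()
    return applied
```

Only once the undo has run is it safe to delete the delta file from version control, which matches the order suggested above.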
A: 

Personally, I'd look at Toad.

http://www.toadworld.com/

It costs less than 10k ;) and will analyse database structures, produce scripts to modify them, and migrate data.

John Nicholas
A: 

One part of the solution is to record the version of each of your code modules and of their corresponding data resources in a single location, and compare them to ensure consistency. For example, incrementing the version number of, say, your customer_comments module will require a corresponding SQL delta file that upgrades the relevant DB tables to the matching version number for the data.

For an example, have a look at Magento's core_resource approach as documented by @AlanStorm.
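A rough sketch of that consistency check, loosely modelled on the idea behind Magento's core_resource table rather than on its actual code. The code_versions and db_versions dictionaries, the "0.0.0" meaning "not installed", and the upgrade-script names are all hypothetical. For each module it chains upgrade scripts from the version recorded in the database up to the version the code declares, and fails loudly if a step is missing.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn '0.1.2' into (0, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def upgrade_path(module, db_version, code_version, upgrades):
    """Chain upgrade scripts from db_version up to code_version.

    `upgrades` maps module -> {from_version: (to_version, script_name)}.
    """
    path, current = [], db_version
    while parse_version(current) < parse_version(code_version):
        step = upgrades.get(module, {}).get(current)
        if step is None:
            raise LookupError(f"{module}: no upgrade script from {current}")
        current, script = step
        path.append(script)
    if parse_version(current) != parse_version(code_version):
        raise LookupError(f"{module}: upgrades overshoot {code_version} (reached {current})")
    return path


def check_modules(code_versions, db_versions, upgrades):
    """Return {module: [scripts to run]} for every module whose data lags its code."""
    plan = {}
    for module, code_version in code_versions.items():
        db_version = db_versions.get(module, "0.0.0")  # hypothetical: not installed yet
        if parse_version(db_version) > parse_version(code_version):
            raise ValueError(f"{module}: database ({db_version}) is ahead of code ({code_version})")
        steps = upgrade_path(module, db_version, code_version, upgrades)
        if steps:
            plan[module] = steps
    return plan
```

An empty plan means every module's data matches its code, which is a cheap check to run when bringing up a dev build.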

Cheers, JD

Jonathan Day