For a long time now, we've held our data within the project's repository. We just held everything under data/sql, and each table had its own create_tablename.sql and data_tablename.sql files.
We have now just deployed our 2nd project onto Scalr and we've realised it's a bit messy.
The way we deploy:
We have a "packageup" collection of scripts which tear apart the project into 3 archives (data, code, static files) which we then store in 3 separate buckets on S3.
Whenever a role starts up, it downloads one of the files (depending on the role: data, nfs or web) and then a "unpackage" script sets up everything for each role, loads the data into mysql, sets up the nfs, etc.
We do it like this because we don't want to save server images, we always start from vanilla instances onto which we install everything from scratch using various in-house built scripts. Startup time isn't an issue (we have a ready to use farm in 9 minutes).
The issue is that it's a pain trying to find the right version of the database whenever we try to setup a new development build (at any point in time, we've got about 4 dev builds for a project). Also, git is starting to choke once we go into production, as the sql files end up totalling around 500mb.
The question is:
How is everyone else managing databases? I've been looking for something that makes it easy to take data out of production into dev, and also migrating data from dev into production, but haven't stumbled upon anything.