Hi guys,

There is a massive database (GB) that I am working with now, and all of the previous development has been done on a Slicehost slice. I am trying to get ready for more developers to come in and work, so I need each person to be able to set up his own machine for development, which potentially means copying this database. Selecting only the first X rows in each table to cut size could be problematic for data consistency. Is there any way around this, or is a 1-hour download for each developer going to be necessary? And beyond that, what if I need to copy the production DB down for dev purposes in the future?

Sincerely, Tyler

A: 

To make downloading the production database more efficient, be sure you're compressing it as much as possible before transmission, and further, that you're stripping out any records that aren't relevant for development work.
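For instance, a dump can be compressed in the same pass that produces it. The mysqldump command below is a hedged sketch with placeholder database names, not taken from the original post; the runnable part compresses a small sample file to show the mechanics.

```shell
# Real-world form (placeholders, shown for reference only):
#   mysqldump --single-transaction myapp_production | gzip -9 > dump.sql.gz

# For illustration, build a small sample dump and compress it:
printf 'INSERT INTO users VALUES (1,"a");\n%.0s' $(seq 100) > sample.sql
gzip -9 -c sample.sql > sample.sql.gz   # highly repetitive SQL compresses very well
ls -l sample.sql sample.sql.gz
```

Text-heavy SQL dumps routinely shrink by 80-90% under gzip, which directly cuts each developer's download time.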

You can also create a patch against an older version of your database dump and ship over only the differences rather than an entirely new copy. This works best when each INSERT statement is recorded one per line, an option you may need to enable in your dump tool; with MySQL it is the --skip-extended-insert option.
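The patch workflow can be sketched with standard diff and patch. The two sample files below stand in for an old and a new dump taken with --skip-extended-insert (one INSERT per line); the file names are invented for illustration.

```shell
# Simulated one-INSERT-per-line dumps, old and new versions:
printf 'INSERT INTO users VALUES (1);\nINSERT INTO users VALUES (2);\n' > old.sql
printf 'INSERT INTO users VALUES (1);\nINSERT INTO users VALUES (2);\nINSERT INTO users VALUES (3);\n' > new.sql

# Ship only the difference, then apply it to the stale copy:
diff -u old.sql new.sql > changes.patch || true  # diff exits 1 when files differ
patch old.sql < changes.patch
cmp old.sql new.sql && echo "dumps match"
```

The patch file contains only the changed rows, so a developer who already has last week's dump downloads kilobytes instead of gigabytes.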

A better approach is to have a fake data generator that can roll out a suitably robust version of the database for testing and development. This is not too hard to do with things like Factory Girl, which can automate routine record creation.
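As a minimal stand-in for Factory Girl or seed_fu, generating seed data can be as simple as a script that emits INSERT statements. The table and column names here are invented for illustration; a real Rails app would put this logic in factories or a seed task.

```shell
# Generate a seed file of fake rows instead of downloading production data.
: > seed.sql
for i in $(seq 1 50); do
  echo "INSERT INTO users (id, name, email) VALUES ($i, 'user$i', 'user$i@example.com');" >> seed.sql
done
wc -l seed.sql   # 50 statements, regenerated on demand
```

Because the generator lives in version control, every developer can rebuild a consistent dataset locally in seconds rather than copying the production database.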

tadman
Hmm.. would you put the fake data generator in a rake task that you run before starting development? Why would this be better than using a production DB? How fast could one write a script that would populate all 300 tables, some of which have 100+ columns?
tesmar
Hmm, I think you are talking about seed data, right? In this case, when there is so much seed data, is it better to use a seed generator like seed_fu or to just dump the DB?
tesmar
A: 

Why not have a dev server that each dev connects to?

Yes, all devs develop against the same database. No development is ever done except through scripts that are checked into Subversion. If a couple of people making changes run into each other, all the better that they find out as soon as possible that they are doing things which might conflict.

We also periodically load a prod backup to dev and rerun any scripts for things which have not yet been loaded to prod, to keep our data up-to-date. Developing against the full data set is critical once you have a medium-sized database, because coding techniques that appear fine to a dev working alone on a box with a smaller dataset will often fail miserably against prod-sized data and multiple users.

HLGEM
Hmm.. could you elaborate on that some more? Would each dev have access to the one database? I guess that would mean they would have to be more careful, and couldn't make big DB changes?
tesmar
nothing wrong with being more careful
HLGEM
+1  A: 

Databases required for development and testing rarely need to be full size; it is often easier to work on a small copy. A database subsetting tool like Jailer ( http://jailer.sourceforge.net/ ) might help you here.

Georg