I'm currently doing web development with another developer on a centralized development server. In the past this has worked alright, as we have two separate projects we are working on and rarely conflict. Now, however, we are adding a (possible) third developer into the mix. This is clearly going to create problems, with other developers' changes affecting my work and vice versa. To solve this problem, I'm thinking the best solution would be to create a virtual machine to distribute between the developers for local use. The problem I have is when it comes to the database.

Given that we all develop on laptops, simply keeping a local copy of the live data is plain stupid.

I've considered sanitizing the data, but I can't really figure out how to replace the real data with data that would be representative of what people actually enter without repeating the same information over and over again, e.g. everyone's address becomes 123 Testing Lane, Test Town, WA, 99999 or something. Is this really something to be concerned about? Are there tools to help with this sort of thing? I'm using MySQL. Ideally, if I sanitized the db, it should be done from a script that I can run regularly. If I do this I'd also need a way to reduce the size of the db itself. (I figure I could select all the records created after x and whack them and all the records in corresponding tables out, so that isn't really a big deal.)

The second solution I've thought of is to encrypt the hard drive of the vm, but I'm unsure of how practical this is in terms of speed and also in the event of a lost/stolen laptop. If I do this, should the vm hard drive file itself be encrypted or should it be encrypted in the vm? (I'm assuming the latter as it would be portable and doesn't require the devs to have any sort of encryption capability on their OS of choice.)

The third is to create a copy of the database for each developer on our development server. Each developer is then responsible for keeping their schema in sync with the canonical db by means of migration scripts or what have you. This solution seems to be the simplest but doesn't really scale as more developers are added.

How do you deal with this problem?

+2  A: 

Use fake data -- invest in a data generator if you must, but please don't use real data in a development environment, especially if it's possible that access to it may be compromised. I'm more familiar with tools for MS SQL, but googling for "MySQL data generator" brought up EMS SqlManager and Datanamic.
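If a commercial generator is overkill, a small script can cover the "everyone lives at 123 Testing Lane" worry. The following is only a minimal sketch in Python, not a recommendation of any particular tool: the name and address pools, the field names, and the `fake_person` / `fake_people` helpers are all made up for illustration. It seeds the random generator so every developer gets the same varied data set.

```python
import random

# Hypothetical sample pools; extend these to match your own domain.
FIRST_NAMES = ["Ana", "Björn", "Chen", "Dolores", "Émile", "Fatima"]
LAST_NAMES = ["Okafor", "Nguyen", "O'Brien", "García", "Müller", "Smith-Jones"]
STREETS = ["Maple St", "2nd Ave", "Harbor Rd", "Elm Ct", "Pine Way"]
CITIES = [
    ("Spokane", "WA", "99201"),
    ("Austin", "TX", "78701"),
    ("Albany", "NY", "12207"),
]


def fake_person(rng):
    """Return one made-up customer record as a dict."""
    city, state, zip_code = rng.choice(CITIES)
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        "address": f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
        "city": city,
        "state": state,
        "zip": zip_code,
        # Phone numbers are drawn from the 555-01xx range so that they
        # never collide with a real subscriber's number.
        "phone": f"555-01{rng.randint(0, 99):02d}",
    }


def fake_people(count, seed=0):
    """Return `count` records; the same seed always yields the same list."""
    rng = random.Random(seed)
    return [fake_person(rng) for _ in range(count)]
```

Because the output depends only on the seed, the script can be rerun regularly and the resulting rows inserted into MySQL with whatever driver you already use.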

tvanfosson
I'm not familiar with the tools in particular, but +1 for "don't use real data in a development environment"! You may not be able to foresee how important this is, but *invest* the time into finding some way to generate fake data!
anonymous coward
I've heard this often in the past, and maybe I'm just dumb but I don't see the problem with using real data. Is it just because of the possibility of it getting stolen? I can see using real data as a problem if you worked for a software company. You wouldn't want to import a database from one of your customers. I can also see a problem if you want to keep your developers separated from the live server for some reason. Is there anything else? What am I missing here?
docgnome
@docgnome - does your company have a privacy policy? If so, what does it say about how your customers' data is used and protected? If the data is internal to the company, what do your HR policies say about such data? If you are using real data, at a minimum it has to have the same level of protection as the data in the production instance. Frankly, I find that too intrusive for a dev box. The risk of losing the data (and potentially your job if the data is sensitive in any way -- and yes, phone numbers are sensitive) is not worth the small cost of generating fake data for development.
tvanfosson
A: 

As tvanfosson mentioned, use fake data instead of live. Doing so will not only keep the live data safe but also allow you to test different scenarios, such as international names.

As for how to distribute your DB, your schema and creation scripts really should be in source control, so each developer can create a local copy of the database as they see fit.
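As a rough illustration of what "schema in source control" can look like, here is a minimal sketch of a migration runner in Python. It is an assumption-laden example, not a specific tool's behavior: it uses the standard library's `sqlite3` as a stand-in for MySQL, and the `MIGRATIONS` list, the `schema_version` table, and the `migrate` function are hypothetical names. In a real repository the SQL would more likely live in numbered `.sql` files.

```python
import sqlite3

# Hypothetical migrations; each entry is (version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn):
    """Apply any migrations not yet recorded; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, sql in sorted(MIGRATIONS):
        if version in applied:
            continue  # this developer's database already has this change
        conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied
```

Each developer runs the same script against their own local database after pulling, so the schema follows the code rather than being kept in sync by hand.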

Paperjam
A: 

You could set up a fixtures (seed data) system. You provide the data once and it gets put into the db as many times as you need. That could be held in source control so that the fixtures are used/updated by all users.

I think that auto-generators are usually a bad idea. It is hard for them to generate information that could pass for real. Fixtures would allow you to write this information by hand and know that it is what you are looking for. You could also push the bounds of your validators by using fixtures.

It may take a bit of time to set up the first time around, but I think you will end up with much higher-quality test data.
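To make the idea concrete, here is a minimal sketch of a fixture loader in Python. Everything in it is an assumption for illustration: the `customers` table, its columns, the `CUSTOMER_FIXTURES` contents, and the `load_fixtures` helper are hypothetical, and `sqlite3` stands in for MySQL. The fixture text would normally be a file kept in source control.

```python
import json
import sqlite3

# Hypothetical fixture contents, hand-written to include awkward but valid
# cases (apostrophes, non-Latin names, a one-letter name, an empty postcode).
CUSTOMER_FIXTURES = """
[
  {"id": 1, "name": "Zoë O'Neil", "postcode": "EC1A 1BB"},
  {"id": 2, "name": "李小龍", "postcode": "999077"},
  {"id": 3, "name": "X", "postcode": ""}
]
"""


def load_fixtures(conn, fixtures_json):
    """Reset the customers table and insert every fixture row."""
    rows = json.loads(fixtures_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, postcode TEXT)"
    )
    # Clearing first means the loader can be run as many times as needed.
    conn.execute("DELETE FROM customers")
    conn.executemany(
        "INSERT INTO customers (id, name, postcode) "
        "VALUES (:id, :name, :postcode)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Running the loader twice leaves exactly the same rows in place, which is what lets every developer reset to a known state.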

Regards,

Justin

mediaslave
@mediaslave -- depends on what kind of testing you're talking about. For unit testing, the DB should be abstracted out and mock data should be used IMO. For integration/UI testing, you could largely use generated data using rules or regexes (whatever the tools allow). I'd argue that the DB shouldn't contain invalid data, i.e., data that should not be allowed to be inputted by the user. If you have some edge cases that are valid, but are not easy to generate, these can always be generated by hand. Once you have a good test data set, it should be backed up and maintained in version control.
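For the unit-testing case, "abstracting the DB out" might look roughly like the Python sketch below. The `InMemoryCustomerRepo` class and the `customers_in_state` function are hypothetical names invented for this example; the point is only that the code under test talks to a small interface, so a test can hand it mock data with no database at all.

```python
class InMemoryCustomerRepo:
    """Hypothetical stand-in for a database-backed repository."""

    def __init__(self, customers):
        self._customers = list(customers)

    def all(self):
        """Return a copy of every stored customer record."""
        return list(self._customers)


def customers_in_state(repo, state):
    """Code under test: depends only on the repo's all() method."""
    return [c["name"] for c in repo.all() if c["state"] == state]
```

A production repository would expose the same `all()` method but read from MySQL, and the function being tested would not need to change.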
tvanfosson