views:

1110

answers:

7

I'm aware of some of the test data generators out there, but most seem to just fill name and address style databases [feel free to correct me].

We have a large, integrated and normalised application: e.g. invoices have part numbers linked to stocking tables, customer numbers linked to customer tables, change logs linked to audit information, etc. Interlinked tables like these are obviously difficult to fill randomly. Currently we obfuscate real-life data to get test data (but not very well).

What tools/methods do you use to create large volumes of data to test with?

+4  A: 

Where I work we use RedGate Data Generator to generate test data.

Since we work in the banking domain, we sometimes have to work with nominative data (credit card numbers, personal IDs, phone numbers). For that we developed an application that masks these database fields so we can work with them as if they were real data.
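To illustrate the kind of field masking described above, here is a minimal Python sketch. It is not the application mentioned in this answer; the function name `mask_digits`, the `SECRET_KEY` value, and the "keep the last four digits" rule are all assumptions made for the example. It replaces digits with keyed pseudo-random digits while preserving length and punctuation, so masked values still fit the original columns.

```python
import hashlib
import hmac
import string

# Hypothetical key for the example; a real one would be kept out of the test environment.
SECRET_KEY = b"example-only-key"


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Replace digits with keyed pseudo-random digits.

    Non-digit characters (spaces, dashes, '+') and the last `keep_last`
    digits are left untouched, so the format of the field is preserved.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    total_digits = sum(ch in string.digits for ch in value)
    out = []
    seen = 0
    for ch in value:
        if ch in string.digits:
            seen += 1
            if seen > total_digits - keep_last:
                out.append(ch)  # one of the trailing digits we keep
            else:
                # Derive a replacement digit from the keyed digest.
                out.append(str(digest[(seen - 1) % len(digest)] % 10))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(mask_digits("4111 1111 1111 1111"))
    print(mask_digits("+1 (555) 010-9999"))
```

Because the replacement is derived from the input and a key, the same input always masks to the same output, which keeps joins between tables intact. Note that this sketch does not preserve check digits, so a masked card number will generally fail a Luhn check.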

I can say that with RedGate you can get close to what your real data looks like on a production server, since you can customize every field of every table in your DB.

Pascal Paradis
A: 

Joel also mentioned RedGate in podcast #11.

Eugene Katz
A: 

I did look at RedGate after Joel mentioned it.

My problem there is the requirement for Windows and SQL Server - but I did ask a generic question.

ColinYounger
+1  A: 

The Red Gate product is good...but not perfect.

I found that I did better when I wrote my own tools to generate the data. I use the Red Gate product when I want to generate, say, Customers... but it's not great if you want to simulate the variability in what customers do, like creating orders... some with one item, some with multiple items.

Homegrown tools will provide the most 'realistic' data I think.
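As a rough idea of what such a homegrown tool might look like, here is a small Python sketch. The class names, the `generate_orders` function, and the weights for order size are all invented for the example and are not taken from any real tool.

```python
import random
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    part_number: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer_id: int
    lines: list = field(default_factory=list)


def generate_orders(customer_ids, part_numbers, n_orders, seed=0):
    """Create orders for existing customers with a varying number of items."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    orders = []
    for order_id in range(1, n_orders + 1):
        # Assumed distribution: most orders have one item, a few have many.
        n_lines = rng.choices([1, 2, 3, 5, 10], weights=[50, 25, 15, 7, 3])[0]
        n_lines = min(n_lines, len(part_numbers))
        parts = rng.sample(part_numbers, n_lines)  # no duplicate parts per order
        lines = [OrderLine(part, rng.randint(1, 20)) for part in parts]
        orders.append(Order(order_id, rng.choice(customer_ids), lines))
    return orders


if __name__ == "__main__":
    customers = [101, 102, 103]
    parts = [f"PN-{i:04d}" for i in range(1, 31)]
    for order in generate_orders(customers, parts, n_orders=5):
        print(order.order_id, order.customer_id, len(order.lines))
```

The point of writing it yourself is that the weights and rules can be tuned to whatever behaviour your real customers show.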

Webjedi
+3  A: 

You can generate data plans with VSTS Database Edition (with the latest 2008 Power tools).

It includes a Data Generation Wizard, which automates data generation by pointing to an existing database, so you get something that is realistic but contains entirely different data.

vzczc
+2  A: 

I just completed a project creating 3,500,000+ health insurance claim lines. Due to HIPAA and PHI restrictions, using even scrubbed real data is a PITA. I used a tool called Datatect for this (http://www.datatect.com/).

Some of the things I like about this tool:

  1. Uses ODBC, so you can generate data into any ODBC data source. I've used this for Oracle, SQL Server and MS Access databases, flat files, and Excel spreadsheets.
  2. Extensible via VBScript. You can write hooks at various parts of the data generation workflow to extend the abilities of the tool. I used this feature to "sync up" dependent columns in the database, and to control the frequency distribution of values to align with real-world observed frequencies.
  3. Referentially aware. When populating foreign key columns, it pulls valid keys from the parent table.
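
For readers without the tool, the last two points can be sketched in a few lines of Python. This is not how Datatect works internally; the table names, the `populate` function, and the 80/15/5 status split are assumptions made up for the example. It uses an in-memory SQLite database from the standard library.

```python
import random
import sqlite3


def populate(conn, n_claims, seed=0):
    """Fill a parent table, then a child table whose foreign keys are
    drawn only from keys that really exist in the parent."""
    rng = random.Random(seed)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE member (member_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE claim_line (
            claim_line_id INTEGER PRIMARY KEY,
            member_id INTEGER NOT NULL REFERENCES member(member_id),
            status TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO member (name) VALUES (?)",
        [(f"Member {i}",) for i in range(1, 51)],
    )
    # Referentially aware: read back the valid parent keys before generating children.
    member_ids = [row[0] for row in conn.execute("SELECT member_id FROM member")]
    # Assumed frequencies; in practice these would come from observed data.
    statuses = ["paid", "denied", "pending"]
    weights = [80, 15, 5]
    rows = [
        (rng.choice(member_ids), rng.choices(statuses, weights=weights)[0])
        for _ in range(n_claims)
    ]
    conn.executemany("INSERT INTO claim_line (member_id, status) VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    populate(conn, n_claims=10_000)
    for status, count in conn.execute(
        "SELECT status, COUNT(*) FROM claim_line GROUP BY status ORDER BY status"
    ):
        print(status, count)
```

Generating parents first and sampling child keys from them is what keeps the foreign keys valid; the weighted choice is one simple way to match a known frequency distribution.
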
Patrick Cuff
+2  A: 

I've rolled my own data generator that generates random data conforming to regular expressions. The basic idea is to use validation rules twice. First you use them to generate valid random data and then you use them to validate new input in production. I've started a rewrite of the utility as it seems like a nice learning project. It's available at Google Code.
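
The "use the rule twice" idea can be sketched in Python as follows. This is not the utility mentioned above, only an illustration under stated assumptions: the `generate` function and the `PART_NUMBER` rule are hypothetical, and the generator understands just a small subset of regex syntax (plain literals, `\d`, simple character classes like `[A-Z0-9]`, and `{n}` / `{m,n}` repetition). Anything else is rejected rather than guessed at.

```python
import random
import re
import string

# One atom (\d, a [...] class, or a plain literal) plus an optional {n} or {m,n}.
TOKEN = re.compile(r"(\\d|\[[^\]]+\]|[A-Za-z0-9 _/:@#-])(?:\{(\d+)(?:,(\d+))?\})?")


def _expand_class(body):
    """Expand the inside of a character class such as 'A-Z0-9' into a list of characters."""
    if body.startswith("^") or "\\" in body:
        raise ValueError("negated classes and escapes inside classes are not supported")
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def generate(pattern, rng):
    """Return a random string that matches `pattern` (restricted syntax only)."""
    out = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern syntax at position {pos}")
        atom, lo, hi = m.group(1), m.group(2), m.group(3)
        if atom == r"\d":
            alphabet = string.digits
        elif atom.startswith("["):
            alphabet = _expand_class(atom[1:-1])
        else:
            alphabet = atom  # a single literal character
        low = int(lo) if lo else 1
        high = int(hi) if hi else low
        out.extend(rng.choice(alphabet) for _ in range(rng.randint(low, high)))
        pos = m.end()
    return "".join(out)


# Hypothetical validation rule, used both to generate and to validate.
PART_NUMBER = r"[A-Z]{3}-\d{4,6}"

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        value = generate(PART_NUMBER, rng)
        # The same rule that produced the value also validates it.
        assert re.fullmatch(PART_NUMBER, value)
        print(value)
```

Keeping the supported syntax deliberately small means every generated value is guaranteed to pass `re.fullmatch` against the same rule; a fuller implementation would need a real regex parser.
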

Goran