I'm looking for recommendations of a good, free tool for generating sample data for the purpose of loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. Features I'm looking for include:

  • Flexibility to generate data for an existing table definition.
  • Ability to generate small and large data sets (1 million rows or more).
  • Generate in SQL script format (INSERT statements) or else in a flat file format suitable for bulk import (which is usually faster).
  • A command-line interface for easy scripting.
  • Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).
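
For illustration only, the two output formats in the list above (INSERT statements versus a flat file for bulk import) can be sketched in a few lines of Python. This is a minimal sketch, not taken from any tool mentioned in this thread: the `users` table, its columns, and all function names are hypothetical.

```python
import csv
import io
import random
import string


def random_row(rng, row_id):
    """Build one fake row for a hypothetical users(id, name, age) table."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return (row_id, name, rng.randint(18, 90))


def generate_rows(count, seed=0):
    """Yield rows lazily so that a large data set never has to sit in memory."""
    rng = random.Random(seed)
    for row_id in range(1, count + 1):
        yield random_row(rng, row_id)


def to_insert_statements(rows, table="users"):
    """Render rows as SQL INSERT statements.

    The generated names contain only a-z, so no quoting or escaping is
    needed here; real data would need proper escaping.
    """
    for row_id, name, age in rows:
        yield f"INSERT INTO {table} (id, name, age) VALUES ({row_id}, '{name}', {age});"


def to_csv(rows, out):
    """Write rows as a flat file suitable for a database's bulk loader."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # SQL script format
    for stmt in to_insert_statements(generate_rows(3)):
        print(stmt)
    # Flat file format
    buf = io.StringIO()
    to_csv(generate_rows(3), buf)
    print(buf.getvalue(), end="")
```

Because `generate_rows` is a generator, the same code path serves a handful of rows or millions; only the `count` argument changes.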

PS: I did search for a duplicate question on StackOverflow, but I didn't find one. If there is one, I'll be grateful to get a pointer to it.


Thanks for the great responses everyone! I should amend my requirements that I use Mac OS X as my primary development environment, not Windows (though I did say command-line interface is desirable, and that practically rules out Windows). The Windows-specific suggestions will no doubt be useful to other readers of this question, though, so thanks.


Here is my conclusion:

  • GenerateData:

    • PHP web app interface, not command line
    • limited to generating 200 records (or pay $20 for a license to generate 5,000 records)
  • RedGate SQL Data Generator

    • not free, price $295
    • requires Windows, .NET, SQL Server
  • Visual Studio 2008 Database Edition

    • requires Windows
    • requires costly MSDN or ISV subscription
  • Banner Datatect

    • not free, price $595
    • requires Windows (?)
    • no support for MySQL (?)
    • GUI, not command line or scriptable
  • Ruby Faker gem

    • way too slow to use ActiveRecord for bulk data load
  • Super Smack

    • chiefly a load-testing tool, with a random data generator built in
    • pretty simple to use nevertheless
    • overall a good runner-up tool
  • Databene Benerator

    • best solution for my needs
    • XML scripts, compatible with DbUnit
    • open source (GPL) Java code
    • command-line usage
    • access many databases directly via JDBC
+8  A: 

This looks quite promising: generatedata.com. Open-source, has lots of built-in data types.

There are several others listed here: Test (Sample) Data Generators. I don't have experience with any of them, but a few on that list look like they could be pretty decent.

Chad Birch
Thanks for the links! +1
Bill Karwin
Good resource for first and last names. Thanks
MicTech
A: 

I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.

Jenn D.
Thanks for the link, but yeah that's not what I was looking for.
Bill Karwin
There's also a plugin for Firefox called Dummy Lipsum, it's useful! Sorry I can't help Bill :(
alex
+2  A: 

If you are looking for, or willing to use, something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.

Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.

One of the nice things about it is that it is command-line driven and highly customizable. There are some fairly decent examples of usage in the book High Performance MySQL, which is also excerpted here.

Not sure if that is along the lines of what you are looking for, but just a thought.

jonstjohn
Looks promising! Says it supports PostgreSQL as well as MySQL. Thanks for the link.
Bill Karwin
+1  A: 

A Ruby script with one of the available fake data generators should do you just fine.

http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.

Here is another: http://random-data.rubyforge.org/

And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/


RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.
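
As a rough sketch of the same idea without an ORM, the following uses Python's standard `sqlite3` module instead of Ruby, Faker and ActiveRecord (a swap made purely for illustration). It reads the column definitions of an existing table and fills each column with a random value chosen by its declared type, inserting in batches rather than one row at a time. The `people` table and the type-to-value mapping are assumptions, and constraints such as primary keys or foreign keys are ignored.

```python
import random
import sqlite3
import string


def fake_value(declared_type, rng):
    """Pick a random value from the column's declared type (simplified mapping)."""
    t = declared_type.upper()
    if "INT" in t:
        return rng.randint(0, 1_000_000)
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return rng.random() * 1000
    # Fall back to a short random string for TEXT/VARCHAR and anything else.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(10))


def populate(conn, table, count, seed=0, batch_size=1000):
    """Insert `count` random rows into an existing table, in batches."""
    rng = random.Random(seed)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    names = [c[1] for c in columns]
    types = [c[2] for c in columns]
    placeholders = ", ".join("?" for _ in names)
    sql = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})"
    batch = []
    for _ in range(count):
        batch.append(tuple(fake_value(t, rng) for t in types))
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            batch.clear()
    if batch:
        conn.executemany(sql, batch)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for "an existing table definition".
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER, score REAL)")
    populate(conn, "people", 2500)
    print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])
```

Batching with `executemany` avoids a round trip per row, though for very large volumes a flat file plus the database's bulk loader is usually still faster.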

brendanjerwin
Have you tried to do a bulk load of > 1 million rows, one row at a time through an ActiveRecord interface? I am not optimistic about time to completion.
Bill Karwin
Also, I used the Faker gem today in some Cucumber Feature steps and it's S L O W. So, my score so far: ActiveRecord -1; Faker -1. I'm not doing so great. :)
brendanjerwin
A: 

Not a direct answer to your question, but this can be helpful for certain kinds of data:

Fake Name Generator can be useful - http://www.fakenamegenerator.com/ - not for everything, but for user accounts or stuff like that. AFAIK they support bulk orders.

dr. evil
Yeah I took a look but it doesn't seem to offer the flexibility I'm looking for. Thanks anyway for the link.
Bill Karwin
+1  A: 

Not free, but Visual Studio 2008 Database Edition is a good alternative and it provides a lot more functionality (Integration with SCC, Unit Testing, DB Refactoring, etc...)

Seems to be available only through an MSDN subscription that costs $5469 per year. For that amount of money, I could hire some college students to make up test data and type it in.
Bill Karwin
+4  A: 

I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year and it is, in short, an awesome tool. It allows for setting dependencies between columns and generates realistic data for business objects such as phone numbers, URLs, names, etc. I can honestly state that this tool has paid for itself time and time again.

KevDog
Yup, I am not averse to spending $295 to save many hundreds in development time. Thanks for the lead!
Bill Karwin
+7  A: 

Take a look at databene benerator, a test data generator that looks close to your requirements.

  • it can generate data for an existing table definition (or even anonymize production data)
  • it can generate large data sets (unlimited size)
  • it supports various input formats (CSV, Flat Files, DBUnit) and output formats (CSV, Flat Files, DBUnit, XML, Excel, Scripts)
  • it can be used on the command line or through a maven plugin
  • it's open source and customizable

I would give it a try.

BTW, a list of similar products is available on databene benerator's web site.

Pascal Thivent
@Pascal Thivent Interesting (+1) I will take a look.
Arthur Ronald F D Garcia
+2  A: 

Normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply; see the Empower and BizSpark promotions. It provides a lot more functionality than just generating test data (Integration with SCC, Unit Testing, DB Refactoring, etc.)

As I like the fact that RedGate tools are so easy to learn, I would still look at SQL Data Generator.

Ian Ringrose
Yeah it's less costly, on the order of the same price as RedGate's tool, but in addition you have to qualify as an ISV and that means buying other stuff. Thanks for the link anyway, no doubt it'll be useful for someone. +1
Bill Karwin
+1  A: 

I use a tool called Datatect:

  1. Generates data to flat files or any ODBC compliant database.
  2. Extensible via VBScript.
  3. Referentially aware; will populate foreign keys with values from parent table.
  4. Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
  5. Can create custom, complex data types.
  6. Generate over 2 billion proper names, business names, street addresses, cities, states, and zip codes.
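
To make item 3 ("referentially aware") concrete, here is a small generic sketch in Python. It is not Datatect's implementation or interface; the `customers` and `orders` tables and every column name are hypothetical. The point is only the ordering: populate the parent table first, then draw each child's foreign key from keys that actually exist.

```python
import random
import sqlite3


def populate_with_foreign_keys(conn, n_parents, n_children, seed=0):
    """Fill the parent table, then draw each child's foreign key from parent keys."""
    rng = random.Random(seed)
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, n_parents + 1)],
    )
    # Read the keys back so child rows only reference parents that exist.
    parent_ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        [
            (i, rng.choice(parent_ids), round(rng.uniform(1, 500), 2))
            for i in range(1, n_children + 1)
        ],
    )
    conn.commit()


def create_schema(conn):
    """Create the hypothetical parent/child tables used in this sketch."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "amount REAL)"
    )


def count_orphans(conn):
    """Count child rows whose foreign key matches no parent row."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders o LEFT JOIN customers c "
        "ON o.customer_id = c.id WHERE c.id IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    populate_with_foreign_keys(conn, n_parents=10, n_children=100)
    print(count_orphans(conn))
```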

I've used this tool to generate as many as 40,000,000 rows of data to a SQL Server database, and 8,000,000 rows of data to an Oracle database.

I am in no way affiliated with Banner Systems, just a satisfied customer.

Patrick Cuff
That looks like a promising option. Thanks for the link. +1 However, I don't develop on Windows as my primary platform, sorry I didn't specify that in my question.
Bill Karwin
A: 

+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.

davek