I am writing a Java utility that generates large volumes of data for performance testing. It would be really useful to be able to specify a regex for Strings so that my generator produces strings that match it. Is there something ready-made that I can use to do this? Or is there a library that gets me most of the way there?

Thanks

+1  A: 

You'll have to write your own parser, like the author of String::Random (Perl) did. In fact, he doesn't use regexes anywhere in that module; the regex-like pattern syntax is just what Perl coders are used to.

On the other hand, maybe you can have a look at the source, to get some pointers.


EDIT: Damn, blair beat me to the punch by 15 seconds.

Espo
+5  A: 

Edit:

As mentioned in the comments, there is a library available at Google Code to achieve this: http://code.google.com/p/xeger

Original message:

Firstly, with a complex enough regexp, I believe this can be impossible. But you should be able to put something together for simple regexps.

If you take a look at the source code of the class java.util.regex.Pattern, you'll see that it uses an internal representation of Node instances. Each of the different pattern components has its own implementation of a Node subclass. These Nodes are organised into a tree.

By producing a visitor that traverses this tree, you should be able to call an overloaded generator method or some kind of Builder that cobbles something together.
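
A minimal sketch of that idea follows. The Node classes inside java.util.regex.Pattern are not public, so this sketch assumes you build your own small tree instead. The types RegexNode, CharRangeNode, SequenceNode and RepeatNode are hypothetical names, and each node simply has a generate method rather than a full visitor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical node types for illustration only; these are NOT the
// non-public Node classes used inside java.util.regex.Pattern.
interface RegexNode {
    // Appends one string fragment that this node would match.
    void generate(StringBuilder out, Random random);
}

// Matches a single character in the inclusive range [from-to].
final class CharRangeNode implements RegexNode {
    private final char from;
    private final char to;

    CharRangeNode(char from, char to) {
        this.from = from;
        this.to = to;
    }

    public void generate(StringBuilder out, Random random) {
        out.append((char) (from + random.nextInt(to - from + 1)));
    }
}

// Matches its children one after another.
final class SequenceNode implements RegexNode {
    private final List<RegexNode> children;

    SequenceNode(List<RegexNode> children) {
        this.children = children;
    }

    public void generate(StringBuilder out, Random random) {
        for (RegexNode child : children) {
            child.generate(out, random);
        }
    }
}

// Matches child{min,max}: repeats the child a random number of times.
final class RepeatNode implements RegexNode {
    private final RegexNode child;
    private final int min;
    private final int max;

    RepeatNode(RegexNode child, int min, int max) {
        this.child = child;
        this.min = min;
        this.max = max;
    }

    public void generate(StringBuilder out, Random random) {
        int count = min + random.nextInt(max - min + 1);
        for (int i = 0; i < count; i++) {
            child.generate(out, random);
        }
    }
}

final class RegexTreeDemo {
    public static void main(String[] args) {
        // Hand-built tree for the pattern [ab]{4,6}c (no parser in this sketch).
        RegexNode tree = new SequenceNode(Arrays.asList(
                new RepeatNode(new CharRangeNode('a', 'b'), 4, 6),
                new CharRangeNode('c', 'c')));
        StringBuilder out = new StringBuilder();
        tree.generate(out, new Random());
        System.out.println(out);
    }
}
```

Parsing the regex text into such a tree is the harder part and is left out here; constructs like backreferences and lookarounds do not map onto this simple scheme.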

Cheekysoft
Actually there is a Java library: http://code.google.com/p/xeger/
Joseph Kern
+2  A: 

Visual Studio Team System does include something like this http://msdn.microsoft.com/en-us/library/aa833197(VS.80).aspx

Not much help for Java though, so sorry.

samjudson
+1  A: 

On stackoverflow podcast 11:

Spolsky: Yep. There's a new product also, if you don't want to use the Team System there our friends at Redgate have a product called SQL Data Generator [http://www.red-gate.com/products/sql_data_generator/index.htm]. It's $295, and it just generates some realistic test data. And it does things like actually generate real cities in the city column that actually exist, and then when it generates those it'll get the state right, instead of getting the state wrong, or putting states into German cities and stuff like... you know, it generates pretty realistic looking data. I'm not really sure what all the features are.

This is probably not what you are looking for, but it might be a good starting off point, instead of creating your own.

I can't seem to find anything on Google, so I would suggest tackling the problem by parsing a given regular expression into the smallest units of work (\w, [x-x], \d, etc.) and writing some basic methods to support those regular expression phrases.

So for \w you would have a method getRandomLetter() which returns any random letter, and you would also have getRandomLetter(char startLetter, char endLetter) which gives you a random letter between the two values.
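
Those two methods might look roughly like the sketch below. The class name RandomLetters and the extra getRandomDigit helper are assumptions added for illustration, and getRandomLetter() is limited to lowercase ASCII letters, which is narrower than what \w really matches (it also covers uppercase letters, digits and the underscore).

```java
import java.util.Random;

// Hypothetical helper class; method names follow the description above.
final class RandomLetters {
    private static final Random RANDOM = new Random();

    // Simplified stand-in for \w: a random lowercase ASCII letter.
    static char getRandomLetter() {
        return getRandomLetter('a', 'z');
    }

    // Stand-in for a class such as [c-f]: a uniformly chosen character
    // between startLetter and endLetter, both inclusive.
    static char getRandomLetter(char startLetter, char endLetter) {
        if (startLetter > endLetter) {
            throw new IllegalArgumentException("startLetter must not exceed endLetter");
        }
        int span = endLetter - startLetter + 1;
        return (char) (startLetter + RANDOM.nextInt(span));
    }

    // Stand-in for \d: a random decimal digit.
    static char getRandomDigit() {
        return getRandomLetter('0', '9');
    }
}
```

A parser would then map each unit it recognises to one of these calls and append the results to a StringBuilder.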

Craig
A: 

I know there's already an accepted answer, but I've been using RedGate's Data Generator (the one mentioned in Craig's answer) and it works REALLY well for everything I've thrown at it. It's quick and that leaves me wanting to use the same regex to generate the real data for things like registration codes that this thing spits out.

It takes a regex like:

[A-Z0-9]{3,3}-[A-Z0-9]{3,3}

and it generates tons of unique codes like:

LLK-32U

Is this some big secret algorithm that RedGate figured out and we're all out of luck or is it something that us mere mortals actually could do?
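
For a fixed-shape pattern like the one above, a hand-written generator is short. The sketch below is not RedGate's algorithm, just one plausible approach; the class RegistrationCodes and its methods are hypothetical names, and uniqueness is enforced by collecting the codes in a set.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical generator hard-wired to [A-Z0-9]{3,3}-[A-Z0-9]{3,3}.
final class RegistrationCodes {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Builds one code: three characters, a dash, three more characters.
    static String randomCode(Random random) {
        StringBuilder sb = new StringBuilder(7);
        for (int i = 0; i < 6; i++) {
            if (i == 3) {
                sb.append('-');
            }
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Returns 'count' distinct codes. The set drops duplicates, so the loop
    // keeps drawing until enough distinct codes exist. count must stay well
    // below 36^6 (the number of possible codes) or this will run very long.
    static Set<String> uniqueCodes(int count, Random random) {
        Set<String> codes = new LinkedHashSet<>();
        while (codes.size() < count) {
            codes.add(randomCode(random));
        }
        return codes;
    }
}
```

Calling RegistrationCodes.uniqueCodes(1000, new Random()) would give 1000 distinct codes of the same shape as LLK-32U.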

J Wynia
They have 100,000 monkeys, 100,000 typewriters and a web service.
bzlm
+1  A: 
+2  A: 

I've gone the route of rolling my own :)

Goran
+2  A: 

Xeger (Java) is capable of doing it as well:

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
Wilfred Springer