Yup, you read that right. I need a library that can generate random text from a regular expression: the text should be random, but it must be matched by the regular expression. Such a library doesn't seem to exist, but I could be wrong.

Just an example: that library would be able to take '[ab]*c' as input and generate samples such as:

abc
abbbc
bac

etc.
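
To make the requirement concrete: a generated sample only counts if the whole string is matched by the pattern. A minimal check using the standard java.util.regex API might look like this (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

final class SampleCheck {
    // True when the entire candidate string is matched by the regex.
    static boolean accepted(String regex, String candidate) {
        return Pattern.matches(regex, candidate);
    }

    public static void main(String[] args) {
        // The first three are the samples listed above; "abca" is a counterexample.
        for (String s : new String[] {"abc", "abbbc", "bac", "abca"}) {
            System.out.println(s + " -> " + accepted("[ab]*c", s));
        }
    }
}
```

The library I'm after would produce strings for which this check always returns true.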

Update: I created something myself: Xeger. Check out http://code.google.com/p/xeger/.

+3  A: 

Same question here: http://stackoverflow.com/questions/274011/random-text-generator-based-on-regex

I haven't tried it. Good question!

Sinuhe
I'm trying to see whether I can run Ruby's Randexp under JRuby and get support for it in Java that way.
Wilfred Springer
Keep in mind that Java 7 is set to improve JVM support for dynamic languages such as Ruby (via invokedynamic), which should help JRuby.
Sinuhe
+5  A: 

I am not aware of such a library. If you're interested in writing one yourself, then these are probably the steps you'll need to take:

  1. Write a parser for regular expressions (you may want to start out with a restricted class of regexes).

  2. Use the result to construct an NFA.

  3. (Optional) Convert the NFA to a DFA.

  4. Randomly traverse the resulting automaton from the start state to any accepting state, recording the character emitted by each transition.

The result is a word which is accepted by the original regex. For more, see e.g. Converting a Regular Expression into a Deterministic Finite Automaton.
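
The steps above can be sketched roughly as follows. This is only an illustration under strong assumptions, not a complete implementation: it accepts a deliberately restricted syntax (literals, plain member lists such as [ab], parenthesised groups and '*'), rejects everything else, skips the optional DFA conversion, and picks each outgoing edge with equal probability. All names (RegexSampler, State, Frag) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: handles only literals, [abc] member lists, (...) groups and '*'.
final class RegexSampler {
    // NFA state; an edge with a null label is an epsilon move.
    static final class State {
        final List<Character> labels = new ArrayList<>();
        final List<State> targets = new ArrayList<>();
        void edge(Character label, State target) { labels.add(label); targets.add(target); }
    }

    // NFA fragment with one entry state and one exit state.
    static final class Frag {
        final State start, end;
        Frag(State start, State end) { this.start = start; this.end = end; }
    }

    private final String src;
    private final Frag nfa;
    private int pos = 0;

    public RegexSampler(String regex) {
        src = regex;
        nfa = parseSequence();                   // steps 1 and 2: parse and build the NFA
        if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
    }

    // sequence := (atom '*'?)*
    private Frag parseSequence() {
        State first = new State();
        Frag all = new Frag(first, first);       // empty fragment matches the empty string
        while (pos < src.length() && src.charAt(pos) != ')') {
            Frag next = parseAtom();
            if (pos < src.length() && src.charAt(pos) == '*') {
                pos++;
                State s = new State(), e = new State();
                s.edge(null, next.start);        s.edge(null, e);         // enter or skip
                next.end.edge(null, next.start); next.end.edge(null, e);  // repeat or leave
                next = new Frag(s, e);
            }
            all.end.edge(null, next.start);      // concatenate via an epsilon move
            all = new Frag(all.start, next.end);
        }
        return all;
    }

    // atom := '(' sequence ')' | '[' chars ']' | literal
    private Frag parseAtom() {
        char c = src.charAt(pos++);
        if (c == '(') {
            Frag inner = parseSequence();
            expect(')');
            return inner;
        }
        State s = new State(), e = new State();
        if (c == '[') {
            // Plain member lists only: no ranges, negation, or escapes.
            while (pos < src.length() && "]-^\\[&".indexOf(src.charAt(pos)) < 0) s.edge(src.charAt(pos++), e);
            expect(']');
            if (s.labels.isEmpty()) throw new IllegalArgumentException("empty character class");
        } else if ("|*+?.{}\\]^$".indexOf(c) >= 0) {
            throw new IllegalArgumentException("unsupported syntax: " + c);
        } else {
            s.edge(c, e);
        }
        return new Frag(s, e);
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) throw new IllegalArgumentException("expected " + c);
        pos++;
    }

    // Step 4: random walk from the start state to the accepting state.
    public String generate(Random rnd) {
        StringBuilder out = new StringBuilder();
        State cur = nfa.start;
        while (cur != nfa.end) {
            int i = rnd.nextInt(cur.targets.size());
            if (cur.labels.get(i) != null) out.append(cur.labels.get(i).charValue());
            cur = cur.targets.get(i);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        RegexSampler sampler = new RegexSampler("[ab]*c");
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) System.out.println(sampler.generate(rnd));
    }
}
```

Because every '*' is left with probability 1/2 at each visit, the walk terminates with probability 1 but favours short strings; a real library would probably want a length bound or weighted choices.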

Stephan202
I have been looking for a Java library that creates an NFA from a regex. I know the above would work, since I did the same thing in JavaScript ages ago.
Wilfred Springer
I guess this is worth a look: http://www.brics.dk/~amoeller/automaton/
Wilfred Springer
I implemented Xeger based on the library I mention above.
Wilfred Springer
A: 

Here are a few implementations of such a beast, but none of them is in Java, and all but the closed-source Microsoft one are very limited in their regexp feature support.

Michael Borgwardt
A: 

I created a library for this just a minute ago. It's hosted here: http://code.google.com/p/xeger/. Read the instructions carefully before using it, especially the one about downloading another required library. ;-)

This is the way you use it:

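import nl.flotsam.xeger.Xeger;

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();  // random string matched by the regex
assert result.matches(regex);          // only checked when run with -ea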
Wilfred Springer
A: 

Here is a Python implementation of such a module, which should be portable to Java: http://www.mail-archive.com/[email protected]/msg125198.html

Björn Lindqvist