Off the top of my head:
I'd work recursively (basically the opposite of a recursive decent parser) using some heuristics about what to do with ranges ((...)
: probably pick at random) optionals (?
: see [], below), repetitions('' Poisson distribution?). Literals ("..."
) are simple written to the output, and subtokens (`<...>') generate a recursion.
This shouldn't be too hard unless you want to guarantee some sort of complete coverage. Even then, just generating a bunch of data would be a help...
[*] You need to include optionals less than 50% of the time to prevent infinite regress when processing rules like
nonterm: otherstuff <nonterm>?
Good catch by plinth.
Likewise with repetitions, throw a distributions that converges strongly.
You'll need to parse the input grammar first if it is presented in a BNF form as here. Simplest thing to do would be use a mapping (name, string)
, then start with the highest level token (which you might assume means the first one...).
This gives you:
("program", "<imports> NEWLINE? <namespace>")
("imports", ("import" <identifier> NEWLINE)*)
...
The you start with "program", hit "<imports>" so you recur...on coming back, hist "NEWLINE?", so throw the dice and write or not, hit "<namespace>" so recur...on return you're done.
I find my self suspecting that this has been done before. If you just need the output, I'd search the web... Perhaps http://portal.acm.org/citation.cfm?doid=966137.966142, though the huge number of parser generators out there clutter up the search space... Try this paper, too.
BTW-- You local university probably has online subscriptions to these journals, so you can get them for free by hooking up at the library.