I am writing a simulator and would like to run studies by invoking many instances of it, each with a different set of command-line arguments. I have read this question and several others, and they seem close, but I am not looking for random data that matches a particular regex; I want the set of all strings that match it. An example input file would look something like this:

myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}

or:

myprogram.{0} -arg1 {1} {2}
0: "version1" "version2"
1: "1" "2" "4"
2: "-arg2" ""

and would produce:

myprogram.version1 -arg1 1 -arg2
myprogram.version1 -arg1 1
myprogram.version1 -arg1 2 -arg2
myprogram.version1 -arg1 2
myprogram.version1 -arg1 4 -arg2
myprogram.version1 -arg1 4
myprogram.version2 -arg1 1 -arg2
myprogram.version2 -arg1 1
myprogram.version2 -arg1 2 -arg2
myprogram.version2 -arg1 2
myprogram.version2 -arg1 4 -arg2
myprogram.version2 -arg1 4

I would imagine something like this already exists; I just don't know the correct term to search for. Any help would be much appreciated. I can implement an abstract technique or algorithm myself if need be, but if it's a pre-existing tool I would prefer it to be free (at least as in beer) and to run on Linux.

I know I am probably leaving some details out. I can be more specific where necessary, but I would rather not inundate people with a lot of detail up front. It is entirely possible that I am going about this the wrong way, and I welcome all solutions, even if they solve my problem in a different way.

Most importantly, the solution should not require me to write any extra parsing code when I want to add more argument options to the "cross-product" of strings I generate. I already have a Perl script that does this with a set of nested for loops, one per "variable", but it has to be edited every time I change the number or nature of the variables.

+3  A: 

As long as the braces are not nested, regular expressions will work fine. If you require nesting, you could add some extra recursion in the implementation language.

Here is an example in Python:

import re

def make_choices(template):
    pat = re.compile(r'(.*?)\{([^{}]+)\}',re.S)

    # tokenize the string
    last_end = 0
    choices = []
    for match in pat.finditer(template):
        prefix, alts = match.groups()
        if prefix:
            choices.append((prefix,)) # as a tuple
        choices.append(alts.split("|"))
        last_end = match.end()

    suffix = template[last_end:]
    if suffix:
        choices.append((suffix,))

    # recursive inner function
    def chooser(index):
        if index >= len(choices):
            yield []
        else:
            for alt in choices[index]:
                for result in chooser(index+1):
                    result.insert(0,alt)
                    yield result

    for result in chooser(0):
        yield ''.join(result)

Example:

>>> for result in make_choices('myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}'):
...     print(result)
...
myprogram.version1 -arg1 1 -arg2
myprogram.version1 -arg1 1
myprogram.version1 -arg1 2 -arg2
myprogram.version1 -arg1 2
myprogram.version1 -arg1 4 -arg2
myprogram.version1 -arg1 4
myprogram.version2 -arg1 1 -arg2
myprogram.version2 -arg1 1
myprogram.version2 -arg1 2 -arg2
myprogram.version2 -arg1 2
myprogram.version2 -arg1 4 -arg2
myprogram.version2 -arg1 4
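
The recursive chooser above is computing a Cartesian product, so the same thing can be sketched more compactly with itertools.product from the standard library. This is only an alternative sketch, not a replacement for the function above; the name expand_template is made up for illustration, and it assumes the same un-nested {a|b} syntax:

```python
import itertools
import re

def expand_template(template):
    """Yield every expansion of a template with un-nested {a|b} groups.

    Hypothetical name; assumes the same syntax as make_choices above.
    """
    # re.split with one capturing group alternates literal text (even
    # indexes) with the contents of each {...} group (odd indexes).
    parts = re.split(r'\{([^{}]+)\}', template)
    choices = []
    for i, part in enumerate(parts):
        if i % 2:
            choices.append(part.split('|'))   # alternatives of one group
        else:
            choices.append([part])            # literal text, single choice
    # product() varies the rightmost group fastest, like the recursion above
    for combo in itertools.product(*choices):
        yield ''.join(combo)
```

For the template in the example, this yields the same twelve strings in the same order. Adding another {...} group to the template needs no change to the code.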

You could use os.system() to execute the commands from within Python:

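#!/usr/bin/env python
import os
import sys

# make_choices() from above is assumed to be defined in this file

# join the command-line arguments (minus the script name) into one template
template = ' '.join(sys.argv[1:])
failed = 0
total = 0
for command in make_choices(template):
    print(command)
    # os.system() returns a non-zero status when the command fails
    if os.system(command):
        print('FAILED')
        failed += 1
    else:
        print('OK')
    total += 1

print('')
print('%d of %d failed.' % (failed, total))

# exit status is 1 if anything failed, 0 otherwise
sys.exit(failed > 0)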

And then on the command line:

user:/home/> template.py 'program.{version1|version2}'
program.version1
OK
program.version2
FAILED

1 of 2 failed.
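
For the nested case mentioned at the top of this answer, the "extra recursion" could look roughly like the sketch below. It is a small hand-written parser rather than a regex, the names expand_nested, _parse_sequence and _parse_group are made up for illustration, and it assumes that "|" and "}" outside any braces are ordinary characters:

```python
def expand_nested(template):
    """Return every expansion of a template whose {a|b} groups may nest."""
    # At the top level nothing stops the scan, so '|' and '}' are literal.
    results, _ = _parse_sequence(template, 0, '')
    return results

def _parse_sequence(text, pos, stops):
    # Read literals and groups until a stop character or the end of text.
    # Returns (list of expansions, position where parsing stopped).
    results = ['']
    while pos < len(text) and text[pos] not in stops:
        if text[pos] == '{':
            options, pos = _parse_group(text, pos + 1)
        else:
            options, pos = [text[pos]], pos + 1
        # cross every expansion so far with every option of this piece
        results = [head + tail for head in results for tail in options]
    return results, pos

def _parse_group(text, pos):
    # Called just after '{': collect alternatives up to the matching '}'.
    options = []
    while True:
        alternatives, pos = _parse_sequence(text, pos, '|}')
        options.extend(alternatives)
        if pos >= len(text):
            raise ValueError('missing "}" in template')
        if text[pos] == '}':
            return options, pos + 1
        pos += 1  # skip the '|' between alternatives
```

With this, expand_nested('a{b|c{d|e}}f') gives ['abf', 'acdf', 'acef'], and a template without nesting expands to the same strings as make_choices.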
MizardX
Thanks so much! This works great for my current use case, and when the time comes to add recursion to the syntax, I'll stop putting off learning Python and use it as my 'hello world' (unless you're bored ;)
Matt J
+2  A: 

You're not really looking for something that regular expressions were designed for. You're just looking for a tool that generates combinations of discrete options.

Depending on the set of all possible arguments, an exhaustive list of combinations may not actually be necessary. Regardless, you should look into pairwise testing. I know for a fact that the PICT tool can generate either the exhaustive list or the pairwise list of test cases you desire.
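
To make the idea of a pairwise list concrete, here is a rough greedy sketch in Python. It is not how PICT works internally and it makes no attempt to find the smallest possible list; the function name all_pairs is made up for illustration. It picks rows from the full product until every pair of values from every two parameters appears in at least one row:

```python
import itertools

def all_pairs(parameters):
    """Greedily choose rows so that every value pair is covered once.

    parameters is a list of lists of values, one list per parameter.
    Hypothetical helper, only meant to illustrate pairwise coverage.
    """
    columns = list(range(len(parameters)))

    def pairs_of(row):
        # all (column, value) pairs that this row covers
        return set(((i, row[i]), (j, row[j]))
                   for i, j in itertools.combinations(columns, 2))

    # the exhaustive list, which is also the pool of candidate rows
    candidates = list(itertools.product(*parameters))
    uncovered = set()
    for row in candidates:
        uncovered |= pairs_of(row)

    chosen = []
    while uncovered:
        # take the row that covers the most still-uncovered pairs
        best = max(candidates, key=lambda row: len(pairs_of(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen
```

For the three parameters in the question (two versions, three values of -arg1, and -arg2 present or absent) the exhaustive list has twelve rows, and a pairwise list needs at least six. The saving only becomes interesting with more parameters.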

Lee
Agreed; the functionality I am looking for is quite different from the implementation of regexes in any language I know of. As I tried to get across in the title, the pleasing syntax for combining literals and variables is what I'm after. As for your second point, my idea was that I would write a separate input file for each set of jobs I'd like to do, in such a way that the exhaustive list is indeed what I want. If there is a scheme which allows me to specify the space of possible arguments to my program, and simply pick out a subset of those in each input file, that's fine too.
Matt J
It sounds like the PICT tool will be able to handle what you want. You specify the space of possible arguments, and it can then generate the exhaustive list or the pairwise, ternary-wise or other n-order list short of exhaustive. But, depending on the size of your problem space, you may want to look at the research on the site; exhaustive may not be necessary. Of course, if you're talking about fewer than a hundred test cases, then who cares if it isn't necessary.
Lee
Yeah, PICT looks like it would do what I want, though it may be a little overkill for my purposes. This is for running a carefully-selected set of directed simulations, not for testing, so I really just need a shorthand for generating lots of strings with common structure; you can always write your input in such a way that the cartesian product turns out to be exactly the set of things you wanted (though it may be convoluted to do so if you are exploring a non-linear, for lack of a better term, piece of the parameter space).
Matt J