views:

454

answers:

3

So i have a regex pattern, and I want to generate all the text permutations that would be allowed from that pattern.

Example:

var pattern = "^My (?:biological|real)? Name is Steve$";
var permutations = getStringPermutations(pattern);

This would return the list of strings below:

My Name is Steve

My real Name is Steve

My biological Name is Steve

Update: Obviously a regex has an infinate number of matches, so i only want to generate off of optional string literals as in the (?:biological|real)? from my example above. Something like (.)* has too many matches, so I will not be generating them off of that.

A: 

One method that might be a bit weird would be to put the possible choices into an array first, and then generate the regex based on the array and then use the same array to generate the permutations.

Anthony
+1  A: 

If you restrict yourself to the subset of regular expressions that are anchored at both ends, and involve only literal text, single-character wildcards, and alternation, the matching strings should be pretty easy to enumerate. I'd probably rewrite the regex as a BNF grammar and use that to generate an exhaustive list of matching strings. For your example:

<lang>   -> <begin> <middle> <end>
<begin>  -> "My "
<middle> -> "" | "real" | "biological"
<end>    -> " name is Steve"

Start with the productions that have only terminal symbols on the RHS, and enumerate all the possible values that the nonterminal on the LHS could take. Then work your way up to the productions with nonterminals on the RHS. For concatenation of nonterminal symbols, form the Cartesian product of the sets represented by each RHS nonterminal. For alternation, take the union of the sets represented by each option. Continue until you've worked your way up to <lang>, then you're done.

However, once you include the '*' or '+' operators, you have to contend with infinite numbers of matching strings. And if you also want to handle advanced features like backreferences...you're probably well on your way to something that's isomorphic to the Halting Problem!

Jim Lewis
A: 

Here's a sketch of a function I wrote to take a List of Strings and return a list of all the permutated possibilities: (taking on char from each)

public static List<string> Calculate(List<string> strings) {
            List<string> returnValue = new List<string>();
            int[] numbers = new int[strings.Count];
            for (int x = 0; x < strings.Count; x++) {
                numbers[x] = 0;
            }
            while (true) {
                StringBuilder value = new StringBuilder();
                for (int x = 0; x < strings.Count; x++) {
                    value.Append(strings[x][numbers[x]]);
                    //int absd = numbers[x];
                }
                returnValue.Add(value.ToString());
                numbers[0]++;
                for (int x = 0; x < strings.Count-1; x++) {
                    if (numbers[x] == strings[x].Length) {
                        numbers[x] = 0;
                        numbers[x + 1] += 1;
                    }
                }
                if (numbers[strings.Count-1] == strings[strings.Count-1].Length)
                    break;

            }
            return returnValue;
        }
CrazyJugglerDrummer