views:

89

answers:

2

I'm trying to create a system where I can convert RegEx values to integers and vice versa. where zero would be the most basic regex ( probably "/./" ), and any subsequent numbers would be more complex regex's

My best approach so far was to stick all the possible values that could be contained within a regex into an array:

values = [ "!", ".", "\/", "[", "]", "(", ")", "a", "b", "-", "0", "9", .... ]

and then to take from that array as follows:

def get( integer ) 
  if( integer.zero? ) 
    return '';
  end

  integer = integer - 1;

  if( integer < values.length )
    return values[integer]
  end

  get(( integer / values.length ).floor) + get( integer % values.length);
end

sample_regex = /#{get( 100 )}/;

The biggest problem with this approach is that a invalid RegExp can easily be generated.

Is there an already established algorithm to achieve what I'm trying? if not, any suggestions?

Thanx
Steve

+4  A: 

Since regular expressions can be formally defined by recursively applying a finite number of elements, this can be done: instead of simply concatenating elements, combine them according to the rules of regular expressions. Because the regular language is also recursively enumerable, this is guaranteed to work.

However, it's quite probably overkill to implement this. What do you need this for? Would a simple dictionary of Number -> RegExp key-value pairs not be better suited to associate regular expressions with unique numbers?

Konrad Rudolph
How would I "combine them with the rules of regular expresions."?a Regex dictionary would not suite my purpose, as I need the infinite range of regex's. Starting at the least complex form and growing more and more complex towards infinity.
NixNinja
+1  A: 

I would say that // is the simplest regex (it matches anything). /./ is fairly complex since it is just shorthand for /[^\n]/, which itself is just shorthand for a much longer expression (what that expression is depends on your character set). The next simplest expression would be /a/ where a is the first character in your character set. That last statement brings up an interesting problem for your enumeration: what character set will you use? Any enumeration will be tied to a given character set. Assuming you start with // as 0, /\x{00}/ (match the nul character) as 1, /\x{01}/ as 2, etc. Then you would start to get into interesting regexes (ones that match more than one string) around 129 if you used the ASCII set, but it would take up to 1114112 for UNICODE 5.0.

All in all, I would say a better solution is treat the number as a sequence of bytes, map those bytes into whatever character set you are using, use a regex compiler to determine if that number is a valid regex, and discard numbers that are not valid.

Chas. Owens