views:

94

answers:

1

Given a sub-string, is there a way to generate all the possible regular expressions (most restrictive to least restrictive) that would match that sub-string for a given string?

For example, say you have a sub-string "orange" and a string "apple banana orange grape". How would I get a list of regexes that match "orange" (I know there will be a lot; hoping there is some library to do this for me already).

+1  A: 

This is basically the same as asking "given some runtime requirements, is there a way to generate all the possible programs (most efficient to least efficient) that would match those requirements for a given input?" The answer is yes, there is a way to do it, but the number of the results is infinite, limited only by the bounds of reasonable memory and language implementation constraints, so you'll need to impose restrictions on what constitutes a valid "program" for your purposes, in order to cut it down to a finite set.

For instance, you can restrict yourself to a specific grammar, of sorts, representing a subset of the regular expression language in question, and only generate regexes matching that grammar:

Regex       ::= StartAnchor? Anything? (Substring | Anything) Anything? EndAnchor?

StartAnchor ::= "^"

Anything    ::= ".*"
              | "(.*)"

Substring   ::= "orange"
              | "(orange)"

EndAnchor   ::= "$"

Recursively take all paths of this grammar (i.e., each branch indicated by ? and |) to generate all of your target regular expressions. Of course, this answer deliberately says nothing about whether doing this is a good idea, or at all necessary...

Jon Purdy