tags:

views:

155

answers:

7

Hey there!

Given a regular expression, how can I list all possible matches? For example: AB[CD]1234, I want it to return a list like: ABC1234 ABD1234

I searched the web, but couldn't find anything.

+3  A: 

Impossible.

Really.

Consider look ahead assertions. And what about .*, how will you generate all possible strings that match that regex?

Bart Kiers
+2  A: 

A regular expression is intended to do nothing more than match to a pattern, that being said, the regular expression will never 'list' anything, only match. If you want to get a list of all matches I believe you will need to do it on your own.

Scott Lance
A list of all possible matches is what you mean.
Ikke
+1  A: 

It may be possible to find some code to list all possible matches for something as simple as you are doing. But most regular expressions you would not even want to attempt listing all possible matches.

For example AB.*1234 would be AB followed by absolutely anything and then 1234.

Kenny Drobnack
A: 

I'm not entirely sure this is even possible, but if it were, it would be so cpu/time intensive for many situations that it would not be useful.

For instance, try to make a list of all matches for A.*Z

There are sites that help with building a good regular expression though:

chills42
+5  A: 

The reason you haven't found anything is probably because this is a problem of serious complexity given the amount of combinations certain expressions would allow. Some regular expressions could even allow infite matches:

Consider following expressions:

AB[A-Z0-9]{1,10}1234

AB.*1234

I think your best bet would be to create an algorithm yourself based on a small subset of allowed patterns. In your specific case, I would suggest to use a more naive approach than a regular expression.

Yannick M.
A: 

Well you could convert the regular expression into an equivalent finite state machine (is relatively simple and can be done algorithmly) and then recursively folow every possible path through that fsm, outputting the followed paths through the machine. It's neither very hard nor computer intensive per output (you will normally get a HUGE amount of output however). You should however take care to disallow potentielly infinite passes (like .*). This can be done by having a maximum allowed path length, after which the tracing is aborted

Grizzly
+1  A: 

For some simple regular expressions like the one you provided (AB[CD]1234), there is a limited set of matches. But for other expressions (AB[CD]*1234) the number of possible matches are not limited.

One method for locating all the posibilities, is to detect where in the regular expression there are choices. For each possible choice generate a new regular expression based on the original regular expression and the current choice. This new regular expression is now a bit simpler than the original one.

For an expression like "A[BC][DE]F", the method will proceed as follows

getAllMatches("A[BC][DE]F")
= getAllMatches("AB[DE]F") + getAllMatches("AC[DE]F")
= getAllMatches("ABDF") + getAllMatches("ABEF") 
   + getAllMatches("ACDF")+ getAllMatches("ACEF")
= "ABDF" + "ABEF" + "ACDF" + "ACEF"
midtiby