Hi,

Given the following regular expressions:

 - alice@[a-z]+\.[a-z]+
 - [a-z]+@[a-z]+\.[a-z]+
 - .*

The string [email protected] obviously matches all three regular expressions. In the application I am developing, we are only interested in the 'most specific' match, which in this case is the first one.
Unfortunately, there seems to be no way to do this. We are using PCRE; I did not find a way to do it there, and searching the Internet did not turn up anything either.
One possible approach would be to keep the regular expressions sorted by descending specificity and simply take the first match. Of course, the next question is then how to sort the array of regular expressions. Making the end user responsible for keeping the array sorted is not an option. So I hope you guys can help me out here...

Thanks !!

Paul

+1  A: 

My gut instinct says that not only is this a hard problem, both in terms of computational cost and implementation difficulty, but it may also be unsolvable in any realistic fashion. Consider the following two regular expressions, both of which accept the string [email protected]:

    alice@[a-z]+\.[a-z]+ 
    [a-z][email protected]

Which one of these is more specific?

torak
The one with more character constants? Or maybe you could automatically build a regular expression that was the intersection of both of them. That is, if RE (a) defines language L1 and RE (b) defines language L2, build a regular expression RE (a, b) which defines a language INTERSECTION(L1, L2).
Avi
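
For what it's worth, the "more character constants" idea can be sketched in a few lines. This is only a rough heuristic, not a real measure of specificity: it uses Python's re module rather than PCRE, the names literal_count and most_specific are made up for illustration, the test address alice@example.com is a stand-in, and the counter deliberately ignores quantifier braces, alternation and other syntax.

```python
import re

# The three patterns from the question.
PATTERNS = [
    r".*",
    r"[a-z]+@[a-z]+\.[a-z]+",
    r"alice@[a-z]+\.[a-z]+",
]

def literal_count(pattern):
    """Heuristic: count characters that can only match themselves.

    Assumption: simple patterns only. Character classes, metacharacters and
    escapes like \\d score zero; escaped punctuation like \\. scores one.
    Quantifier braces such as {2,3} are not handled specially.
    """
    count = 0
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # \. or \@ is a literal; \d, \w and friends are classes.
            if not in_class and not pattern[i + 1].isalnum():
                count += 1
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch not in ".*+?()|{}^$":
            count += 1
        i += 1
    return count

def most_specific(patterns, text):
    """Return the matching pattern with the highest literal count, or None."""
    matching = [p for p in patterns if re.fullmatch(p, text)]
    if not matching:
        return None
    # max() keeps the first pattern on ties, so the input order breaks ties.
    return max(matching, key=literal_count)

print(most_specific(PATTERNS, "alice@example.com"))
```

Because the score is computed from the pattern text alone, the end user does not have to keep the array sorted. It does not settle torak's objection, though: two patterns that pin down different parts of the string get a score either way, without that score meaning one language is contained in the other.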