Hello,

How can I set which order to match things in a PCRE regular expression?

I have a dynamic, user-supplied regular expression that is used to extract two values from a string and store them in two string variables. However, in some cases the two values appear in the string in reverse order, so the first (\w+), or whatever the first group is, needs to be stored in the second variable.

+3  A: 

You can extract the strings by name using

(?<name>\w+)

and get the values with

pcre_get_named_substring
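
As a rough illustration of the idea, here is a minimal sketch using Python's re module rather than the PCRE C API. Python spells named groups (?P<name>...), a form PCRE also accepts. The pattern, the group names key and value, and the helper extract are hypothetical, made up for this example. The point is that the user-supplied pattern decides which name goes with which position, so the calling code never has to care about order.

```python
import re

# Hypothetical user-supplied pattern: here the "value" happens to come
# first in the text, but the code below only ever asks for names.
pattern = re.compile(r"(?P<value>\d+)\s+(?P<key>[a-z]+)")

def extract(text):
    """Return (key, value) looked up by group name, or None if no match."""
    m = pattern.search(text)
    if m is None:
        return None
    return m.group("key"), m.group("value")

print(extract("42 apples"))
```

With PCRE itself, the equivalent lookups would go through pcre_get_named_substring after a successful match.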
eric espie
But how do you know which name to assign to which substring? This brings you no closer to solving the real problem of figuring out which group matched which substring.
Alan Moore
+1  A: 

If you're matching both parts with the same subpattern (like \w+), you're out of luck. But if the subpatterns are distinctively different you have a few options, none of them very pretty. Here's a regex that uses a conditional construct to match the src and type attributes of an HTML script element in either order:

\b(?(?=src=)
  src="([^"]*)"\s+type="([^"]*)"|
  type="([^"]*)"\s+src="([^"]*)"
)

(DISCLAIMER: This regex makes many unrealistic assumptions, chief among them that both attributes will be present and that they'll be adjacent to each other. I'm only using it to illustrate the technique.)

If the src attribute appears first, the src and type values will be captured in the first and second groups respectively. Otherwise, they'll appear in the fourth and third groups respectively. Named groups would make it easier to keep track of things, especially if you could use the same name in more than one place, as you can in .NET regexes. Unfortunately, PCRE requires every named group to have a unique name by default, which is too bad; reusing names is a very nice feature. (PCRE does allow duplicate names if the pattern is compiled with the PCRE_DUPNAMES option or contains the (?J) inline flag.)
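
The bookkeeping described above can be sketched as follows. This is an approximation in Python's re module, not PCRE: it swaps the conditional construct for a plain alternation, gives each branch its own group names because re requires names to be unique, and then picks whichever group of each pair took part in the match. The names src1, type1, src2, type2 and the helper src_and_type are hypothetical, and the sketch inherits the same unrealistic assumptions as the regex above (both attributes present and adjacent).

```python
import re

# Two branches, one per attribute order. Each branch uses its own group
# names (hypothetical: src1/type1 and type2/src2).
SCRIPT_ATTRS = re.compile(
    r'''\b(?:
        src="(?P<src1>[^"]*)"\s+type="(?P<type1>[^"]*)"
      |
        type="(?P<type2>[^"]*)"\s+src="(?P<src2>[^"]*)"
    )''',
    re.VERBOSE,
)

def src_and_type(tag):
    """Return (src, type) regardless of attribute order, or None."""
    m = SCRIPT_ATTRS.search(tag)
    if m is None:
        return None
    # Only one branch can match, so one group of each pair is None.
    # Compare against None explicitly: an empty attribute value is "".
    src = m.group("src1") if m.group("src1") is not None else m.group("src2")
    typ = m.group("type1") if m.group("type1") is not None else m.group("type2")
    return src, typ

print(src_and_type('<script src="a.js" type="text/javascript">'))
print(src_and_type('<script type="text/javascript" src="a.js">'))
```

Both calls yield the src value first and the type value second, which is the order-independence the question asks for.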

Alan Moore