ansaurus

Question

Composed Regular Expressions - breaking a regex down into a readable form

Answer 1

+2 A:

Yes, absolutely. Regexes are powerful, but their terse syntax makes them extremely hard to read. When I read a comment such as "this matches a URI", that doesn't actually help me figure out how it does that, or where I should look to (for example) fix a bug where it doesn't match some obscure corner case in the query string properly. Regex is code; document it as you'd document a function. If it's short and (reasonably) clear, a single comment for the entire regex is fine. If it's complicated, clearly highlight and comment individual parts. If it's really complex, split it into several regexes.
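As a rough illustration of "split it into several regexes", here is a minimal Python sketch that builds one pattern out of small named fragments. The fragment names, the parse_uri helper and the deliberately simplified URI grammar are all assumptions made for this example; it is nowhere near a complete URI matcher.

```python
import re

# Each fragment is small enough to read, comment and test on its own.
# Names and grammar are illustrative assumptions, not a full URI spec.
SCHEME = r"(?P<scheme>[a-z][a-z0-9+.-]*)"   # e.g. http, ftp, svn+ssh
HOST = r"(?P<host>[a-z0-9.-]+)"             # host name or dotted address
PATH = r"(?P<path>/[^?#\s]*)?"              # optional path, up to ? or #
QUERY = r"(?:\?(?P<query>[^#\s]*))?"        # optional query string

# The composed pattern reads almost like the grammar it implements.
URI = re.compile(rf"^{SCHEME}://{HOST}{PATH}{QUERY}$", re.IGNORECASE)


def parse_uri(text):
    """Return the named parts of a simplified URI, or None if it does not match."""
    match = URI.match(text)
    return match.groupdict() if match else None
```

With this layout, a bug in query-string handling points at exactly one short fragment (QUERY here) rather than at one long opaque pattern.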

Pavel Minaev 2009-07-24 23:43:01

Answer 2

A:

It is fairly easy to read if you can have extended syntax.

/^
  score   \s+ (\d+) \s+
  for     \s+ (\d+) \s+
  nights? \s+  at   \s+ (.*)
/x

I personally prefer Perl 6 style regex. I think they're easier to read.

rule pattern{
  score        $<score>= [ <.digits>+ ]
  for          $<nights>=[ <.digits>+ ]
  night[s]? at $<hotel>= [ .+ ]
}

After you perform a match against that rule, $/ is associated with the matched text.

So something like this:

say "Hotel $/<hotel>";
say $/.perl;

Would output something like this:

Hotel name of hotel
{
  'hotel'  => 'name of hotel',
  'nights' => 5,
  'score'  => 8
}
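For comparison, roughly the same idea can be sketched in Python, which is not one of the languages used in this answer. re.VERBOSE plays the role of the /x flag, and named groups stand in for the Perl 6 captures. The BOOKING name and the sample input line are assumptions made up for the example.

```python
import re

# re.VERBOSE ignores unescaped whitespace and allows # comments,
# much like Perl's /x flag.
BOOKING = re.compile(
    r"""
    ^
    score   \s+ (?P<score>\d+)  \s+   # review score
    for     \s+ (?P<nights>\d+) \s+   # length of stay
    nights? \s+ at \s+ (?P<hotel>.+)  # hotel name: the rest of the line
    $
    """,
    re.VERBOSE,
)

# Hypothetical input line, chosen to match the output shown above.
match = BOOKING.match("score 8 for 5 nights at name of hotel")
print(match.groupdict())
```

Note that groupdict() returns every capture as a string, so the numbers would still need an int() conversion if they are used as numbers.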

Brad Gilbert 2009-07-25 01:05:48

Answer 3

A:

I deal with this in PHP by using associative arrays and strtr, PHP's version of the tr function (I assume a similar data structure and function exist in any language).

The array looks like this:

$mappings = array ( 
  'a' => '[a-z0-9]',
  'd' => '[0-9]', 
  's' => '\s+', //and so on 
);

Then when I put them to use, it's just a matter of merging with strtr. Mapped stuff gets converted, and unmapped stuff falls through:

$regexp = strtr($simplified_string, $mappings);

Bear in mind that this approach can just as easily overcomplicate things as it can simplify them. You're still writing out patterns; it's just that you've abstracted one pattern into another. Nevertheless, having these poor-man's character classes can be useful in outsourcing regexps to devs or spec providers who don't speak the language.
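For languages without a direct strtr equivalent, the same single-pass expansion can be approximated. The following is a minimal Python sketch; MAPPINGS and expand are hypothetical names, and the table simply mirrors the PHP array above.

```python
import re

# Hypothetical shorthand table, mirroring the PHP array above.
MAPPINGS = {
    "a": "[a-z0-9]",
    "d": "[0-9]",
    "s": r"\s+",
}


def expand(simplified, mappings=MAPPINGS):
    """Replace every mapped key in one pass; unmapped text falls through."""
    # Try longer keys first so a long key wins over its own prefix.
    keys = sorted(mappings, key=len, reverse=True)
    token = re.compile("|".join(re.escape(key) for key in keys))
    # A function replacement inserts the mapped text literally, so
    # backslashes such as the one in \s+ are not reinterpreted.
    return token.sub(lambda m: mappings[m.group(0)], simplified)
```

Because the substitution runs in a single pass, the output is never rescanned: the "a" inside "[a-z0-9]" is not expanded a second time. For example, expand("dd-dd") gives "[0-9][0-9]-[0-9][0-9]".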

rooskie 2009-07-25 01:28:29
