ansaurus

Question

Answer 1

+5 A:

If you want to keep each match in a separate backreference, you have no choice but to "spell it out" - if you use repetition, you can either catch all six groups "as one" or only the last one, depending on where you put the capturing parentheses. So no, it's not possible to compact the regex and still keep all six individual matches.
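Here is a minimal Python sketch of the two repetition behaviours described above. It uses the standard `re` module and the sample line from the Python aside further down; the variable names are only illustrative. With the parentheses inside the repeated part, the group is overwritten on every pass. With the parentheses around the repetition, all six values come back as one string.

```python
import re

line = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56"

# Capturing group INSIDE the repetition: each of the six iterations
# overwrites the group, so only the last value survives.
inner = re.match(r"^Small\s+[0-9.]+\s+[0-9.]+(?:\s+([0-9.]+)){6}", line)
print(inner.groups())  # ('80.56',)

# Capturing group AROUND the repetition: all six values are captured,
# but as a single string that still contains the separating whitespace.
outer = re.match(r"^Small\s+[0-9.]+\s+[0-9.]+\s+((?:[0-9.]+\s*){6})", line)
print(outer.group(1))
```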

A somewhat more efficient (though not beautiful) regex would be:

^Small\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)

since it matches the whitespace explicitly with \s+ instead of .*?. Your regex causes a lot of backtracking: mine matches in 28 steps, yours in 106.
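You can try the explicit pattern with Python's `re` module. This sketch applies Answer 1's regex unchanged to the sample line from the Python aside and prints the six captured groups. The step counts quoted above are not reproduced here.

```python
import re

# Answer 1's pattern: \s+ between fields, six explicit capture groups.
pattern = re.compile(
    r"^Small\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)"
    r"\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)"
)

line = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56"
m = pattern.match(line)
if m:
    print(m.groups())  # ('1.49', '25.71', '41.05', '12.31', '0.00', '80.56')
```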

Just as an aside: in Python, you could simply split the line on whitespace and keep the last six fields:

>>> pieces = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56".split()[-6:]
>>> print(pieces)
['1.49', '25.71', '41.05', '12.31', '0.00', '80.56']

Tim Pietzcker 2008-11-15 22:24:04

Also, the .* in the original version can match numbers as well, which can result in an invalid match. This one is better.
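The following sketch makes the comment concrete. It compares a `.*?`-separated pattern (built like Answer 2's, but without capturing "Small") with Answer 1's `\s+` version. The test input is a hypothetical malformed line that holds a single eight-digit number. Because the lazy separators may match nothing, the engine can split that one number into eight one-digit "fields". The `\s+` version correctly rejects the line.

```python
import re

NUM = r"[0-9.]+"

# Loose: lazy .*? separators, which can match zero characters.
loose = re.compile(r"^Small" + (r".*?" + NUM) * 2 + (r".*?(" + NUM + ")") * 6)
# Strict: \s+ separators, so fields must be divided by whitespace.
strict = re.compile(r"^Small" + (r"\s+" + NUM) * 2 + (r"\s+(" + NUM + ")") * 6)

bad_line = "Small 12345678"  # hypothetical malformed line: only one number

print(loose.match(bad_line).groups())  # ('3', '4', '5', '6', '7', '8')
print(strict.match(bad_line))          # None
```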

Wimmel 2008-11-15 22:35:53

Using \s instead of .*? is definitely a good idea. I just hate repeating ([0-9.]+) over and over, but it might be unavoidable.
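If typing `([0-9.]+)` six times is the annoyance, one option is to let Python generate the repeated part. This sketch is not something suggested in the thread. The compiled regex still contains six explicit groups, which the answers say is unavoidable. The generated string is identical to Answer 1's pattern.

```python
import re

NUM = r"[0-9.]+"

# Two skipped fields, then six captured fields, all separated by \s+.
skipped = r"\s+".join([NUM] * 2)
captured = r"\s+".join([f"({NUM})"] * 6)
pattern = re.compile(rf"^Small\s+{skipped}\s+{captured}")

line = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56"
print(pattern.pattern)  # same text as Answer 1's regex
print(pattern.match(line).groups())  # ('1.49', '25.71', '41.05', '12.31', '0.00', '80.56')
```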

Mark Biek 2008-11-15 23:03:00

Answer 2

A:

For readability, you should use string substitution to build the regex from its component parts.

$d = "[0-9.]+"; 
$s = ".*?"; 

$re = "^(Small)$s$d$s$d$s($d)$s($d)$s($d)$s($d)$s($d)$s($d)";

At least then you can see the structure behind the pattern, and changing one part changes it everywhere it is used.

If you wanted to get really fussy, you could define a short metasyntax and make it even easier to read:

$re = "^(Small)_#D_#D_(#D)_(#D)_(#D)_(#D)_(#D)_(#D)"; 
$re = str_replace('#D','[0-9.]+',$re); 
$re = str_replace('_', '.*?' , $re );
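Here is a sketch of the same placeholder idea in Python. `#D` and `_` are Answer 2's tokens. The helper `expand` and its `number`/`sep` parameters are hypothetical names invented for this example. Its default separator is `\s+` (following Answer 1) rather than `.*?`, which also shows how a token definition can be swapped in one place.

```python
import re

TEMPLATE = "^(Small)_#D_#D_(#D)_(#D)_(#D)_(#D)_(#D)_(#D)"

def expand(template, number=r"[0-9.]+", sep=r"\s+"):
    """Hypothetical helper: replace the #D and _ placeholders with sub-patterns."""
    # Expand #D first; the default replacement texts contain no placeholders.
    return template.replace("#D", number).replace("_", sep)

line = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56"

m = re.match(expand(TEMPLATE), line)
print(m.groups())  # ('Small', '1.49', '25.71', '41.05', '12.31', '0.00', '80.56')

# Swapping the separator definition back to Answer 2's lazy .*?
lazy = re.match(expand(TEMPLATE, sep=r".*?"), line)
print(lazy.groups())  # same groups on this well-formed line
```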

(This also makes it trivial to change the definition of what a space token or a digit token is.)

Kent Fredric 2008-11-15 22:45:23

Answer 3

+3 A:

Here is the shortest I could get:

^Small\s+(?:[\d.]+\s+){2}([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$

It has to be long because each capture must be specified explicitly. There is no need to capture "Small", though. Also, it is better to be specific (\s instead of .) when you can, and to anchor at both ends.
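This small check shows why anchoring at both ends helps. It uses PhiLho's pattern verbatim with Python's `re`. Note that in Python 3, `\d` in a `str` pattern matches any Unicode decimal digit; pass `re.ASCII` if only 0-9 should count. The second test line, with trailing junk, is invented for illustration.

```python
import re

# PhiLho's pattern: non-capturing {2} for the skipped fields, \s*$ at the end.
pattern = re.compile(
    r"^Small\s+(?:[\d.]+\s+){2}([\d.]+)\s+([\d.]+)\s+([\d.]+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

good = "Small   0.0..20.0   0.00    1.49    25.71   41.05   12.31   0.00    80.56"
bad = good + "   oops"  # hypothetical line with trailing junk

print(pattern.match(good).groups())  # ('1.49', '25.71', '41.05', '12.31', '0.00', '80.56')
print(pattern.match(bad))            # None: the $ anchor rejects the extra text
```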

PhiLho 2008-11-15 23:01:20

I think that answers my question if each capture has to be specified explicitly.

Mark Biek 2008-11-15 23:11:13


How can I make this regex more compact?
