




I have a file that contains hundreds of lines of the form

long long int          FILE_FORMAT_HEADER.file.index              1.4      3

I don't care about anything except those two numbers at the end: 1.4 and 3.

I'm using the following regular expression:

$line =~ m/.+\s+(\d+(\.\d+)?)\s+(\d+(\.\d+)?)/

The idea is to consume as much of the string as possible, then capture the first number in $1 and the second in $2. After the match I expect $1 to contain 1.4 and $2 to contain 3, but that is not what I get. I assume my regular expression is malformed. I've been staring at it and rewriting it for a while, and I would greatly appreciate an outside view.

+3  A: 
$line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/

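(?:...) groups without capturing, so the optional fractional parts no longer take up capture numbers and the two numbers land in $1 and $2.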
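
As a quick check, here is a minimal sketch that applies the pattern above to the sample line from the question. The helper name parse_tail is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last two numbers on a line, or an empty list.
sub parse_tail {
    my ($line) = @_;
    # (?:...) groups without capturing, so only the two full numbers get numbered.
    return $line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/ ? ($1, $2) : ();
}

my $sample = "long long int          FILE_FORMAT_HEADER.file.index              1.4      3\n";
my ($first, $second) = parse_tail($sample);
print "$first $second\n";    # prints "1.4 3"
```

The trailing \s*$ lets the pattern tolerate spaces or a newline after the second number while still tying the match to the end of the line.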

This doesn't address the question of how the lines end. If there is possible whitespace and/or a newline after the numbers, the above will need adjustment.
You're right, but the OP said "the two numbers at the end".
Another option is to use named groups.
Alan Moore
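
A rough sketch of the named-group variant mentioned in the comment, assuming Perl 5.10 or later (where named captures and the %+ hash are available). The group names first and second and the helper parse_named are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper using named captures (needs Perl 5.10 or later).
sub parse_named {
    my ($line) = @_;
    if ($line =~ m/(?<first>\d+(?:\.\d+)?)\s+(?<second>\d+(?:\.\d+)?)\s*$/) {
        # %+ holds the named captures of the last successful match.
        return ($+{first}, $+{second});
    }
    return;    # empty list when the line does not match
}

my ($first, $second) = parse_named("int   FILE_FORMAT_HEADER.file.index   1.4   3\n");
print "$first $second\n";    # prints "1.4 3"
```

With names, the result no longer depends on counting parentheses, so adding or removing a group later does not shift the values you read.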
+2  A: 

Why do you think you need a regex?

while (<>) {
    my @F = split /\s+/, $_;        # split the line on runs of whitespace
    print "$F[-2] $F[-1]\n";        # second-to-last and last fields
}
+4  A: 

It is capturing just fine, but capture groups are numbered from left to right by the position of each opening parenthesis, nested ones included. Therefore, for your example:

 $1 is "1.4"
 $2 is ".4"
 $3 is "3"
 $4 is undef (the optional group took no part in the match)
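
The numbering can be checked with a short sketch that runs the original pattern from the question in list context and prints each group. Note that a group which takes no part in the match is left undefined, which shows up as an empty string (plus a warning under use warnings) if printed directly.

```perl
use strict;
use warnings;

my $line = "long long int   FILE_FORMAT_HEADER.file.index   1.4   3";

# The original pattern: four opening parentheses, hence four capture groups.
my @groups = $line =~ m/.+\s+(\d+(\.\d+)?)\s+(\d+(\.\d+)?)/;

for my $i (0 .. $#groups) {
    # Show undefined groups explicitly instead of printing an empty string.
    my $shown = defined $groups[$i] ? qq{"$groups[$i]"} : 'undef';
    print '$', $i + 1, " is $shown\n";
}
```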

You might want to anchor the pattern to the end of the line with ...\s*$. Given your stated requirements, a specific match like the one you (properly) wrote is probably preferable to simply splitting on whitespace. If you expect every line to match, you should also emit a diagnostic message for any line that doesn't.
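
One way to combine the end-of-line anchor with a diagnostic is sketched below. It assumes non-capturing groups for the fractional parts; the helper parse_lines and the wording of the warning are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parses a list of lines, warning about any that don't fit.
sub parse_lines {
    my @lines = @_;
    my @pairs;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        if ($line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/) {
            push @pairs, [$1, $2];    # keep the two trailing numbers
        }
        else {
            # Report the offending line on STDERR and carry on.
            warn "line $n does not end in two numbers: $line";
        }
    }
    return @pairs;
}

my @pairs = parse_lines(
    "long long int   FILE_FORMAT_HEADER.file.index   1.4   3\n",
    "this line has no trailing numbers\n",
);
print "$_->[0] $_->[1]\n" for @pairs;
```

Whether to warn and continue or to die on the first bad line depends on how strictly the file format is meant to be enforced.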

Thank you for an explanation of what I was doing wrong instead of just tossing me a working regex!
Desert ed