I have a file that contains hundreds of lines of the form

long long int          FILE_FORMAT_HEADER.file.index              1.4      3

I don't care about anything except those two numbers at the end: 1.4 and 3.

I'm using the following regular expression:

$line =~ m/.+\s+(\d+(\.\d+)?)\s+(\d+(\.\d+)?)/

The idea is to consume as much of the string as possible, then capture the first number in $1 and the second in $2. After the match I expect $1 to contain 1.4 and $2 to contain 3, but that is not what I get. I would guess that my regular expression is malformed. I've been staring at it and rewriting it for a while, and I would greatly appreciate an outside view.

+3  A: 
$line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/

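(?:...) groups without capturing, so only the two outer groups are numbered: $1 holds the first number and $2 the second.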
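
As a quick check, the pattern can be run against the sample line from the question. This is only a sketch: the variable names are illustrative and not part of the original code.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = "long long int          FILE_FORMAT_HEADER.file.index              1.4      3";

# (?:...) groups the optional fraction without creating a capture,
# so a successful match in list context returns exactly two values.
my ($first, $second) = $line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/;

print "first=$first second=$second\n";   # prints "first=1.4 second=3"
```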

M42
This doesn't address the question of how the lines end. If there is possible whitespace and/or a newline after the numbers, the above will need adjustment.
swestrup
You're right, but the OP said "the two numbers at the end".
M42
Another option is to use named groups.
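
A minimal sketch of what named groups might look like here, assuming Perl 5.10 or later. The group names first and second and the helper parse_line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two trailing numbers via named captures,
# or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m/(?<first>\d+(?:\.\d+)?)\s+(?<second>\d+(?:\.\d+)?)\s*$/;
    # %+ maps each group name to the text it captured.
    return ($+{first}, $+{second});
}

my ($first, $second) = parse_line(
    "long long int          FILE_FORMAT_HEADER.file.index              1.4      3");
print "first=$first second=$second\n";   # prints "first=1.4 second=3"
```

With names, adding or removing a group elsewhere in the pattern does not shift which variable holds which number.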
Alan Moore
+2  A: 

Why do you think you need a regex?

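while (<>) {
    chomp;
    my @F = split ' ', $_;    # split on runs of whitespace, ignoring leading whitespace
    next if @F < 2;           # skip lines with fewer than two fields
    print "$F[-2] $F[-1]\n";  # second-to-last and last fields
}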
ghostdog74
+4  A: 

It is capturing just fine, but capture groups are numbered from left to right by the position of each opening parenthesis. Therefore, for your example:

 $1 is "1.4"
 $2 is ".4"
 $3 is "3"
 $4 is undef (the optional group matched nothing, so it prints as an empty string)
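
The numbering can be confirmed by running the question's pattern unchanged and printing every group. This is a small sketch with illustrative variable names. A group that did not take part in the match is reported as undef, which would print as an empty string.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = "long long int          FILE_FORMAT_HEADER.file.index              1.4      3";

# The pattern from the question: four opening parentheses, so four groups.
my @groups = $line =~ m/.+\s+(\d+(\.\d+)?)\s+(\d+(\.\d+)?)/;

# In list context the match returns ($1, $2, $3, $4) in order.
for my $i (0 .. $#groups) {
    my $value = defined $groups[$i] ? qq("$groups[$i]") : 'undef';
    print '$', $i + 1, " is $value\n";
}
```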

You might want to anchor the pattern to the end of the line with ...\s*$. Given your stated requirements, though, a specific match like the one you (properly) wrote is probably preferable to a whitespace-separated split. If you expect every line to match, you should probably also emit a diagnostic message for any line that doesn't.
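
One way the anchoring and the diagnostic could fit together is sketched below. The helper name trailing_numbers, the wording of the warning, and the second sample line are assumptions made for illustration. The pattern uses non-capturing groups for the fractions so that $1 and $2 are the two numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two trailing numbers, or an empty list
# (after a warning on STDERR) when the line does not have the expected shape.
sub trailing_numbers {
    my ($line, $line_no) = @_;
    if ($line =~ m/(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s*$/) {
        return ($1, $2);
    }
    warn "line $line_no: no two trailing numbers found: $line\n";
    return;
}

# The first sample is from the question; the second is a made-up bad line.
my @samples = (
    "long long int          FILE_FORMAT_HEADER.file.index              1.4      3",
    "this line has no numbers",
);

my $line_no = 0;
for my $sample (@samples) {
    $line_no++;
    my ($first, $second) = trailing_numbers($sample, $line_no) or next;
    print "$first $second\n";
}
```

In a real script the loop would read from the input file instead of a fixed list, and the line number could come from $. rather than a manual counter.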

msw
Thank you for an explanation of what I was doing wrong instead of just tossing me a working regex!
Desert ed