Got a text file that looks like:

    200.0     250.0     300.0     350.0     400.0  
162:02:10 017:01:56 017:08:18 011:16:22 008:40:18
    580.0     600.0     620.0     640.0     660.0   
004:04:01 001:47:27 007:25:29 017:44:53 003:07:34

Trying to parse out lines 1 & 3 as "values", and lines 2 & 4 as "times".

My code:

if ($line =~ /^\d[^:]*\d/) {
    my @values = split(/\s/, $line);
}
elsif ($line =~ /^\d+:\d+:\d+/) {
    my @time = split(/\s/, $line);
}

Problem: Always matches first regex. My understanding of regex #1 is it will match a line that starts with a digit, followed by any value that is not a ':' any number of times, followed by another digit.

A: 

Lines 1 and 3 satisfy the following regex:

(?m)^(?:\s*\d+\.\d+\s*)+$

Try this:

open(my $fh, '<', 'yourfile.txt') or die "Could not open file: $!";
while (my $line = <$fh>) {
  # print only the lines that consist entirely of decimal numbers
  if ($line =~ /(?m)^(?:\s*\d+\.\d+\s*)+$/) {
    print $line;
  }
}
close($fh);
Bart Kiers
A: 

Simply change the order of checks:

if ($line =~ /^\d+:\d+:\d+/) {
    ...
}
elsif ($line =~ /^\d[^:]*\d/) {
    ...
}
kgiannakakis
Awesome, can't believe I didn't see that. Still doesn't make sense to me why regex #1 matches when a ':' is present.
Gnatz
@Isaacs Because the pattern is only anchored to the start of the string (the '^'). In English, pattern one reads: start of line, followed by a digit, followed by zero or more non-':' characters (which includes digits), followed by a single digit. That is the end of the pattern, so anything could be on the rest of the line, including ':'! You would be better off with /\d+\.\d/ (one or more digits followed by a literal dot, followed by a digit).
Chris Huang-Leaver
A: 
my (@values, @time);
if ($line =~ /^\d+:\d+:\d+/) {
  # the line starts with something like 162:02:10, so it holds times
  @time = split /\s+/, $line;
} else {
  @values = split /\s+/, $line;
}
Marius
+5  A: 

It happens because lines 2 and 4 also match the first regex: the digits before the first colon are enough to satisfy it.

It may be sufficient to simply check whether the line contains a colon. Like this:

my @time;
my @values;
if($line =~ /:/){
     @time = split(/\s+/,$line);
}
else{
     @values = split(/\s+/,$line);
}
Igor Oks
+1 for a simple solution
Ashwin
The "my" in front of @time and @values will surely diminish the usefulness of all this.
innaM
+1 for the simple answer. Your regex needs a quantifier: `/\s+/`. Also, yet another way to determine line type is to avoid regex and use the faster `index` function.
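
A minimal sketch of the `index` idea, assuming a colon appears only on time lines. The helper name `has_colon` and the sample strings are made up for illustration. It also uses `split ' '`, which (unlike `split /\s+/`) does not produce an empty first field when a line starts with whitespace.

```perl
use strict;
use warnings;

# has_colon is a hypothetical helper name used only for this illustration.
sub has_colon {
    my ($line) = @_;
    # index returns the position of the substring, or -1 if it is absent.
    return index($line, ':') >= 0;
}

my (@time, @values);
for my $line ("    200.0     250.0  ", "162:02:10 017:01:56") {
    # split ' ' skips leading whitespace and splits on runs of whitespace.
    if (has_colon($line)) { @time   = split ' ', $line }
    else                  { @values = split ' ', $line }
}
print "@values | @time\n";   # prints "200.0 250.0 | 162:02:10 017:01:56"
```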
FM
@Manni, @FM: Thanks, updated.
Igor Oks
+2  A: 

Your first regex matches every time because it accepts any string that starts with a digit, continues with any number of characters other than ':', and then has another digit. Nothing requires the match to reach the end of the line, so line 2 matches on the first three characters before the colon.

You may wish to anchor the match at the end of the line as well, or do something simpler, such as just testing for the colon.
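
A short sketch of both points, assuming the sample lines from the question. The helper `line_type` is a hypothetical name, and allowing leading and trailing whitespace on value lines is also an assumption.

```perl
use strict;
use warnings;

# Classify a line; line_type is a hypothetical helper, not from the question.
sub line_type {
    my ($line) = @_;
    # Anchored at both ends, so no ':' may appear anywhere in the line.
    return 'values' if $line =~ /^\s*\d[^:]*\d\s*$/;
    return 'times'  if $line =~ /^\s*\d+:\d+:\d+/;
    return 'unknown';
}

my $time_line  = "162:02:10 017:01:56 017:08:18 011:16:22 008:40:18";
my $value_line = "    200.0     250.0     300.0     350.0     400.0  ";

# The original pattern is anchored only at the start, so it is satisfied
# by the digits in front of the first colon.
my ($prefix) = $time_line =~ /^(\d[^:]*\d)/;
print "unanchored match on time line: '$prefix'\n";   # '162'

print line_type($value_line), "\n";   # values
print line_type($time_line),  "\n";   # times
```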

Barns
Also, if there genuinely are spaces at the beginning of your first and third lines, you may need to consider how you are matching on those.
Barns
ahhh.... thanks
Gnatz
A: 

The other answers have all focused on regexes. But there is another way to tell where you are in a file.

If you are certain that the lines always alternate and always come in the same order, you can use $. to get the number of the line you are processing.

This only works if values are always on odd lines, and times are always on even lines.

my @times_and_values;
my $values; 
while(  my $line = <DATA> ) {

    if( $. % 2 ) {
        $values = parse_values($line);
    }
    else {
        my $times = parse_times($line);

        push @times_and_values, [$times, $values]
            if defined $values and defined $times;

    }

}

Your parsing functions can then handle validation and decomposition of the lines. Use regexes tailored to each to reject incorrect values and do any parsing. You can either throw a fatal error or warn. The above code will skip time/value pairs where either part of the pair fails to parse.
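
As a rough sketch, the two parsers might look like the following. The bodies are assumptions: they accept plain decimal numbers for values and digits:dd:dd for times, and return undef as soon as any field on the line fails validation.

```perl
use strict;
use warnings;

# Hypothetical parsers: each returns an array ref of fields, or undef
# (in scalar context) if the line is empty or any field is malformed.
sub parse_values {
    my ($line) = @_;
    my @fields = split ' ', $line;   # ignores leading/trailing whitespace
    return unless @fields;
    for my $f (@fields) {
        return unless $f =~ /^\d+(?:\.\d+)?$/;
    }
    return \@fields;
}

sub parse_times {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields;
    for my $f (@fields) {
        return unless $f =~ /^\d+:\d\d:\d\d$/;
    }
    return \@fields;
}

my $v = parse_values("    200.0     250.0  \n");
my $t = parse_times("162:02:10 017:01:56\n");
print scalar(@$v), " values, ", scalar(@$t), " times\n";   # 2 values, 2 times
```

To warn instead of silently skipping a pair, the `return unless` lines could call `warn` before returning.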

daotoad