views:

73

answers:

3

Hello.

I plan to skip the block content which include the start line of "MaterializeU4()" with the subroutin() read_block below. But failed.

# Read a constant definition block from a file handle. 
# void return when there is no data left in the file. 
# Otherwise return an array ref containing lines to in the block.  
sub read_block { 
    my $fh = shift; 

    my @lines; 
    my $block_started = 0; 

    while( my $line = <$fh> ) { 

    # how to correct my code below? I don't need the 2nd block content.
 $block_started++ if ( ($line =~ /^(status)/) && (index($line, "MaterializeU4") != 0) ) ;

 if( $block_started ) { 

     last if $line =~ /^\s*$/; 

     push @lines, $line; 
 }

    } 
    return \@lines if @lines;
    return; 
} 

Data as below:

__DATA__ 
status DynTest = <dynamic 100>
vid = 10002
name = "DynTest"
units   = ""

status VIDNAME9000 = <U4 MaterializeU4()>
vid = 9000
name = "VIDNAME9000"
units = "degC"

status DynTest = <U1 100>
vid = 100
name = "Hello"
units   = ""

Output:

  <StatusVariables>
    <SVID logicalName="DynTest" type="L" value="100" vid="10002" name="DynTest" units=""></SVID>
    <SVID logicalName="DynTest" type="L" value="100" vid="100" name="Hello" units=""></SVID>
  </StatusVariables>

[Updated] I print the value of index($line, "MaterializeU4"), it output 25. Then I updated the code as below

$block_started++ if ( ($line =~ /^(status)/) && (index($line, "MaterializeU4") != 25)

Now it works.

Any comments are welcome about my practice.

Thank you.

+1  A: 

On successful match of a substring, index returns the position of the substring, which could be any value >= 0. On "failure", index returns -1.

The way you are using index

index($line, "MaterializeU4") != 0

will be true for all lines except for a line that begins with the string "MaterializeU4".

It looks like you already know a little bit about Perl regular expressions. Why not use one in this case, too?

++$block_started if $line =~ /status/ && $line =~ /MaterializeU4/;

Another issue I see is that you set $block_started to begin capturing lines, but you never set it to zero at the end of the "block", say, when $line is empty. I'm not sure if that's what you wanted to do.

mobrule
@mobrule. Acturally, daotoad kindly wrote the whole completed sample code as another stackoverflow answer for me. It's just a pice code. I am reading and learning with it. Thank you.
Nano HE
@mobrule - I think the `$block_started` is OK not to be reset due to block scope where it's defined - once the block is over, the sub will exit. It'd be a valid concern if the sub processed >1 block.
DVK
@mobrule - Also, I think you meant `$line !~ /MaterializeU4/`, no? Elegant solution... mine produces empty block, whereas yours just skips entire set of that block's lines and continues to next block
DVK
@mobrule, I tested to replace my original if condition with `if ( ($line =~ /^(status)/) ` . It works quite well. thank you.
Nano HE
+1  A: 

First, using a regex instead of index is probably better since you can tune it to the exact format of status string if you may decide to be stricter than just "substring exists"

I would suggest as one solution adding a second flag to skip the block contents if it's a MaterializeU4 block, as follows:

# Read a constant definition block from a file handle. 
# void return when there is no data left in the file. 
# Empty return for skippable (Materialize4U) block!!!
# Otherwise return an array ref containing lines to in the block.  
sub read_block { 
    my $fh = shift; 
    my @lines = (); 
    my $block_started = 0; 
    my $block_ignore = 0;
    while (my $line = <$fh> ) { 
        if ($line =~ /^status.*?((MaterializeU4)?)/) {
            $block_started = 1;
            $block_ignore = 1 if $1;
        }
        last if $line =~ /^\s*$/ && $block_started;
        push @lines, $line unless $block_ignore; 
    } 
    return \@lines if @lines || $block_started;
    return; 
} 

Here's a slightly modified sample I tested using codepad.org:

Code:

use Data::Dumper;
my @all_lines = (
  "s 1" ,"b 1" ,""
, "s MaterializeU4" ,"b 2" ,""
, "s 3" ,"b 3" ,""
);

while (@all_lines) {
    my $block = read_block();
    print Data::Dumper->Dump([$block]);
}
exit 0;

sub read_block { 
    my @lines = (); 
    my $block_started = 0; 
    my $block_ignore = 0;
    while (my $line = shift @all_lines) { 
        if ($line =~ /^s .*?((MaterializeU4)?)/) {
            $block_started = 1;
            $block_ignore = 1 if $1;
        }
        last if $line =~ /^\s*$/ && $block_started;
        push @lines, $line unless $block_ignore; 
    } 
    return \@lines if @lines || $block_started;
    return; 
} 

Output:

$VAR1 = [
          's 1',
          'b 1'
        ];
$VAR1 = [];
$VAR1 = [
          's 3',
          'b 3'
        ];
DVK
BTW, I went for slightly more flexible approach of returning empty array for skippable block instead of just silently skipping it. It's a bit of overkill for current need, but OTOH can be extended later to do some other handling of such blocks ythan just skipping.
DVK
@DVK,Unfortunately, I tested with your read_block(). It doesn't work. I both tested two conditions: 1st. Replace read_block with the new one. 2nd. Replace my `if ( ($line =~ /^(status)/) ` with `if ($line =~ /^status.*((MaterializeU4)?)/)`. I don't know why?
Nano HE
@DVK. furthermore, I tested instead `if ($line =~ /^status.*((MaterializeU4)?)/)` with `if ( ($line =~ /^(status)/) `. It doesn't work too. I guess maybe there is a bug in the subroutine. Just guess :-)
Nano HE
Quite possibly... thus "as follows" lawyering instead of "here's your code"... let me look at what's off here.
DVK
OK... the bug was I forgot toi un-greedify my regex. `.*` should be `.*?` otherwise it'd eat entire line including Materialize4U
DVK
@DVK, I tested your sample code. If i replace your data line `s MaterializeU4` with `status VIDNAME9000 = <U4 MaterializeU4()>`. Then the output will include 2nd block!. I guess it's a reg exp issue, thank you.
Nano HE
@Nano - for `status VIDNAME9000 = <U4 MaterializeU4()>` the regexp will be ` if ($line =~ /^status\s.*?((MaterializeU4)?)/) {`
DVK
@DVK. Sorry for bother you again and again. I did more testing against you script above. If the data line change to `s MaterializeU4`(between the two words, there are more than one white space),It doesn't work; On the other hand, I tested `/^status\s.*?((MaterializeU4)?)/` with data line `status VIDNAME9000 = <U4 MaterializeU4()>`. I found the limitation. It will works when there are ONLY one white space between `status` and `MaterializeU4`. It doesn't matter with the following characters of `= < () >, etc` etc.Thank you.
Nano HE
@Nano - I'm not sure what you mean, can you please post the exact code you used to test?
DVK
This is my test code:<code>my $s;$s="status MaterializeU4"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n");$s="status MaterializeU4"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n");$s="statusaa dMaterializeU4"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n");$s="status < MaterializeU4>"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n");$s="status VIDNAME9000 = <U4 MaterializeU4()>"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n");</code>
DVK
The result was what I expected: MatchMatchNo MatchMatchMatch
DVK
See http://codepad.org/JGPIO4AC
DVK
@DVK, I posted my testing at my blog.please see http://www.nanohe.net/blog/2010/06/318/test1.pl output the same result as yours.But Why test2.pl can’t skip the 2nd block content? thanks.
Nano HE
@DVK, BTW, I am using WinXP OS and ActivePerl10.1.
Nano HE
@Nano - test2 can't skip becase test2.pl has regexp "s\s" and your data has "status " at the beginning of the screen.
DVK
@DVK, As your said, I still can't understand why `$s="status VIDNAME9000 = <U4 MaterializeU4()>"; print (($s =~ /s\s.*?((MaterializeU4)?)/) ? "Match\n" : "No Match\n"); ` - output `match`. That means the regexp of `/s\s.*?((MaterializeU4)?)/` works quite well to match `$s`.(from test1.pl test result). BTW, How can I fix test2.pl. I tried to change regexp to `/status\s.*?((MaterializeU4)?)/`, but still can't skip the block of MaterializeU4, thank you .
Nano HE
Oh... sorry didn't notice - the anchor is missing. You need `/^s\s.*?((MaterializeU4)?)/` so "^" matches start of string, otherwise "s\s" would match the last "s" in status
DVK
@DVK, Please see here. http://codepad.org/6deFNMM1. You are so nice to keep on tracking and helping on my problem. I learned several good practice script as well. It's an exciting thing to learning new knowledge and fix bugs. Great appreciated and thank you so much.
Nano HE
+6  A: 

Perl already has an operator to keep track of blocks. It's called the "flip-flop" operator:

Try this out:

while ( <DATA> ) { 
   next if /\Q<U4 MaterializeU4()>\E/../^\s*$/;
   push @lines, $_;
}

The value of /\Q<U4 MaterializeU4()>\E/../^\s*$/ will be true when it sees a line that matches the starting regex and it will stop being true after it sees a line matching the second expression.

Axeman
See [perlfaq6 - How can I pull out lines between two patterns that are themselves on different lines?](http://perldoc.perl.org/perlfaq6.html#How-can-I-pull-out-lines-between-two-patterns-that-are-themselves-on-different-lines%3f)
Zaid