I just asked a question about how to check if the current line is blank or not in Perl.

That works for the current line, but how do I check to see if the next line is blank?

Text file to parse (I need to parse the text file and create a new XML file):

constant fixup GemEstabCommDelay = <U2 20>
    vid = 6
    name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
    units = "s"
    min = <U2 0>
    max = <U2 1800>
    default = <U2 20>


constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
    vid = 4
    name = ""  units = ""


constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
    vid = 0
    name = ""
    units = ""  

I want the output below.

<EquipmentConstants>
<ECID logicalName="GemEstabCommDelay " valueType="U2" value="20" vid="6" name="ESTABLISHCOMMUNICATIONSTIMEOUT" units="s" min="0" max="1800" default="20"></ECID>
<ECID logicalName="GemConstantFileName" valueType="A" value="C:\\TMP\\CONST.LOG" vid="4" name="" units=""></ECID>
<ECID logicalName="GemAlarmFileName" valueType="A" value="C:\\TMP\\ALARM.LOG" vid="0" name="" units=""></ECID>
</EquipmentConstants>
+1  A: 

I am not sure what you want, but I assume you want to display the blocks that have "units=xxx" at the very end of each block. If not, please describe your desired output clearly.

$/ = "\n\n"; #set record separator
while (<>) {
    chomp;
    @F = split(/\n/, $_);
    if ($F[-1] =~ /units/) {
        print $_ ."\n";
    }
}

output

$ perl test.pl file

constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
    vid = 4
    name = ""  units = ""

constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
    vid = 0
    name = ""
    units = ""
ghostdog74
Hi, I updated my post with what I want.
Nano HE
+2  A: 

Use separate variables to store the current and next lines:

$_ = <>;
while ($next_line = <>) {
    if ($next_line !~ /\S/) {
        # do something with $_ when next line is blank
    } else {
        # do something else with $_ when next line is not blank
    }
    $_ = $next_line;
}
# $_ now contains last line of file -- you may want to do something with it here
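
A self-contained sketch of the same current/next pattern, in case a runnable version helps. The sub name `block_end_lines` and the sample data are hypothetical, and the in-memory filehandle merely stands in for the real input file. It reports non-blank lines that are immediately followed by a blank line:

```perl
use strict;
use warnings;

# Return the 1-based numbers of non-blank lines that are followed by a blank line.
sub block_end_lines {
    my ($fh) = @_;
    my @hits;
    my $current = <$fh>;
    return @hits unless defined $current;    # empty input
    my $line_no = 1;
    while (defined(my $next = <$fh>)) {
        # current line has content, next line is empty or whitespace only
        push @hits, $line_no if $current =~ /\S/ && $next !~ /\S/;
        $current = $next;
        $line_no++;
    }
    # $current now holds the last line, which has no successor to inspect
    return @hits;
}

# Hypothetical sample data, read through an in-memory filehandle
my $sample = "vid = 6\nunits = \"s\"\n\n\nvid = 4\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @hits = block_end_lines($fh);
close $fh;
print "line $_ is followed by a blank line\n" for @hits;
```

Only two lines are held in memory at any time, so the approach should not care how large the file is.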
mobrule
Alternatively (slightly cleaner IMHO, perhaps more memory intensive) you can read in the entire file into an array (`@lines`) and do `for(0 .. $#lines)`. Current line: `$lines[$_]` Next line: `$lines[$_+1]`. But it depends.
Chris Lutz
A: 

If you don't care about memory usage, or the file you're reading is relatively small, you can just read the whole of it into an array.

@lines = <>;

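for ($i = 0; $i < @lines; $i++)
{
    # each line keeps its trailing newline, so test with a regex instead of comparing to ""
    print "Current line blank\n" if $lines[$i] !~ /\S/;
    print "Next line blank\n"    if $i < $#lines && $lines[$i + 1] !~ /\S/;
}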
hollaburoo
You may start off with small data files, but useful programs tend to be pressed into service for other uses. Don't build in fragility when it's just as much work to avoid the risk.
brian d foy
+1  A: 
use strict;
my @lines=<>; # slurp-in the whole file

for (my $i=0; $i<@lines-1; $i++) {
  print "line " .  ($i + 1) . " : next line is blank\n" if $lines[$i+1] =~ /^\s*$/;
}
I'd just do `for(0 .. $#lines)` (or `$#lines - 1` if you don't want to process the last line).
Chris Lutz
Good point. I'm used to evaluating @lines in scalar context, but $#lines might be better.
This is terrible if the file is large. There is no need to slurp in the whole thing at once, when all that is important is the current and next line.
Ether
@Ether et al.: Hi, my text file is very huge. It includes about 56500 lines.
Nano HE
@Nano: 56500 lines isn't "very huge". But if performance is important you should mention that in your question.
+5  A: 

Let perl do it for you. Put the handle in paragraph mode:

$/ = "";  # paragraph mode
while (<>) {
    ...
}

Now in every iteration of the loop, $_ will contain an entire record, where each record is separated by two or more newlines.
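
To make the difference from a literal `$/ = "\n\n"` separator concrete, here is a small sketch under assumptions: the helper `read_records` and the sample text are hypothetical, and an in-memory filehandle stands in for the real file. With paragraph mode, runs of blank lines are treated as a single separator, whereas with `"\n\n"` the leftover newline ends up at the start of the next record:

```perl
use strict;
use warnings;

# Hypothetical sample input: records separated by two blank lines.
my $text = "rec1 line1\nrec1 line2\n\n\nrec2 line1\n\n\nrec3 line1\nrec3 line2\n";

# Read all records from an in-memory filehandle using the given separator.
sub read_records {
    my ($separator, $input) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraphs = read_records("",     $text);    # paragraph mode
my @two_nl     = read_records("\n\n", $text);    # literal two-newline separator

for my $record (@paragraphs) {
    my @lines = split /\n/, $record;
    printf "%d line(s), first: %s\n", scalar @lines, $lines[0];
}
```

In paragraph mode every record starts directly with its first content line, so there is no need to skip a leading empty line before matching.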

See it in action:

#! /usr/bin/perl

use warnings;
use strict;

use 5.10.0;  # for named capture buffers and %+

my $equipconst = qr/
  ^
  constant \s+ fixup \s+ (?:private \s+)?
  (?<logicalName>.+?)  # non-greedy to right-trim whitespace
  \s+ = \s+
  < (?<valueType>\S+) \s+ (?<value>\S+) >
/x;

my $equipattr = qr/
    \s*
    (?<name>\S+)
    \s* = \s*
    (?<value>.+?)  # must be non-greedy!
/x;

# read from DATA rather than standard input/named arguments
# (used for demo purposes only)
*ARGV = *DATA;

print "<EquipmentConstants>\n";

$/ = "";
while (<>) {
  if (/$equipconst/g) {
    my @attrs = map [ $_ => $+{$_} ] =>
                qw/ logicalName valueType value /;

    # \G picks up where the last //g stopped
    while (/\G $equipattr (?=\s*$|$equipattr)/gx) {
      my($name,$value) = @+{ qw/ name value / };

      # discard tag, e.g., <U2 1800> becomes 1800
      $value =~ s/<.+ (.+)>/$1/;
      push @attrs => [ $name => $value ];
    }

    my $attrs = join " ",
                map {
                  # strip quotes if present
                  $_->[1] =~ s/^"(.*)"$/$1/;
                  qq{$_->[0]="$_->[1]"};
                }
                @attrs;

    print "<ECID $attrs></ECID>\n";
  }
}

print "</EquipmentConstants>\n";

__DATA__
constant fixup GemEstabCommDelay = <U2 20>
    vid = 6
    name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
    units = "s"
    min = <U2 0>
    max = <U2 1800>
    default = <U2 20>


constant fixup private GemConstantFileName = <A "C:\\TMP\\CONST.LOG">
    vid = 4
    name = ""  units = ""


constant fixup private GemAlarmFileName = <A "C:\\TMP\\ALARM.LOG">
    vid = 0
    name = ""
    units = ""

Output:

<EquipmentConstants>
<ECID logicalName="GemEstabCommDelay" valueType="U2" value="20" vid="6" name="ESTABLISHCOMMUNICATIONSTIMEOUT" units="s" min="0" max="1800" default="20"></ECID>
<ECID logicalName="GemConstantFileName" valueType="A" value="C:\\TMP\\CONST.LOG" vid="4" name="" units=""></ECID>
<ECID logicalName="GemAlarmFileName" valueType="A" value="C:\\TMP\\ALARM.LOG" vid="0" name="" units=""></ECID>
</EquipmentConstants>

Note that it differs slightly from your spec: the first logicalName attribute does not contain whitespace.

Greg Bacon
Hi gbacon, I installed 5.10.1 just now. I ran the script and found that it works well except for the while (<>) loop: I added a print "foo"; right after while (<>), and foo is printed only once. The output shows only the 1st paragraph (the one closest to __DATA__). I can't find the root cause.
Nano HE
Perl's `<>` reads from the files named on the command line or falls back to reading the standard input. Assuming the program is named `const`, the input is in `const.dat`, and that both are in the current directory, invoke the program as `perl const const.dat >const.xml`
Greg Bacon
Hi gbacon, I followed the steps. 1st step: cut the three paragraphs from the code, pasted them into an empty my.data file, then saved the .data file. 2nd step: commented out two lines in my local script: a. **(# __DAT___)**; b. **(# *ARGV = *DATA)**. 3rd step: opened a Windows command-line window. I am sure all my files (my.pl and my.data) are located in the current directory. 4th step: ran `perl my.pl my.dat >my.xml`. Still only the 1st paragraph is parsed. Did I miss a step? (ActivePerl 5.10.1 on WinXP.)
Nano HE
@Nano Were you able to resolve this issue?
Greg Bacon