A Perl script I'm writing needs to parse a file that has continuation lines, like a Makefile: lines that begin with whitespace are part of the previous line.

I wrote the code below but don't feel like it is very clean or perl-ish (heck, it doesn't even use "redo"!)

There are many edge cases: EOF at odd places, single-line files, files that start or end with a blank line (or non-blank line, or continuation line), empty files. All my test cases (and code) are here: http://whatexit.org/tal/flatten.tar

Can you write cleaner, more Perl-ish code that passes all my tests?

#!/usr/bin/perl -w

use strict;

sub process_file_with_continuations {
    my $processref = shift @_;
    my $nextline;
    my $line = <ARGV>;

    $line = '' unless defined $line;
    chomp $line;

    while (defined($nextline = <ARGV>)) {
        chomp $nextline;
        next if $nextline =~ /^\s*#/;  # skip comments
        $nextline =~ s/\s+$//g;  # remove trailing whitespace
        if (eof()) {  # Handle EOF
            $nextline =~ s/^\s+/ /;
            if ($nextline =~ /^\s+/) {  # indented line
                &$processref($line . $nextline);
            }
            else {
                &$processref($line);
                &$processref($nextline) if $nextline ne '';
            }
            $line = '';
        }
        elsif ($nextline eq '') {  # blank line
            &$processref($line);
            $line = '';
        }
        elsif ($nextline =~ /^\s+/) {  # indented line
            $nextline =~ s/^\s+/ /;
            $line .= $nextline;
        }
        else {  # non-indented line
            &$processref($line) unless $line eq '';
            $line = $nextline;
        }
    }
    &$processref($line) unless $line eq '';
}

sub process_one_line {
    my $line = shift @_;
    print "$line\n";
}

process_file_with_continuations \&process_one_line;
+5  A: 

How about slurping the whole file into memory and processing it with regular expressions? That is much more 'perlish'. This passes your tests and is much smaller and neater:

#!/usr/bin/perl

use strict;
use warnings;

$/ = undef;             # we want no input record separator.
my $file = <>;          # slurp whole file

$file =~ s/^\n//;       # Remove newline at start of file
$file =~ s/\s+\n/\n/g;  # Remove trailing whitespace.
$file =~ s/\n\s*#[^\n]*//g;     # Remove comments (including a bare '#').
$file =~ s/\n\s+/ /g;   # Merge continuations

# Done
print $file;
Nic Gibson
One thing to bear in mind with both my answer and mirod's: if you are embedding this in a larger piece of code, it would be a good idea to localise the special variables used (e.g. 'local $/').
Nic Gibson
I just coded to pass the tests ;--) You are right though.
mirod
@mirod - heh, this is almost identical to an exercise in the Perl intro course I teach (it's about unfolding mail headers). It's probably a fairly common problem, with *so* many ways to do it :)
Nic Gibson
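To illustrate the 'local $/' point raised in the comments above, here is a minimal sketch of slurping inside a sub so that the change to the input record separator does not leak into the rest of a larger program. The helper name slurp_handle and the in-memory sample input are assumptions made for this illustration; they are not part of either answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answers): slurp everything from a
# filehandle while confining the change to $/ to this sub.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                            # $/ is undef only until this sub returns
    my $text = <$fh>;
    return defined $text ? $text : '';   # treat empty input as ''
}

# Usage with an in-memory filehandle (sample data is made up).
my $input = "target: a.o\n    b.o\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";
my $file = slurp_handle($in);
close $in;

$file =~ s/\n\s+/ /g;                    # merge continuations, as in the answer
print $file;

# After the call, $/ is back to its default newline value.
print "separator restored\n" if defined $/ && $/ eq "\n";
```

Because 'local' restores the previous value when the enclosing block exits, any later line-by-line reads elsewhere in the program should behave as before.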
+3  A: 

If you don't mind loading the entire file into memory, then the code below passes the tests. It stores the lines in an array, appending each line either to the previous one (continuation) or to the end of the array (any other line).

#!/usr/bin/perl

use strict;
use warnings;

my @out;

while( <>)
  { chomp;
    s{#.*}{};             # suppress comments
    next unless( m{\S});  # skip blank lines
    if( s{^\s+}{ })       # does the line start with spaces?
      { $out[-1] .= $_; } # yes, continuation, add to last line
    else 
      { push @out, $_;  } # no, add as new line
  }

$, = "\n";                # set output field separator
$\ = "\n";                # set output record separator
print @out;
mirod
Your algorithm of course also works if you simply want to process the (joined) lines one by one. Simply do the processing (or printing out) instead of pushing onto @out. Then there is no need to have the whole file in memory at once.
@blixtor: indeed, you can replace @out with $last_line, change the inner if to if( s{^\s+}{ }) { $last_line .= $_; } else { print $last_line, "\n"; $last_line = $_; } and replace the last 3 lines with print $last_line, "\n" if $last_line. I assumed Makefile-type lines would not be too big to fit in memory, though.
mirod
Yes, I'd rather do it without reading everything into memory. These files may be huge!
TomOnTime
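The streaming variant discussed in the comments above could look roughly like the sketch below. It keeps only the logical line currently being assembled in memory and hands each completed line to a callback, in the spirit of the question's process_file_with_continuations. The sub name process_with_continuations and the sample input are assumptions for illustration. It follows the simplifications of the array-based answer (anything after '#' is treated as a comment, and blank lines are skipped rather than ending a logical line), so it has not been checked against the test suite linked in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $fh line by line, join continuation lines,
# and pass each completed logical line to $callback.
sub process_with_continuations {
    my ($fh, $callback) = @_;
    my $pending;                       # logical line being assembled

    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/#.*//;              # strip comments (simplification)
        next unless $line =~ /\S/;     # skip blank lines
        $line =~ s/\s+$//;             # remove trailing whitespace

        if ($line =~ s/^\s+/ /) {      # indented: continuation line
            # A continuation with nothing before it starts a new line.
            $pending = defined $pending ? $pending . $line : $line;
        }
        else {                         # non-indented: flush, start anew
            $callback->($pending) if defined $pending;
            $pending = $line;
        }
    }
    $callback->($pending) if defined $pending;   # flush the final line
    return;
}

# Usage with an in-memory filehandle (sample data is made up).
my $sample = "target: a.o\n    b.o\n# comment\n\nclean:\n\trm -f *.o\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
process_with_continuations($fh, sub { print "$_[0]\n" });
close $fh;
```

Testing with 'defined' rather than truth avoids dropping a logical line that happens to be "0", and avoids printing an undefined value before the first line has been seen. To read the files named on the command line instead, pass \*ARGV as the filehandle.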