
I have text files generated by one of my tools with the structure shown below.

1 line text
(space)
multiple
lines
text
(space)
multiple
lines
text
nr 2
---------------------------------------------------------- (58 '-' characters)
different 1 line text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text
nr 2
----------------------------------------------------------
different 1 line text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text
nr 2
----------------------------------------------------------
(space)

Each file begins with a one-line text and ends with a '-' separator line followed by a (space) line. The number of sections differs from file to file, and each section 'in the middle' starts and ends with a '-' separator. Below is what I would like to achieve:

multiple
lines
text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text

I would like to remove all the one-liners, all the 58-character '-' dividers and all the 'second' multi-line blocks, keeping only the 'first' multi-line block from each section, one under another and separated by (space) lines. Could someone recommend how to do this on Linux? Any suggestions will help.

A: 

I would use awk rather than sed. Accumulate lines until you hit /^-+$/, then output the multi-line section you stored before each dashed line.

EDIT: I would reach for Perl before that, but awk is fun, too.

Jeff Ober
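Jeff Ober's suggestion can be fleshed out as the following awk sketch. It is illustrative only: it assumes the (space) lines are really empty lines, treats any line made up solely of dashes as a divider, and wraps the command in a function whose name, `keep_first_blocks`, is hypothetical.

```sh
# keep_first_blocks (hypothetical name): buffer each section's lines until a
# dashed divider, then print only the lines between the section's first and
# second empty lines. Assumes the "(space)" lines are empty lines.
keep_first_blocks() {
    awk '
    /^-+$/ {
        # End of a section: locate the first multi-line block in the buffer.
        start = 0
        for (i = 1; i <= n; i++) {
            if (buf[i] == "") {
                if (!start) { start = i + 1; continue }
                break                      # second empty line ends the block
            }
        }
        if (start) {
            if (printed) print ""          # blank line between kept blocks
            for (j = start; j < i; j++) print buf[j]
            printed = 1
        }
        n = 0                              # reset the buffer for the next section
        next
    }
    { buf[++n] = $0 }                      # collect lines of the current section
    ' "$@"
}
```

Usage: `keep_first_blocks file.txt > first_blocks.txt`. Buffering a whole section before printing makes it easy to pick out the block between the first and second empty lines, at the cost of holding one section in memory.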
A: 

The following Perl script will do what you want (I find that sed is not well suited to tasks spanning multiple lines).

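#!/usr/bin/perl
use strict;
use warnings;

my $first = 1;   # true until the first block has been printed
my $skip  = 2;   # >0: lines left to skip; 0: echoing; -1: ignoring until next divider
while (my $ln = <>) {
    chomp $ln;
    if ($ln =~ /^-{58}$/) {
        # Divider: skip the one-liner and the blank line that follow it.
        $skip = 2;
        next;
    }
    if ($skip > 0) {
        $skip--;
        if ($skip == 0) {
            # A block starts on the next line: separate it from the previous one.
            if ($first) {
                $first = 0;
            } else {
                print "\n";
            }
        }
        next;
    }
    if ($skip == 0) {
        if ($ln =~ /^$/) {
            # A blank line ends the first multi-line block; ignore the rest.
            $skip = -1;
        } else {
            print $ln . "\n";
        }
    }
}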

This is based on the assumption that your (space) lines are just empty lines. If they're not, you will need to adjust the /^$/ pattern near the bottom to match what it actually is.

It's basically a simplified state machine controlled by the $skip variable. When this is positive, you're skipping that many lines (starts at 2 and is set to 2 for every --- line).

When $skip reaches zero, it stays there until you get an empty line (you echo the lines as you go). When you get an empty line, you set it to -1 and stop echoing until the next dashed line resets it to 2.

The $first variable is a bit of a hack to ensure there's no blank line before the first block in your output.

Here's the output I got from your input file:

multiple
lines
text
(space)
different
multiple
lines
text
(space)
different
multiple
lines
text

which I believe is what you were after.

paxdiablo
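paxdiablo's $skip state machine translates fairly directly to awk. The sketch below is a hypothetical port (the function name `first_blocks_fsm` is made up); it assumes empty (space) lines and checks the divider with `/^-+$/` plus `length($0) == 58` rather than `/^-{58}$/`, because interval expressions such as `{58}` are not supported by every awk.

```sh
# first_blocks_fsm (hypothetical name): awk port of the $skip state machine.
#   skip > 0  : skipping that many lines (the one-liner and the blank after it)
#   skip == 0 : echoing the first multi-line block
#   skip == -1: ignoring lines until the next divider
# Assumes "(space)" lines are empty and dividers are exactly 58 dashes.
first_blocks_fsm() {
    awk '
    BEGIN { skip = 2; first = 1 }
    /^-+$/ && length($0) == 58 { skip = 2; next }
    skip > 0 {
        skip--
        if (skip == 0) {
            # A block starts on the next line; separate it from the previous one.
            if (first) { first = 0 } else { print "" }
        }
        next
    }
    skip == 0 {
        if ($0 == "") { skip = -1 } else { print }
    }
    ' "$@"
}
```

It prints a blank line only between kept blocks, so the output has neither a leading nor a trailing empty line.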
A: 

Edit: to print the first multiline group:

awk 'BEGIN {toggle=1} /^\(space\)$/ {if (!toggle) print ""; toggle=!toggle; next} {if (! toggle) print}' file.txt

Original: to print the second multiline group:

awk '/^\(space\)$/ { accum=""; next} /^-+$/ {print accum; accum=""; next} {accum=accum"\n"$0}' file.txt
Dennis Williamson
I used "(space)" as a literal string, but you can change that to `/^$/` to check for an empty line.
Dennis Williamson
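Following that comment, the toggle idea can be adapted to truly empty separator lines. The sketch below (hypothetical function name `first_blocks_toggle`) also defers each separating blank line until the next kept line, so, unlike the first one-liner, it does not leave a blank line after the last block. It assumes every section contains exactly two empty lines, as in the sample; to run it on the literal placeholder, compare against "(space)" instead of "".

```sh
# first_blocks_toggle (hypothetical name): flip a flag on each empty line and
# print the lines seen while the flag is on. The separating blank line is
# emitted lazily, just before the next kept line, so none trails the output.
first_blocks_toggle() {
    awk '
    $0 == "" {
        if (inblock) gap = 1   # a kept block just ended
        inblock = !inblock
        next
    }
    inblock {
        if (gap) { print ""; gap = 0 }
        print
    }
    ' "$@"
}
```

Usage: `first_blocks_toggle file.txt`.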
A: 

gawk

awk  '{ print $2 }' RS="-\n" FS="\n\n" file

output

$ ./shell.sh
multiple
lines
text
different
multiple
lines
text
different
multiple
lines 
text

The equivalent in Perl:

$\ = "\n";
$/ = "-\n";
while (<>) {
    chomp;
    ($f1,$f2) = split "\n\n", $_ ;
    print $f2;
}
ghostdog74
You may need to modify this since it's not outputting the blank lines.
paxdiablo
I will leave it to the OP as an exercise.
ghostdog74
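ghostdog74's approach relies on gawk treating a multi-character RS as a regular expression; POSIX leaves that behaviour unspecified. A different technique that stays within POSIX awk is paragraph mode (`RS = ""`), where records are separated by runs of blank lines. Because each divider sits directly between a section's second block and the next one-liner, every even-numbered paragraph is a 'first' block. This is a sketch under the assumptions that the (space) lines are empty and that no block contains an empty line of its own; `first_blocks_para` is a hypothetical name.

```sh
# first_blocks_para (hypothetical name): paragraph-mode variant.
# With RS = "", records are separated by blank lines, so the input splits into
#   1: one-liner   2: first block   3: second block + divider + next one-liner
#   4: first block 5: second block + divider + next one-liner ...
# and every even-numbered record is a block to keep.
first_blocks_para() {
    awk '
    BEGIN { RS = "" }
    NR % 2 == 0 {
        if (kept++) print ""   # blank line between kept blocks
        print
    }
    ' "$@"
}
```

Printing an empty line before every kept block except the first restores the blank separators mentioned in the comments above.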