My log file is:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah
 Wed Nov 12 blah blah blah blah 
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4

I want to parse out the full multiline entries where cat is found on the first line. What's the best way to do this in sed and/or awk?

I.e., I want the output to be:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4
A: 

Assuming your log file does not contain the control characters '\01' and '\02', and that a continued line begins with precisely four spaces, the following might work:

# Placeholder control characters (assumed absent from the log)
c1=$(printf '\001')
c2=$(printf '\002')
tr '\n' "$c1" < logfile | sed "s/$c1    /$c2/g" | sed "s/$c1/\n/g" | grep cat | sed "s/$c2/\n    /g"

Explanation: this replaces each newline with ASCII 1 (a control character that should not appear in a log file), and then each "ASCII 1 followed by four spaces" sequence, i.e. each line break before a continuation line, with ASCII 2 (another control character). It then turns the remaining ASCII 1 characters back into newlines, so each multi-line entry now sits on a single line, with its internal line breaks represented by ASCII 2. That stream is grepped for cat, and finally each ASCII 2 is turned back into a newline followed by four spaces. Note that grep matches cat anywhere in the joined entry, including its continuation lines.
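
The same placeholder trick can be narrowed to entries whose first line contains cat, which is what the question asks for. This is only a minimal sketch: it assumes GNU sed (for \n in the replacement text) and the same four-space continuation convention, and the function name first_line_cat is made up for illustration.

```shell
# Hypothetical helper: print entries whose FIRST line contains "cat".
# Assumes GNU sed and continuation lines that start with four spaces.
first_line_cat() {
    c1=$(printf '\001')   # stands in for an ordinary newline
    c2=$(printf '\002')   # stands in for newline + four spaces
    tr '\n' "$c1" < "$1" |
        sed "s/$c1    /$c2/g" |   # fold continuation lines into their entry
        sed "s/$c1/\n/g" |        # restore one entry per line
        grep "^[^$c2]*cat" |      # look for cat only before the first fold marker
        sed "s/$c2/\n    /g"      # unfold the continuation lines again
}
```

Called as first_line_cat logfile, it should skip an entry where cat appears only on a continuation line, because the grep pattern cannot cross the first ASCII 2 marker.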

Adam Rosenfield
A: 

If you assume that every line starting with a space is a continuation of the preceding entry, it's easy with (g)awk (written from memory, so it may contain minor typos; extra line breaks added for readability):

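awk ' BEGIN { multiline = 0; }
      # First line of an entry: replace "whatever" with your test
      ! /^ / { if (whatever)
                 { print; multiline = 1; }
               else
                 multiline = 0;
             }
      # Continuation line: print only if its entry was selected
        /^ / { if (multiline == 1)
                 print;
             }
     ' yourfile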

where whatever is your test for whether an entry should be printed (e.g. /cat/ to match cat).
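
As a concrete instance of the skeleton above, the placeholder can be filled in with a regular-expression match. A minimal sketch, assuming an entry is wanted when /cat/ matches its first line; the wrapper name print_cat_entries is made up for illustration.

```shell
# Hypothetical wrapper around the awk skeleton, using /cat/ as the test.
print_cat_entries() {
    awk '
        # First line of an entry: remember whether it matched, print it if so.
        !/^ / { multiline = ($0 ~ /cat/); if (multiline) print }
        # Continuation line: print it only if its entry matched.
        /^ /  { if (multiline) print }
    ' "$1"
}
```

Because the test runs only on lines that do not start with a space, a cat that shows up on a continuation line alone does not select the entry.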

flolo
A: 

Something like this?

awk 'function print_part() { if(cat) print part }  /^  / { part = part "\n" $0; next } /cat[0-9]$/ { print_part(); part = $0; cat = 1; next;  } { print_part(); cat=0} END { print_part() }' inputfile

The /^  / regexp (two leading spaces) identifies continuation lines.

The /cat[0-9]$/ regexp identifies the starter lines you want to keep.
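
For readability, the same buffering idea can be spread over several lines. This is a sketch rather than a drop-in replacement: it assumes the same two patterns as the one-liner, and the names buffered_cat_entries, emit, entry and keep are made up here.

```shell
# Hypothetical multi-line variant of the buffering approach.
buffered_cat_entries() {
    awk '
        # Print the buffered entry if its first line was a wanted starter.
        function emit() { if (keep) print entry }

        /^  /       { entry = entry "\n" $0; next }         # continuation: append to buffer
        /cat[0-9]$/ { emit(); entry = $0; keep = 1; next }  # wanted starter line
                    { emit(); keep = 0 }                    # any other starter line
        END         { emit() }                              # flush the last entry
    ' "$1"
}
```

Each entry is printed only once the next starter line (or end of input) is seen, which is why the END rule is needed for the final entry.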

divideandconquer.se