I need to filter a text file using pairs of patterns: print every block of lines that begins at a line matching a start pattern and ends at the line matching the corresponding end pattern. For example, given this file:

line 1 blabla
line 2 more blabla
line 3 **PAT1a** blabla
line 4 blabla
line 5 **PAT1b** blabla
line 6 blabla
line 7 **PAT2a** blabla
line 8 blabla
line 9 **PAT2b** blabla
line 10 **PAT3a** blabla
line 11 blabla
line 12 **PAT3b** blabla
more and more blabla

should give:

line 3 **PAT1a** blabla
line 4 blabla
line 5 **PAT1b** blabla
line 7 **PAT2a** blabla
line 8 blabla
line 9 **PAT2b** blabla
line 10 **PAT3a** blabla
line 11 blabla
line 12 **PAT3b** blabla

I know how to filter a single block with sed: sed -n -e '/PAT1a/,/PAT1b/{p}'. But how do I filter all of the blocks? Do I need to write the pattern pairs to a configuration file, read one pair from it, run the sed command above, move on to the next pair, and so on?

Note: assume that PAT1, PAT2, PAT3, etc. share no common prefix in the real data (unlike 'PAT' in this example).

One more thing: how do I insert a line break in quoted text in this post without leaving a whole blank line?

A: 

Awk.

$ awk '/[0-9]a/{o=$0;getline;$0=o"\n"$0;print;next}/[0-9]b/' file
line 3 PAT1a blabla
line 4 blabla
line 5 PAT1b blabla
line 7 PAT2a blabla
line 8 blabla
line 9 PAT2b blabla
line 10 PAT3a blabla
line 11 blabla
line 12 PAT3b blabla

Note: since you said the patterns share no common prefix, I match on a digit followed by a or b instead. As written, this handles exactly one line between each start line and its end line.
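
If more than one line can sit between a start and an end marker, an awk range pattern is one possible generalization. This is a minimal sketch, assuming the same digit-plus-a/b naming as above; the file name sample.txt and the variables workdir and range_out are made up for the demo.

```shell
# Hypothetical demo data; sample.txt is an assumed file name.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
line 1 blabla
line 2 PAT1a blabla
line 3 blabla
line 4 blabla
line 5 PAT1b blabla
line 6 blabla
line 7 PAT2a blabla
line 8 PAT2b blabla
EOF

# A range pattern selects every line from a start match
# through the next end match, however many lines lie between.
range_out=$(awk '/[0-9]a/,/[0-9]b/' "$workdir/sample.txt")
printf '%s\n' "$range_out"
```

With this sample, lines 2 to 5 and lines 7 to 8 are printed, and lines 1 and 6 are dropped.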

ghostdog74
A: 

Use the b command to branch to the end of the script for every line inside one of the pattern ranges (so that line is printed as usual), and the d command to delete all other lines:

sed -e '/PAT1a/,/PAT1b/b' -e '/PAT2a/,/PAT2b/b' -e '/PAT3a/,/PAT3b/b' -e d
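
With many pairs, the expressions could be generated rather than typed by hand. The following is a rough sketch, assuming a file with one "start end" pair per line; the names pairs.txt, filter.sed and input.txt are hypothetical, and patterns containing a slash would need escaping first.

```shell
# Hypothetical demo files; all names here are assumptions.
workdir=$(mktemp -d)
printf '%s\n' 'PAT1a PAT1b' 'PAT2a PAT2b' 'PAT3a PAT3b' > "$workdir/pairs.txt"
cat > "$workdir/input.txt" <<'EOF'
line 1 blabla
line 3 PAT1a blabla
line 4 blabla
line 5 PAT1b blabla
line 6 blabla
line 7 PAT2a blabla
line 9 PAT2b blabla
more blabla
EOF

# Turn each pair into a sed line of the form /start/,/end/b
# and finish the script with d to delete everything else.
awk '{ printf "/%s/,/%s/b\n", $1, $2 } END { print "d" }' \
    "$workdir/pairs.txt" > "$workdir/filter.sed"

# Run the generated script against the input.
sed_out=$(sed -f "$workdir/filter.sed" "$workdir/input.txt")
printf '%s\n' "$sed_out"
```

Adding a pair then only means adding a line to pairs.txt; the sed invocation itself stays the same.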
Bart Sas
What about PAT4, PAT5, etc.?
ghostdog74
I assume that it is not possible to describe all the a-patterns with a single regex and all the b-patterns with another single regex, since otherwise sed -n -e '/a-regex/,/b-regex/{p}' would have worked.
Bart Sas
+2  A: 

I assumed the pattern pairs are given in a separate file, one pair per line: the start pattern, whitespace, then the end pattern. If the pairs appear in the input in the same order as in that file, you could use this awk script:

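awk 'NR == FNR { a[NR] = $1; b[NR] = $2; next }   # first file: load the pairs
     # start printing at the next expected start pattern, if any pair is left
     !s && ((i+1) in a) && $0 ~ a[i+1] { s = 1 }
     s                                             # print while inside a block
     # stop after the matching end pattern and move on to the next pair
     s && $0 ~ b[i+1] { s = 0; i++ }' patterns.txt input.txt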
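
One detail worth keeping in mind when patterns are read from an array: an awk variable or array element that was never set is the empty string, and an empty dynamic regex matches every line. The sketch below only illustrates that behaviour and one way to guard against it; unset_var, empty_match and guarded are made-up names.

```shell
# An unset awk variable used as a dynamic regex is the empty string,
# and the empty regex matches any line, so both lines are printed.
empty_match=$(printf '%s\n' 'foo' 'bar' | awk '$0 ~ unset_var')
printf '%s\n' "$empty_match"

# Testing with "in" first skips indexes that have no pattern stored,
# so only the line that matches a real pattern is printed.
guarded=$(printf '%s\n' 'foo' 'bar' |
    awk 'BEGIN { a[1] = "foo" } (NR in a) && $0 ~ a[NR]')
printf '%s\n' "$guarded"
```

In a pairs-driven script this matters once the index moves past the last stored pair.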

And a more complicated version when the patterns can appear out of order:

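awk 'NR == FNR { a[++n] = $1; b[n] = $2; next }   # first file: load the pairs
     # outside a block: remember which pair, if any, starts on this line
     { for (i = 1; !s && i <= n; i++) if ($0 ~ a[i]) s = i; }
     s                                             # print while inside a block
     # leave the block once the end pattern of pair s is seen
     s && $0 ~ b[s] { s = 0 }' patterns.txt input.txt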
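
To try the out-of-order version, something like the following could be used. It is only a sketch: the contents of patterns.txt and input.txt are invented for the demo, with the pairs deliberately listed in a different order from how they occur in the input.

```shell
# Hypothetical demo files; the pair order differs from the input order.
workdir=$(mktemp -d)
printf '%s\n' 'PAT2a PAT2b' 'PAT1a PAT1b' > "$workdir/patterns.txt"
cat > "$workdir/input.txt" <<'EOF'
line 1 blabla
line 2 PAT1a blabla
line 3 blabla
line 4 PAT1b blabla
line 5 blabla
line 6 PAT2a blabla
line 7 PAT2b blabla
line 8 blabla
EOF

# s holds the index of the pair whose block is currently open (0 if none).
unordered_out=$(awk 'NR == FNR { a[++n] = $1; b[n] = $2; next }
     { for (i = 1; !s && i <= n; i++) if ($0 ~ a[i]) s = i }
     s
     s && $0 ~ b[s] { s = 0 }' "$workdir/patterns.txt" "$workdir/input.txt")
printf '%s\n' "$unordered_out"
```

Here lines 2 to 4 and lines 6 to 7 are printed even though the PAT2 pair comes first in patterns.txt.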
schot
bingo, marvellous skill! thanks for sharing it.
lukmac