+1  Q: 

Grep / RegExp help

Hi everyone! I apologize if this is a really stupid question. I have data in the format:

etc etc etc <span>etc etc etc</span> etc etc etc
etc etc etc <span>etc etc etc</span> etc etc etc
etc etc etc <span>etc etc etc</span> etc etc etc

Is there a way to grep each line for a match that falls outside of the span tags on either side?

A: 

Use gawk if you have it (state your OS next time):

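gawk 'BEGIN{
    RS="</span>"   # records end at each closing tag (multi-character RS is a gawk extension)
    FS="\n"
}
{
  # a[1] is everything before <span> in this record, i.e. the text outside the span
  m=split($0,a,"<span>")
  if( a[1] ~ /word/){
    # NR is the record number (records are split at </span>), not the file line number
    print "found: "a[1]" in line: "NR
  }
} ' file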

Output:

$ cat file
word <span> word blah</span> word
word <span> word
          blah</span>
word etc <span> word blah</span> etc

$ ./shell.sh
found: word  in line: 1
found:  word
word  in line: 2
found:
word etc  in line: 3
ghostdog74
A: 

Or try sed:

sed 's:<span>.*</span>::' <FILE>
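
On its own the sed command only deletes the span; to actually search, its output can be piped to grep. A minimal sketch of that, where the helper name grep_outside_span and the file sample.txt are made up for illustration. Note that the lines printed are the stripped versions, not the originals, and that the greedy .* assumes at most one span per line:

```shell
# grep_outside_span is a hypothetical helper name, not a standard tool.
# Usage: grep_outside_span PATTERN FILE
grep_outside_span() {
    # Delete the span (tags and contents), then search what is left.
    # .* is greedy, so this assumes at most one span per line.
    sed 's:<span>.*</span>::' "$2" | grep -e "$1"
}

# Hypothetical sample data.
printf '%s\n' \
    'word <span>blah</span> etc' \
    'etc <span>word</span> etc' \
    'etc <span>blah</span> word' > sample.txt

# Prints the first and third lines with their spans removed;
# the second line is skipped because "word" only occurs inside the span.
grep_outside_span 'word' sample.txt
```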

HTH

Zsolt Botykai
+1  A: 
grep "\(StringGoesHere.*<span>.*</span>\)\|\(<span>.*</span>.*StringGoesHere\)"

This just tests for StringGoesHere before the span tags, and again after the span tags. It won't work if there is more than one set of span tags per line, or if the line has no span tags at all. Note that \| is a GNU grep extension.
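
If those two limitations matter, one possible workaround is to remove every span from a copy of the line and test what remains, while still printing the original line. This is only a sketch using plain awk: the helper name match_outside_spans and the file multi.txt are hypothetical, and it assumes spans are not nested and do not cross line breaks:

```shell
# match_outside_spans is a hypothetical helper name.
# Usage: match_outside_spans PATTERN FILE
# PATTERN is passed with -v, so awk applies its usual escape processing to it.
match_outside_spans() {
    awk -v pat="$1" '
    {
        rest = $0
        # Remove each <span>...</span> pair in turn, leftmost first.
        while ((s = index(rest, "<span>")) > 0) {
            tail = substr(rest, s + 6)          # text after the opening tag
            e = index(tail, "</span>")
            if (e == 0) break                   # unclosed span: stop stripping
            rest = substr(rest, 1, s - 1) substr(tail, e + 7)
        }
        # Test the stripped copy, but report the original line.
        if (rest ~ pat) print NR ": " $0
    }' "$2"
}

# Hypothetical sample data: two spans, no span, and a match inside a span only.
printf '%s\n' \
    'etc <span>word</span> etc <span>etc</span> word' \
    'word with no span at all' \
    'etc <span>word</span> etc' > multi.txt

# Reports lines 1 and 2; line 3 has "word" only inside its span.
match_outside_spans 'word' multi.txt
```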

m42a
Correct, though a small verbal explanation might help.
Kaleb Pederson
A: 

grep "SearchString" file | grep -v "<span>.*SearchString.*</span>"

This should find every line containing SearchString, then remove the lines where SearchString is inside <span> ... </span>.
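
To see how the two-stage filter behaves, here is a small sketch on made-up data (the helper name grep_then_exclude and the file both.txt are hypothetical). One thing it shows is that a line with SearchString both inside and outside the span gets dropped as well, which may or may not be what you want:

```shell
# grep_then_exclude is a hypothetical helper name.
# Usage: grep_then_exclude STRING FILE
grep_then_exclude() {
    # Stage 1: keep lines containing STRING anywhere.
    # Stage 2: drop lines where STRING sits between <span> and </span>.
    grep -e "$1" "$2" | grep -v -e "<span>.*$1.*</span>"
}

# Hypothetical sample data.
printf '%s\n' \
    'SearchString <span>blah</span> etc' \
    'etc <span>SearchString</span> etc' \
    'SearchString <span>SearchString</span> etc' \
    'no span, just SearchString' > both.txt

# Keeps the first and last lines; the third is dropped even though
# SearchString also appears outside its span.
grep_then_exclude 'SearchString' both.txt
```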

Isaac