I'd like to accomplish what this (invalid I believe) regular expression tries to do:

<p><a>([^(<\/a>)]+?)<\/a></p>uniquestring

Essentially, I want to match anything except a closing anchor tag. A simple non-greedy match doesn't help here, because `uniquestring` may well appear only after another, more distant closing anchor tag:

<p><a>text I don't <tag>want</tag> to match</a></p>random 
data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more
matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring 

So there are more tags between the anchor tags, and I'm using the presence of uniquestring to decide whether I want to match the data. A simple non-greedy match ends up matching everything from the start of the data I don't want to the end of the data I do want.
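To make the problem concrete, here is a minimal sketch in Python's `re` module (an assumption; the question does not name a regex engine). The sample data is joined onto one line, and the variable names are made up for illustration. It shows two things: the character class in the attempted pattern excludes individual characters rather than the sequence `</a>`, and a lazy quantifier still starts at the first `<p><a>`.

```python
import re

# Sample data from the question, joined onto one line.
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random data"
    "<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more "
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring"
)

# A character class negates single characters, not the sequence "</a>":
# this one rejects any of ( < / a > ) wherever they appear.
char_class_pieces = re.findall(r"[^(</a>)]+", "want")
print(char_class_pieces)  # ['w', 'nt']

# A lazy quantifier still starts at the first "<p><a>" and keeps growing
# until "</a></p>uniquestring" follows, crossing the unwanted "</a>".
lazy = re.compile(r"<p><a>(.+?)</a></p>uniquestring")
first = lazy.search(data)
print(first.group(1))
```

The captured group begins with "text I don't" and runs through to the end of the first wanted anchor, which is exactly the over-match described above.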

I know I'm edging close to the problems regular expressions (or at least my knowledge of them) aren't good at solving. I could just throw the data at an HTML/XML parser, but it is just one simple(ish) search.

Is there some easy way to do this that I'm just missing?

+1  A: 

You are looking for a zero-width negative look-behind:

<p><a>((?<!<\/a>).)+<\/a><\/p>uniquestring

Test:

(zyx:~) % echo $T
<p><a>text I don't <tag>want</tag> to match</a></p>random  data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring
(zyx:~) % echo $T | grep -oP '<p><a>((?<!<\/a>).)+<\/a><\/p>uniquestring'
<p><a>text I do <tag>want to</tag> match</a></p>uniquestring
<p><a>of <tag>text I do</tag> want to match</a></p>uniquestring
ZyX
Indeed that is what I was looking for! And I almost understand it. :-)
Tim Lytle
I would have used a *lookahead*, not a lookbehind. Your way, it has to get all the way through the `</a>` sequence before it realizes it wasn't supposed to match it. `(?!<\/a>)` stops matching at the first character.
Alan Moore
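The lookahead variant suggested in the comment above might look like the sketch below. It is written for Python's `re` module rather than `grep -P` (an assumption made so the example is easy to run), the `/` is left unescaped because Python does not need it escaped, and the variable names are hypothetical. The repeated token is wrapped in a non-capturing group with an outer capturing group so that group 1 holds the whole anchor text instead of only its last character.

```python
import re

# Sample data from the question, joined onto one line.
data = (
    "<p><a>text I don't <tag>want</tag> to match</a></p>random data"
    "<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more "
    "matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring"
)

# Lookahead: consume a character only if "</a>" does NOT start here,
# so the repetition stops right before the first closing anchor tag.
lookahead = re.compile(r"<p><a>((?:(?!</a>).)+)</a></p>uniquestring")

# Lookbehind (the answer's approach): consume a character only if the
# previous four characters are not "</a>", so the repetition can run
# through one "</a>" and must then backtrack over it.
lookbehind = re.compile(r"<p><a>((?:(?<!</a>).)+)</a></p>uniquestring")

with_lookahead = lookahead.findall(data)
with_lookbehind = lookbehind.findall(data)

for text in with_lookahead:
    print(text)
```

On this sample both patterns return the same two anchor texts; the difference is only in how much work the engine does before it gives up on the first, unwanted anchor.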