I'm wondering what regex I would need with eregi_replace to match a string only when it is not inside an alt attribute.

For example, it should find and replace John Doe in:

"John Doe was born on..."

but not when John Doe appears inside a tag, for example:

<img src="/jd.jpg" alt="John Doe at the beach" />
A: 

You've reached the limitations of regex. You'll need a custom parser for this. Tags can be nested, and regex can't match patterns like

<b>
<<b>>
<<<b>>>

while not matching patterns like

<b>>
<<b>
<<b>>>

etc.

Charles Ma
But if we're searching for text in an element there wouldn't be nesting. An element is within <...>
Ian
+1  A: 

If I wanted to replace "John Doe" if it's not inside a tag, I would do this:

$str = preg_replace('/John Doe(?![^<>]*+>)/i', $new_name, $str);

(?![^<>]*+>) is a negative lookahead; it says "if there are any angle brackets ahead of this point, the first one is not a closing bracket." That's not foolproof, since attribute values can contain angle brackets, but in my experience they rarely do.
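To make the lookahead concrete, here is a minimal sketch that runs the same pattern against the two sample strings from the question. The helper name replace_outside_tags and the replacement "Jane Roe" are made up for illustration; preg_quote is only there so the name can be passed in as a plain string.

```php
<?php
// Hypothetical helper (not from the original answer): wraps the
// preg_replace call so the name can be supplied as a plain string.
function replace_outside_tags(string $name, string $replacement, string $html): string
{
    // preg_quote escapes regex metacharacters in $name; '/' is our delimiter.
    // (?![^<>]*+>) rejects a match when the next angle bracket ahead is '>'.
    $pattern = '/' . preg_quote($name, '/') . '(?![^<>]*+>)/i';
    return preg_replace($pattern, $replacement, $html);
}

$text = 'John Doe was born on...';
$tag  = '<img src="/jd.jpg" alt="John Doe at the beach" />';

// Plain text: no '>' follows, so the name is replaced.
echo replace_outside_tags('John Doe', 'Jane Roe', $text), "\n";
// Inside a tag: the next angle bracket is '>', so the string is left alone.
echo replace_outside_tags('John Doe', 'Jane Roe', $tag), "\n";
```

The first call should print the sentence with the new name, and the second should print the img tag unchanged.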

Regexes are fundamentally incompatible with HTML; even with the advanced features offered by the preg_ suite, like lookarounds and possessive quantifiers, you often have to rely on simplifying assumptions like no angle brackets in attribute values. I wouldn't even attempt this job with the much more limited ereg_ functions.
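As a quick illustration of that simplifying assumption breaking down, here is a sketch with a contrived alt value (invented for this example) that contains a literal "<" after the name. The lookahead hits "<" before it finds ">", concludes it is not inside a tag, and the replacement goes through anyway.

```php
<?php
$pattern = '/John Doe(?![^<>]*+>)/i';

// Contrived input: an unescaped '<' inside the attribute value.
$html = '<img src="/jd.jpg" alt="John Doe <3 the beach" />';

// The first angle bracket after the name is '<', not '>', so the
// negative lookahead succeeds and the name inside the tag is replaced.
$result = preg_replace($pattern, 'Jane Roe', $html);
echo $result, "\n";
```

If input like this is a realistic possibility, that is the point where an HTML parser becomes the safer choice.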

Alan Moore
This seems to have done the trick! Thanks.
Ian
Neat usage of zero-width lookahead.
Don Johe