ansaurus

Question

regular expression to match specific text not linked

Answer 1

+2 A:

See this previous SO question.

Amber 2010-02-10 00:57:11

Answer 2

+2 A:

You can use a negative lookbehind to rule out a preceding opening <a href=...> tag:

var tmpStr = new RegExp('(?<!<a.*?>)match text(?!</a>)');

Hope that works for you.

Eric Wendelin 2010-02-10 01:00:17

Did you mean `(?!<a.*?>)match text(?!</a>)`? This is exactly what I was looking for, thank you very much.

Tomba 2010-02-10 01:08:30

Note that this won't avoid matching, say, the `match text` inside of `<a href="...">test match text foo</a>`.

Amber 2010-02-10 01:10:24
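In an engine that implements ECMAScript 2018 lookbehind assertions (such as a current Node.js), the answer's pattern can be tried directly. This is a small sketch with made-up test strings. It suggests the lookbehind only excludes a link whose entire text is the phrase, and still matches the `test match text foo` case from the comment above.

```javascript
// The answer's pattern; the lookbehind requires an ES2018+ engine.
const notLinked = /(?<!<a.*?>)match text(?!<\/a>)/;

// Hypothetical inputs paired with what .test() returns for each.
const samples = [
  ['see match text here', true],                      // not inside a link
  ['<a href="x.html">match text</a>', false],         // phrase is the entire link text
  ['<a href="x.html">test match text foo</a>', true], // inside a link, but with other text around it
];

for (const [input, expected] of samples) {
  // Prints true whenever the regex behaves as listed above.
  console.log(notLinked.test(input) === expected, input);
}
```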

@Dav: Right, sorry, didn't take it that far. Though it sounds like it is hard/impossible to handle every case ;)

Eric Wendelin 2010-02-10 01:35:11

@Tomba, I believe `(?<!<a.*?>)` *is* what Eric intended to write. You were using a negative lookbehind, weren't you, Eric? Trouble is, JavaScript doesn't support lookbehinds (support was only added later, in ECMAScript 2018). But even if it did, regexes would be useless for this task unless you could simplify the problem somehow, as Dav suggested above.

Alan Moore 2010-02-10 02:40:24
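Engines without lookbehind support throw a `SyntaxError` when such a pattern is compiled, so code that has to run on both older and newer engines could test for the feature first. A minimal sketch: `supportsLookbehind` is a hypothetical helper name, and the fallback is simply the lookahead-only form, which no longer checks for an opening tag at all.

```javascript
// Hypothetical helper: true if this engine accepts lookbehind syntax
// (ECMAScript 2018); older engines throw a SyntaxError when compiling it.
function supportsLookbehind() {
  try {
    new RegExp('(?<!x)y');
    return true;
  } catch (e) {
    return false;
  }
}

// Compile the lookbehind pattern only where it is supported; otherwise fall
// back to the lookahead-only form (weaker: it ignores the opening <a> tag).
const pattern = supportsLookbehind()
  ? new RegExp('(?<!<a.*?>)match text(?!</a>)')
  : new RegExp('match text(?!</a>)');
```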

Answer 3

+2 A:

Thanks for the very quick and helpful answers. Just to clarify, the regular expression I ended up using was

(?!<a.*?>)\bmatch text\b(?!</a>)

Tomba 2010-02-10 01:14:43

You realize that the above expression will match `<a href="test.html">match text </a>`, correct? In fact, it will match anything where there's a space or other text before the `</a>`, because the `(?!<a.*?>)` is literally doing nothing. The regex you've posted above is *exactly identical* in function to the 'best effort' posted in your OP, `\bmatch text\b(?!</a>)`. Why? Because `(?!<a.*?>)\b` here is equivalent to `\b`: the negative lookahead only checks that the next characters aren't `<a...>`, but the pattern then requires a word boundary followed by `m`, so the next character can never be `<` and the lookahead can never fail.

Amber 2010-02-10 04:22:03

Essentially, there are two cases here: either you need to match `match text` anywhere except where it is the *only thing* inside the link (i.e. `<a href="...">match text</a>`, no spaces, no other tags, nothing), in which case the regex in your OP would already have worked fine without modification; or you need to match the text only if it's not inside a link, even if it's wrapped in other text (i.e. `<a href="..."><strong>match text</strong></a>` *shouldn't* be matched), in which case the regex above won't work. Either way, you don't actually gain anything from adding `(?!<a.*?>)` to the front.

Amber 2010-02-10 04:29:32
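The claim about the leading lookahead is easy to check by running both patterns side by side. A quick sketch with hypothetical inputs: the two patterns agree on every case, including the trailing-space link and the `<strong>`-wrapped link, both of which they match.

```javascript
// Pattern from the answer above, and the OP's version without the lookahead.
const withLookahead = /(?!<a.*?>)\bmatch text\b(?!<\/a>)/;
const withoutLookahead = /\bmatch text\b(?!<\/a>)/;

// Hypothetical inputs covering the cases discussed in these comments.
const inputs = [
  'plain match text here',
  '<a href="test.html">match text</a>',
  '<a href="test.html">match text </a>',
  '<a href="test.html"><strong>match text</strong></a>',
];

for (const s of inputs) {
  const a = withLookahead.test(s);
  const b = withoutLookahead.test(s);
  // The leading (?!<a.*?>) never changes the result: \bmatch forces the
  // next character to be "m", so it can never be "<".
  console.log(a === b ? 'same' : 'DIFFERENT', a, s);
}
// same true plain match text here
// same false <a href="test.html">match text</a>
// same true <a href="test.html">match text </a>
// same true <a href="test.html"><strong>match text</strong></a>
```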

@Dav - Thanks for explaining that. Though it's not obvious from the question, I would ideally like to avoid matching "match text" wherever it is enclosed in <a ...></a> tags, whether or not there are spaces (basically, I want to find a certain string and convert it into a link, unless it's already a link). However, I can be 99% sure that the match will be the only thing inside the link, so the original (OP) regular expression will probably work fine in practice.

Tomba 2010-02-10 08:01:14
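For the stated goal (turn a phrase into a link unless it is already one), a common workaround, not one proposed in this thread, is to let the regex consume whole `<a>...</a>` elements and return them untouched, linking only the matches found elsewhere. A minimal sketch: `linkify` and `escapeRegExp` are hypothetical names, and it assumes simple HTML with no nested `<a>` tags and no `>` inside attribute values. A phrase inside another tag's attribute (say, an `alt` text) would still be rewritten, so for less controlled input, working on DOM text nodes is the sturdier route.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal phrase.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap `phrase` in a link unless it is already inside
// an <a>...</a> element. Assumes no nested <a> and no ">" in attributes.
function linkify(html, phrase, href) {
  // Alternative 1 consumes a whole link so its contents are skipped;
  // alternative 2 captures the phrase when it occurs outside a link.
  const re = new RegExp(
    '<[aA]\\b[^>]*>[\\s\\S]*?</[aA]>|\\b(' + escapeRegExp(phrase) + ')\\b',
    'g'
  );
  // When alternative 1 matched, the capture group is undefined: keep as-is.
  return html.replace(re, (whole, text) =>
    text === undefined ? whole : `<a href="${href}">${text}</a>`
  );
}

console.log(linkify('See match text and <a href="/x"><strong>match text</strong></a>.', 'match text', '/y'));
// See <a href="/y">match text</a> and <a href="/x"><strong>match text</strong></a>.
```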

Thanks to Dav and others for pointing out that my answer does not make sense! I have not deleted it because of the valuable comments below.

Tomba 2010-02-10 09:05:51
