Hi, I have some text that contains HTML hyperlinks. I want to remove some of the hyperlinks, but only specific ones.

For example, I start with this:

This is text <a href="link/to/somewhere">Link to Remove</a> and more text with another link <a href="/link/to/somewhere/else">Keep this link</a>

I want to end up with:

This is text and more text with another link <a href="/link/to/somewhere/else">Keep this link</a> 

I have this regular expression:

<a\s[^>]*>.*?</a>

... but it matches ALL of the links.

What do I need to add to that expression so that it matches only links whose link text contains 'Remove' (for example)?

Thanks in advance.

A:

You'll probably get a lot of feedback telling you not to use regular expressions on HTML, but if you do decide to use one, try this:

 <a\s[^>]*>.*?Remove.*?</a>

This matches links where "Remove" appears somewhere in the link text.
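A minimal sketch of applying this pattern with Python's `re.sub` (Python is assumed only for illustration; the question doesn't name a language). It also shows a caveat: the lazy `.*?` can still run across a closing `</a>` when a link you want to keep comes before the one to remove. The second pattern swaps in a tempered token, `(?:(?!</a>).)*?`, which cannot cross `</a>`. The names `SIMPLE`, `TEMPERED`, and `remove_links` are hypothetical.

```python
import re

# Keltex's pattern: "Remove" anywhere between an opening <a ...> and a later </a>.
SIMPLE = r'<a\s[^>]*>.*?Remove.*?</a>'

# Hypothetical variant: each (?:(?!</a>).) consumes one character only if
# that character does not start "</a>", so one match cannot span two links.
TEMPERED = r'<a\s[^>]*>(?:(?!</a>).)*?Remove(?:(?!</a>).)*?</a>'


def remove_links(pattern: str, text: str) -> str:
    """Delete every substring matched by `pattern`."""
    return re.sub(pattern, '', text)


question = ('This is text <a href="link/to/somewhere">Link to Remove</a> '
            'and more text with another link '
            '<a href="/link/to/somewhere/else">Keep this link</a>')

# Link order from the question: both patterns remove only the first link.
print(remove_links(SIMPLE, question))

# Kept link first: SIMPLE starts at the first <a and stretches to the
# second </a>, deleting both links; TEMPERED deletes only the "Remove" link.
reordered = ('Keep <a href="/x">Keep this link</a> '
             'then <a href="/y">Link to Remove</a> end')
print(remove_links(SIMPLE, reordered))    # Keep  end
print(remove_links(TEMPERED, reordered))  # Keep <a href="/x">Keep this link</a> then  end
```

Both patterns leave the spaces on either side of the removed link, which is why the output has a double space where the question's desired output shows one.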

Keltex
Thanks, that got it. And if I wanted to match 'remove' case-insensitively, what would I wrap that in? (e.g. match 'Remove', 'remove', 'REMOVE', etc.)
Rob
@Rob: In C#, you can pass `RegexOptions.IgnoreCase` as an extra argument, e.g. `Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase)`.
Mark
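For comparison, a sketch of the same case-insensitive match in Python (an assumption for illustration; the comment above refers to C#). Here `re.IGNORECASE` plays the role of `RegexOptions.IgnoreCase`, and an inline `(?i)` at the start of the pattern has the same effect. The helper name `remove_links_ci` is hypothetical.

```python
import re

# Keltex's pattern; the flag below makes "Remove" match in any letter case.
PATTERN = r'<a\s[^>]*>.*?Remove.*?</a>'


def remove_links_ci(text: str) -> str:
    """Delete links whose text contains "remove" in any capitalization."""
    return re.sub(PATTERN, '', text, flags=re.IGNORECASE)


for word in ('Remove', 'remove', 'REMOVE'):
    text = f'Text <a href="/x">{word} me</a> end'
    print(remove_links_ci(text))  # Text  end

# The inline flag (?i) at the start of the pattern is equivalent.
print(re.sub('(?i)' + PATTERN, '', 'A <a href="/y">REMOVE</a> B'))  # A  B
```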
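A:

use strict; use warnings;
my $str = 'This is text <a href="link/to/somewhere">Link to Remove</a> and more text with another link <a href="/link/to/somewhere/else">Keep this link</a>';
# Drops the first link without checking for "Remove": $1 is the text before it, $2 the lowercase words after it plus the second link.
print "$1$2\n" if $str =~ /(.*)<a.*<\/a>([a-z ]+ <a.*<\/a>)/;
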
muruga
A: 

(.*?)<a.*[Rr]emove.*?a>(.*)

Then rebuild the string from the two captured groups: $1$2
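A minimal sketch of this capture-group approach, assuming Python's `re` module (the question names no language). Python's replacement strings refer to groups as `\1` and `\2` rather than `$1` and `$2`. Note that `[Rr]` covers only "Remove" and "remove", so "REMOVE" is left alone. The helper name `strip_remove_link` is hypothetical.

```python
import re

# Paul's pattern: group 1 = text before the link, group 2 = text after it.
PATTERN = r'(.*?)<a.*[Rr]emove.*?a>(.*)'


def strip_remove_link(text: str) -> str:
    """Rebuild the string from groups 1 and 2, dropping the matched link."""
    # Python writes the $1$2 reconstruction as \1\2.
    return re.sub(PATTERN, r'\1\2', text)


question = ('This is text <a href="link/to/somewhere">Link to Remove</a> '
            'and more text with another link '
            '<a href="/link/to/somewhere/else">Keep this link</a>')
print(strip_remove_link(question))

# [Rr] matches only the first letter's case, so this string is unchanged.
print(strip_remove_link('Text <a href="/x">REMOVE me</a> end'))
```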

Paul