ansaurus

Question

Regular expression to strip everything between anchor tags

Answer 1

+2 A:

I recommend Expresso to troubleshoot regular expressions. You can find a library of regular expressions here.

You might consider using JavaScript to walk the DOM tree for your replacements instead of a regex.

Dave Swersky 2010-01-19 13:08:58

Answer 2

+3 A:

Use an HTML Parser and not Regular Expressions to parse HTML.

HTML Agility Pack
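As a rough illustration of the parser route, here is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (`AnchorCleaner`, `EmptyAnchors`) are made up for this example, and it assumes the goal is to keep the `<a>` tags while dropping what sits between them.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

public static class AnchorCleaner
{
    // Hypothetical helper: empties every <a> element but keeps the tags.
    public static string EmptyAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors != null)
        {
            // Copy to a list so the loop does not depend on the live collection.
            foreach (var anchor in anchors.ToList())
            {
                anchor.RemoveAllChildren();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `anchor.Remove()` instead of `anchor.RemoveAllChildren()` should drop the whole element, tags included, if that is what you are after.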

RC 2010-01-19 13:12:17

Answer 3

A:

Conceptually, this only strips links of a very special kind. For example, your regex does not match an upper-case A, which is perfectly valid in HTML: <A ...>bla</A>. The replacement wouldn't work for JavaScript links either. Is your code relevant to user security?
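The case-sensitivity point is easy to check. This is a minimal sketch using a hypothetical lowercase-only pattern (the regex from the question is not reproduced in this thread); `CaseCheck` and its method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseCheck
{
    // Hypothetical lowercase-only pattern, used purely for illustration.
    public const string Pattern = @"<a\b[^>]*>.*?</a>";

    // Matches case-sensitively, which is the .NET default.
    public static bool MatchesCaseSensitive(string html)
    {
        return Regex.IsMatch(html, Pattern);
    }

    // Matches with RegexOptions.IgnoreCase, so <A ...> and </A> count too.
    public static bool MatchesIgnoringCase(string html)
    {
        return Regex.IsMatch(html, Pattern, RegexOptions.IgnoreCase);
    }
}
```

For the input `<A HREF="x">bla</A>`, the first method finds nothing and the second finds the link.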

Thorsten79 2010-01-19 13:15:35

Answer 4

+2 A:

Problems in your string: an unnecessary slash at the beginning (that's Perl syntax), an unescaped backslash (\b), and an unnecessarily escaped backslash (\\).

So, if it has to be a regex, and bearing in mind the warnings that other people have already linked to, try

string LINK_TAG_PATTERN = @"<a\b[^>]*>(.*?)</a>";
htmltext = Regex.Replace(htmltext, LINK_TAG_PATTERN, string.Empty, RegexOptions.IgnoreCase);

The \b is necessary to prevent other tags that start with "a" (such as <abbr> or <address>) from matching.
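To see what the \b buys you, here is a small sketch that runs the pattern above next to a variant without the word boundary. The class name `AnchorStripper`, its method names, and the no-boundary pattern are assumptions added for comparison, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorStripper
{
    // Pattern from the answer: \b demands a word boundary right after "a".
    private const string LinkTagPattern = @"<a\b[^>]*>(.*?)</a>";

    // Same pattern without \b, included only to show the difference.
    private const string NoBoundaryPattern = @"<a[^>]*>(.*?)</a>";

    // Removes whole anchor elements (tags and content).
    public static string Strip(string html)
    {
        return Regex.Replace(html, LinkTagPattern, string.Empty, RegexOptions.IgnoreCase);
    }

    // Over-eager variant: also starts matching at <abbr>, <address>, etc.
    public static string StripWithoutBoundary(string html)
    {
        return Regex.Replace(html, NoBoundaryPattern, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Given `<abbr>HTML</abbr> and <a href="x">link</a>!`, `Strip` leaves `<abbr>HTML</abbr> and !`, while `StripWithoutBoundary` starts at `<abbr>` and swallows everything up to `</a>`, leaving only `!`. Note that `.` does not match newlines unless `RegexOptions.Singleline` is added, so links whose content spans several lines are left alone by both versions.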

Tim Pietzcker 2010-01-19 13:19:25

Answer 5

+1 A:

string LINK_TAG_PATTERN = @"(<a\s+[^>]*>)(.*?)(</a>)";

htmltext = Regex.Replace(htmltext, LINK_TAG_PATTERN, "$1$3", RegexOptions.IgnoreCase);
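Unlike the previous answer, this version keeps the opening and closing tags and drops only what lies between them. A minimal sketch of that behaviour follows; the wrapper class `AnchorEmptier` and its method name are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorEmptier
{
    // Pattern from the answer: group 1 = opening tag, 2 = content, 3 = closing tag.
    private const string LinkTagPattern = @"(<a\s+[^>]*>)(.*?)(</a>)";

    // Puts back groups 1 and 3 only, so the content (group 2) disappears.
    public static string EmptyAnchors(string html)
    {
        return Regex.Replace(html, LinkTagPattern, "$1$3", RegexOptions.IgnoreCase);
    }
}
```

For `<A HREF="x">link</A> <a>bare</a>` this gives `<A HREF="x"></A> <a>bare</a>`. The second link is untouched because `\s+` requires at least one whitespace character after the `a`, so an attribute-less `<a>` never matches.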

Igor Korkhov 2010-01-19 13:36:47
