ansaurus

Question

Answer 1

+1 A:

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Chas. Owens 2009-06-06 17:33:33

Duplicate http://www.google.com/search?q=site:stackoverflow.com+%22Regexes+are+fundamentally+bad+at+parsing+HTML%22 ;)

Gumbo 2009-06-06 17:39:58

This begins to sound like a cliche. Sometimes you don't need to really parse the HTML into a data structure of some kind, you just have to somehow manipulate that string. There are cases when RegExp makes sense. Right tool for the right job. And by the way, John Resig has written an HTML parser in JavaScript and he used some RegExp in there. http://ejohn.org/blog/pure-javascript-html-parser/

Ionuț G. Stan 2009-06-06 17:47:04

@Ionut G. Stan You always need to parse HTML into a data structure because that is the only way to reliably work with it. Regexes are part of parsing, but these questions always want to use one regex to find or replace something. That is impossible with traditional regexes (as one of the links in the answer shows) and very hard to get right with the ones where it is possible (e.g. Perl's implementation that adds recursion). There are many libraries available that already perform the task of working with HTML for you. You should use them, not a regex that is guaranteed to fail.

Chas. Owens 2009-06-06 19:22:39

Answer 2

A:

If you only want to remove <a> elements, the following should work well:

s.replace(/<a [^>]+>[^<]*<\/a>/, '');

This works for the example you gave, but not when the link contains nested tags. For example, it fails on this HTML:

<a href="http://www.google.com"><em>Google</em></a>
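
To make the limitation concrete, here is a minimal sketch that runs the same pattern against a plain link and a nested one. The sample strings and variable names are made up for illustration and are not from the original question.

```javascript
// Pattern copied from the answer; sample inputs are hypothetical.
var simpleLink = /<a [^>]+>[^<]*<\/a>/;

var plain = 'See <a href="http://www.google.com">Google</a> now';
var nested = 'See <a href="http://www.google.com"><em>Google</em></a> now';

// The plain link is removed, leaving the surrounding text.
var plainResult = plain.replace(simpleLink, '');

// [^<]* cannot cross the <em> tag, so nothing matches and the string is unchanged.
var nestedResult = nested.replace(simpleLink, '');

console.log(plainResult);
console.log(nestedResult === nested); // true
```

Note that without the g flag only the first matching link in the string would be removed.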

georgebrock 2009-06-06 17:41:38

Answer 3

+2 A:

This will strip out the first link, everything from <a through </a>:

var mystr = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
alert(mystr.replace(/<a\b[^>]*>(.*?)<\/a>/i,""));

It's not really foolproof, but maybe it'll do the trick for your purpose...

ChristopheD 2009-06-06 17:41:47

My suggestion: /<a(\s[^>]*)?>.*?<\/a>/ig
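
As a rough illustration of what the g flag changes, the sketch below runs the suggested pattern with and without it. The input strings and variable names are invented for the example, and the last case shows one limitation I am confident of: in JavaScript, "." does not match line breaks, so a link whose text spans lines is left alone.

```javascript
// Same pattern, once without and once with the g flag; inputs are hypothetical.
var firstOnly = /<a(\s[^>]*)?>.*?<\/a>/i;
var allLinks = /<a(\s[^>]*)?>.*?<\/a>/ig;

var html = 'One <a href="#a"><em>first</em></a>, two <A HREF="#b">second</A>, done.';

// Without g, only the first link is removed.
var firstRemoved = html.replace(firstOnly, '');

// With g (and i for the upper-case tag), both links are removed.
var allRemoved = html.replace(allLinks, '');

// A link containing a newline is not matched, because . stops at line breaks.
var multiline = 'x <a href="#">line\nbreak</a> y';
var multilineResult = multiline.replace(allLinks, '');

console.log(firstRemoved);
console.log(allRemoved);
console.log(multilineResult === multiline); // true
```

Replacing ".*?" with "[\s\S]*?" is one common way to let the match cross line breaks, though it still does not make the regex a real HTML parser.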

Christoph 2009-06-06 18:14:22

Answer 4

+1 A:

I just commented about John Resig's HTML parser. Maybe it helps with your problem.

Ionuț G. Stan 2009-06-06 17:49:06


Regex in Javascript to remove links
