Hello,

I'm really sorry if this has been answered before, but I just can't seem to find the proper answer. I have a string in JS that contains a link (an <a href="..."> tag). I want to remove all links AND their inner text. I know how to remove just the tags and leave the inner text, but I want to remove the link completely.

For example:

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";

I would like to use regex so I'm left with:

s = "check this out. cool, huh?";

Thanks for your help,

g

+1  A: 

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Chas. Owens
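
As a minimal sketch of the parser-based approach suggested in the answer above, assuming the code runs in a browser where the built-in DOMParser is available (the function name removeLinks is made up for illustration and does not come from the thread):

```javascript
// Sketch only: relies on the browser's built-in DOMParser, so it will not
// run in an environment without a DOM (for example plain Node.js).
function removeLinks(html) {
  // Parse the string into a detached document; nothing is added to the page.
  var doc = new DOMParser().parseFromString(html, 'text/html');
  // Collect every <a> element; querySelectorAll returns a static list,
  // so removing nodes while looping over it is safe.
  var links = doc.querySelectorAll('a');
  for (var i = 0; i < links.length; i++) {
    // Remove the element together with its contents, nested tags included.
    links[i].parentNode.removeChild(links[i]);
  }
  // Serialize what is left of the body back to a string.
  return doc.body.innerHTML;
}
```

Two things to keep in mind with this sketch: the space that preceded the link stays in the text, and innerHTML re-serializes the markup, so characters such as & may come back as entities.
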
Duplicate http://www.google.com/search?q=site:stackoverflow.com+%22Regexes+are+fundamentally+bad+at+parsing+HTML%22 ;)
Gumbo
This is beginning to sound like a cliché. Sometimes you don't really need to parse the HTML into a data structure of some kind; you just have to manipulate that string somehow. There are cases when RegExp makes sense. Right tool for the right job. And by the way, John Resig has written an HTML parser in JavaScript, and he used some RegExp in there. http://ejohn.org/blog/pure-javascript-html-parser/
Ionuț G. Stan
@Ionut G. Stan You always need to parse HTML into a data structure because that is the only way to reliably work with it. Regexes are part of parsing, but these questions always want to use one regex to find or replace something. That is impossible with traditional regexes (as one of the links in the answer shows) and very hard to get right with the ones where it is possible (e.g. Perl's implementation, which adds recursion). There are many libraries available that already perform the task of working with HTML for you. You should use them, not a regex that is guaranteed to fail.
Chas. Owens
A: 

If you only want to remove <a> elements, the following should work well:

s = s.replace(/<a [^>]+>[^<]*<\/a>/g, '');

This works for the example you gave, but not when the link contains nested tags. For example, it would not match this HTML:

<a href="http://www.google.com"><em>Google</em></a>
georgebrock
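
To illustrate the limitation described in the answer above, here is a small sketch that runs the same pattern against a plain link and a link with a nested tag. The variable names and the sample strings are made up for the demonstration.

```javascript
// The pattern from the answer above, with the g flag so every match is replaced.
var flatLink = /<a [^>]+>[^<]*<\/a>/g;

var plain = "see <a href='http://www.google.com'>Google</a> now";
var nested = 'see <a href="http://www.google.com"><em>Google</em></a> now';

// The plain link is removed; the spaces on both sides of it remain.
var plainResult = plain.replace(flatLink, '');
// No match here: [^<]* stops at the "<" of <em>, and </a> does not follow.
var nestedResult = nested.replace(flatLink, '');

console.log(plainResult);
console.log(nestedResult === nested); // the nested string is left untouched
```
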
+2  A: 

This will strip out everything from the opening <a to the closing </a>, tags included:

mystr = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
alert(mystr.replace(/<a\b[^>]*>(.*?)<\/a>/i,""));

It's not really foolproof, but maybe it'll do the trick for your purpose...

ChristopheD
my suggestion: /<a(\s[^>]*)?>.*?<\/a>/ig
Christoph
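
A rough sketch of how the pattern from the comment above could be wrapped in a helper. The name stripLinks, the second sample string, and the extra punctuation clean-up step are assumptions added for illustration; the clean-up is only there because removing a link leaves the space that preceded it.

```javascript
// Hypothetical helper built around the pattern suggested in the comment above.
function stripLinks(html) {
  return html
    // i: match <A> as well as <a>; g: replace every link, not just the first.
    .replace(/<a(\s[^>]*)?>.*?<\/a>/ig, '')
    // Assumption: a space left dangling before punctuation should be dropped.
    .replace(/\s+([.,;:!?])/g, '$1');
}

var s = "check this out <a href='http://www.google.com'>Click me</a>. cool, huh?";
var twoLinks = 'a <a href="#1"><em>one</em></a>, b <A HREF="#2">two</A>.';

console.log(stripLinks(s));        // check this out. cool, huh?
console.log(stripLinks(twoLinks)); // a, b.
```

Because "." does not match line breaks in JavaScript, a link whose text spans several lines would slip through; swapping .*? for [\s\S]*? is one way to cover that case. Like the other regex answers, this is still not a substitute for a real parser on arbitrary HTML.
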
+1  A: 

I just commented about John Resig's HTML parser (see the comments on the first answer). It may help with your problem.

Ionuț G. Stan