views:

99

answers:

3

I have an HTML string and want to replace every link with just its text.

E.g. having

Some text <a href="http://google.com/">Google</a>.

need to get

Some text Google.

What regex should I use?

+2  A: 

Several similar questions have been posted, and the best practice is to use Html Agility Pack, which is built specifically for things like this.

http://www.codeplex.com/htmlagilitypack
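
For illustration, a minimal sketch of the parser-based approach might look like the following. It assumes the HtmlAgilityPack package is referenced in the project; the class and method names (LinkFlattener, ReplaceLinksWithText) are hypothetical and chosen only for this example.

```csharp
using HtmlAgilityPack; // third-party package, not part of .NET itself

public static class LinkFlattener
{
    // Replaces each <a> element with a text node holding its inner text.
    public static string ReplaceLinksWithText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null)
            return html;

        foreach (var anchor in anchors)
        {
            // InnerText drops any markup nested inside the link.
            var text = doc.CreateTextNode(anchor.InnerText);
            anchor.ParentNode.ReplaceChild(text, anchor);
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

Because the parser works on elements rather than on characters, attribute order and whitespace inside the tags should not matter here.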

Fadrian Sudaman
On a second note, if you really need a regex solution, you can use \<a href=.*?\>(?<text>.*?)\</a\> to extract the text and do the replacement with the same pattern, or simply replace \<a href=.*?\> and \</a\> with an empty string.
Fadrian Sudaman
+1 this answer. `<a href=.*?>...` will fail even for simple, valid HTML. Allowing `.*?` is naïve even by the low, low standards of regex; for example a simple difference like the close-tag being `</a >` and you've just matched a big stretch of document across multiple links by mistake. Plus, of course, the hundred other constructs that'll trip this over. Do yourself a favour. Use an HTML parser. It's what they're there for.
bobince
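
The failure mode described in the comment above can be reproduced with a small sketch. The helper below is hypothetical (LazyMatchDemo and FirstAnchorText are names made up for this example); it just applies the pattern from the comment and returns what the named group captures.

```csharp
using System.Text.RegularExpressions;

public static class LazyMatchDemo
{
    // Returns what the "text" group captures for the first match,
    // or an empty string if the pattern does not match at all.
    public static string FirstAnchorText(string html)
    {
        var pattern = new Regex(@"<a href=.*?>(?<text>.*?)</a>");
        var m = pattern.Match(html);
        return m.Success ? m.Groups["text"].Value : string.Empty;
    }
}
```

Given the input `<a href="/one">One</a > and <a href="/two">Two</a>`, the first closing tag contains a space and so is not matched by `</a>`. The lazy group keeps extending until the second link's closing tag, and the captured "text" is `One</a > and <a href="/two">Two` rather than `One`.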
A: 

I asked about a simple regex (thanks, Fadrian). The code is the following:

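// Requires: using System.Text.RegularExpressions;
// In a verbatim string, a literal double quote is written as "".
var html = @"Some text <a href=""http://google.com/"">Google</a>.";
// Remove every opening <a href=...> tag.
Regex r = new Regex(@"\<a href=.*?\>");
html = r.Replace(html, "");
// Remove every closing </a> tag.
r = new Regex(@"\</a\>");
html = r.Replace(html, "");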
sashaeve
You're welcome. So I take it that this is what you wanted? If so, please accept the answer so that others don't waste time posting more answers.
Fadrian Sudaman
This doesn't handle the case where the tag has a different attribute (e.g. title) before href. See my answer below.
Andrew Theken
Yes, you are right.
sashaeve
+1  A: 
var html = "<a ....>some text</a>";
var ripper = new Regex("<a.*?>(?<anchortext>.*?)</a>", RegexOptions.IgnoreCase);
// Replace every link with its captured anchor text (Match alone would return only the first link's text).
html = ripper.Replace(html, "${anchortext}");
//html = "some text"
Andrew Theken