I have this html:

<a href="http://www.site.com/">This is the content.</a>

I just need to get rid of the anchor tag HTML around the content text, so that all I end up with is "This is the content."

Can I do this using Regex.Replace?

+2  A: 

Your regex: <a[^>]+?>(.*?)</a>

Run this pattern through the Regex class, iterate over the resulting MatchCollection, and read the captured group to get the inner text.

using System;
using System.Text.RegularExpressions;

String text = "<a href=\"link.php\">test</a>";

Regex rx = new Regex("<a[^>]+?>(.*?)</a>");
// Find matches.
MatchCollection matches = rx.Matches(text);

// Report the number of matches found.
Console.WriteLine("{0} matches found. \n", matches.Count);

// Report on each match.
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);

    // Groups[0] is the whole match; Groups[1] is the text captured by (.*?).
    Console.WriteLine("Groups:");
    foreach (Group g in match.Groups)
    {
        Console.WriteLine(g.Value);
    }
}

Console.ReadLine();

Output:

  1 matches found. 

  <a href="link.php">test</a> 
  Groups:
  <a href="link.php">test</a> 
  test

The part of the pattern in parentheses is a capturing group. Its text is stored in the second item of the match's Groups collection, Groups[1]; the first item, Groups[0], is the whole match. Each parenthesized expression adds one entry to the Groups collection. See MSDN for further information.
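If you only need the captured text, you can read Groups[1] directly instead of looping over every group. A minimal sketch, assuming C# 9+ top-level statements; the input strings are made-up examples:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same lazy pattern as above; group 1 captures the text between the tags.
var rx = new Regex("<a[^>]+?>(.*?)</a>");

// One link: Groups[0] is the whole match, Groups[1] is the captured text.
Match single = rx.Match("<a href=\"link.php\">test</a>");
string inner = single.Success ? single.Groups[1].Value : string.Empty;
Console.WriteLine(inner); // prints "test"

// Several links: Matches returns one Match per anchor, each with its own Groups.
var texts = new List<string>();
foreach (Match m in rx.Matches("<a href=\"a\">one</a> and <a href=\"b\">two</a>"))
{
    texts.Add(m.Groups[1].Value);
}
Console.WriteLine(string.Join(", ", texts)); // prints "one, two"
```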

Simon
You can also reference the group by number in the replacement string instead of iterating through the results, e.g. Regex.Replace(yourHtml, "<a[^>]+?>(.*?)</a>", "$1"); returns the inner text.
Matt Winckler
Cool, didn't know! :)
Simon
Thanks, this worked.
Steven
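For example, a minimal sketch of the "$1" replacement (the sample HTML is made up for illustration, and C# 9+ top-level statements are assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a link embedded in surrounding text.
string html = "Visit <a href=\"http://www.site.com/\">This is the content.</a> today.";

// "$1" in the replacement string stands for capturing group 1, so each
// anchor is replaced by its inner text and everything else is left alone.
string text = Regex.Replace(html, "<a[^>]+?>(.*?)</a>", "$1");
Console.WriteLine(text); // prints "Visit This is the content. today."
```

By default . does not match a newline, so if the link text can span several lines you would pass RegexOptions.Singleline to the Replace call.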
A: 

If you had to use Replace, this'd work for simple string content inside the tag:

Regex r = new Regex("<[^>]+>");
string result = r.Replace(@"<a href=""http://www.site.com/"">This is the content.</a>", "");
Console.WriteLine("Result = \"{0}\"", result);

Good luck
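One difference worth knowing: this removes every tag, including any nested inside the link, while the capturing approach keeps them. A small sketch comparing the two on a made-up input with a <b> inside the anchor (C# 9+ top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with a tag nested inside the anchor.
string input = "<a href=\"http://www.site.com/\">This is the <b>content</b>.</a>";

// Strip-all-tags approach from this answer: every <...> is deleted.
string stripped = new Regex("<[^>]+>").Replace(input, "");

// Capturing approach from the first answer: the inner markup is kept.
string captured = Regex.Match(input, "<a[^>]+?>(.*?)</a>").Groups[1].Value;

Console.WriteLine(stripped); // prints "This is the content."
Console.WriteLine(captured); // prints "This is the <b>content</b>."
```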

David Gladfelter
A: 

You could also use groups in Regex.

For example, the following gives you the content of an a tag; the commented-out pattern works for any kind of tag.

Regex r = new Regex(@"<a.*>(.*)</a>");
// Regex r = new Regex(@"<.*>(.*)</.*>"); // or any kind of tag

var m = r.Match(@"<a href=""http://www.site.com/"">This is the content.</a>");

string content = m.Groups[1].Value;

You create groups in a regex with parentheses. Note that group 0 is the whole match, so the first parenthesized group is m.Groups[1].

Francisco Noriega
Your regex won't work in every case. For example, if you have <a>test1</a><a>test2</a>, you will always get test2, because the greedy .* matches as much as it can.
Simon
Yeah, but that's not like the example he used; he made it seem like he's only going to process one un-nested tag at a time.
Francisco Noriega
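To illustrate Simon's point, here is a sketch comparing the greedy pattern with a lazy one on his example string (C# 9+ top-level statements assumed). The lazy pattern uses [^>]* rather than [^>]+? so that it also matches a bare <a> with no attributes, which the pattern from the first answer does not; it would also match tags such as <abbr>, so treat it as a sketch rather than a robust HTML parser:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<a>test1</a><a>test2</a>";

// Greedy: .* runs as far as possible, so the single match spans both
// anchors and group 1 ends up holding only the last link's text.
string greedy = Regex.Match(input, "<a.*>(.*)</a>").Groups[1].Value;

// Lazy: .*? stops at the first possible "</a>", giving one match per anchor.
string[] lazy = Regex.Matches(input, "<a[^>]*>(.*?)</a>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(greedy);                  // prints "test2"
Console.WriteLine(string.Join(", ", lazy)); // prints "test1, test2"
```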