Hi friends,

I have a text file containing the body of an email, which includes HTML markup.

I want to extract only the href attributes (the links) from that text file, in an ASP.NET C# web application.

Does anyone have code that could help me?

Thanks

+7  A: 

Try using the Html Agility Pack to parse the HTML from your email and extract the href attributes from <a> tags.

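// Requires the Html Agility Pack: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(emailBody);
// SelectNodes can return null (rather than an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
   foreach (HtmlNode link in links)
   {
      string href = link.Attributes["href"].Value;
   }
}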
cxfx
+1  A: 

You could use a regular expression, even though it is not a perfect solution:

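using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var text = File.ReadAllText(@"d:\test.htm");

        // Captures the value of double-quoted href attributes only
        Regex regex = new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(text);
        foreach (Match match in matches)
        {
            // Group 1 holds the text between the quotes
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}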
Darin Dimitrov
Nice and quick. Note that the regex is perhaps more legible as `@"href\s*=\s*""([^""]*)"""` (or perhaps not, on second thought ;-) and that attributes can have apostrophes as delimiters as well as quotes.
Abel
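Following up on the point about apostrophes: one way to accept either delimiter is to capture the opening quote character and require the same character to close the value with a backreference. The sketch below is only an illustration of that idea, not a full HTML parser; the class name `HrefExtractor` and the method `Extract` are made up for this example, and unquoted attribute values are still missed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HrefExtractor
{
    // Group 1 captures the opening delimiter (" or '), group 2 the value.
    // The backreference \1 requires the closing delimiter to match the opening one.
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns every quoted href value found in the input.
    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            results.Add(match.Groups[2].Value);
        }
        return results;
    }
}
```

For example, `HrefExtractor.Extract("<a href='a.html'>x</a> <A HREF=\"b.html\">y</A>")` should yield `a.html` and `b.html`. For anything beyond a quick extraction, a real parser such as the Html Agility Pack remains the more robust choice.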