Can anyone help me by explaining how to extract URLs/links from an HTML file in C#?

+5  A: 

Look at the Html Agility Pack:

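using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so guard before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // att.Value holds the URL; FixLink stands for your own rewriting helper.
        att.Value = FixLink(att);
    }
}
doc.Save("file.htm");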
Sergey Mirvoda
Do this. Parsing HTML with RegEx can be a very tedious task; Html Agility Pack will save you a lot of time.
Nathan Taylor
Agreed, HTML Agility pack is the way to go.
Dan Diplo
One up for the Html Agility pack!
thijs
A: 

Another option is to use a regular expression to extract URLs, as described here
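
For illustration, here is a minimal sketch of the regular-expression approach using only System.Text.RegularExpressions. The class name LinkExtractor, the method ExtractLinks, and the pattern itself are assumptions made for this example, not taken from the linked description. The pattern only handles quoted href values inside <a> tags, and will be fooled by unquoted attributes, commented-out markup, or attributes such as data-href, so treat it as a quick hack rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical pattern: an <a ...> tag containing href="..." or href='...'.
    // Group 1 captures the opening quote so \1 can require the same closing quote;
    // the named group "url" captures the attribute value.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every quoted href value found in <a> tags, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

To run it against a file, something like LinkExtractor.ExtractLinks(System.IO.File.ReadAllText("file.htm")) should work, where "file.htm" is whatever page you saved. Relative URLs come back as written, so you may still need to resolve them against the page's base address.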

Ben
A: 

You can use an HTQL COM object and query the page with the query: <a>:href

seagulf