Can anyone help me by explaining how to extract URLs/links from an HTML file in C#?
A:
Look at the Html Agility Pack:
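// requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Current releases expose the root as DocumentNode.
// SelectNodes returns null when no node matches the XPath.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att); // FixLink is your own helper that rewrites the URL
    }
}
doc.Save("file.htm");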
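The snippet above rewrites each link and saves the file. If you only want to read the URLs out, as the question asks, a minimal sketch could look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; LinkExtractor, ExtractLinks and the sample HTML are made-up names for illustration, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is installed

public static class LinkExtractor // hypothetical helper class
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return urls;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the default returned if the attribute is missing.
            urls.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return urls;
    }

    public static void Main()
    {
        // Sample input: two anchors with href, one without.
        string html = "<p><a href=\"https://example.com/a\">A</a> "
                    + "<a name=\"x\">no href</a> "
                    + "<a href=\"/b\">B</a></p>";

        foreach (string url in ExtractLinks(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

To read from a file instead of a string, doc.Load("file.htm") can replace the LoadHtml call. Relative URLs such as /b come back as written; resolving them against a base address (for example with System.Uri) is a separate step.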
Sergey Mirvoda
2010-02-25 17:24:50
Do this. Parsing HTML with regular expressions can be a very tedious task; Html Agility Pack will save you a lot of time.
Nathan Taylor
2010-02-25 17:35:59
Agreed, Html Agility Pack is the way to go.
Dan Diplo
2010-02-26 08:38:56
One up for the Html Agility Pack!
thijs
2010-02-26 09:05:18
A:
You can use an HTQL COM object and query the page with the expression: <a>:href
seagulf
2010-05-10 14:20:58