I have an HTML page with links like /with_us.php?page=digit and out.php?i=digit. How can I get all of these links from the page? Better still, can I collect only the digits from these links directly?
This doesn't seem to be specific to the question being asked, and I don't see a reliable way of scraping linked URLs from a page without potentially pulling in comments/text which also contain URLs.
Conspicuous Compiler
2009-08-27 07:23:02
A:
You might want to try actually parsing the page and traversing the DOM.
Chris T
2009-08-27 07:12:02
A:
HTML Agility Pack is ideal for this; this is almost the same as the example on the home page:
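using HtmlAgilityPack;
using System.Linq;
// doc is an HtmlDocument already loaded with the page (doc.Load or doc.LoadHtml);
// SelectNodes returns null rather than an empty collection when nothing matches.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]") ?? Enumerable.Empty<HtmlNode>())
{
    string href = link.GetAttributeValue("href", string.Empty);
}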
Now just parse "href"; perhaps something like:
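using System;
using System.Text.RegularExpressions;

// Capture the digits of the first query parameter whose value starts with a digit,
// e.g. "?page=12" gives 12 and "&i=1456" gives 1456.
Match match = Regex.Match(href, @"[&?]\w+=(\d+)");
int i;
if (match.Success && int.TryParse(match.Groups[1].Value, out i))
{
    Console.WriteLine(i);
}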
Marc Gravell
2009-08-27 07:35:34
Hmm, the main point of my question is that I need only the links matching templates like /with_us.php?page=digit, for example <a href=out.php?i=1456 target=_blank><b>go</a>. Your HTML Agility Pack sample gets ALL the links on the page, but I asked the question to find only the selected links directly.
kusanagi
2009-08-27 07:45:33
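One way to address the follow-up comment (keeping only the two link templates instead of every link) is to make the regex itself specific to those templates. This is a minimal sketch, assuming the href values have already been collected (for example with the HTML Agility Pack loop above) and that the links look exactly like /with_us.php?page=N or out.php?i=N, optionally preceded by a path or host. The class name LinkIds and the method Extract are hypothetical, chosen only for illustration; the code uses nothing beyond System.Text.RegularExpressions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class LinkIds
{
    // Matches only the two link templates from the question.
    // (?:^|/) requires the file name to start the href or follow a '/',
    // so "about_out.php?i=1" is rejected; the trailing \b rejects "page=12abc".
    // IgnoreCase is an assumption: it also accepts "Out.php?I=5".
    private static readonly Regex Pattern = new Regex(
        @"(?:^|/)(?:with_us\.php\?page|out\.php\?i)=(\d+)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the numeric id from every href that fits one of the templates;
    // all other hrefs are skipped.
    public static List<int> Extract(IEnumerable<string> hrefs)
    {
        var ids = new List<int>();
        foreach (string href in hrefs)
        {
            Match m = Pattern.Match(href ?? string.Empty);
            // TryParse guards against digit runs too large for an int.
            if (m.Success && int.TryParse(m.Groups[1].Value, out int id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }

    public static void Main()
    {
        var hrefs = new[]
        {
            "/with_us.php?page=12",
            "out.php?i=1456",
            "http://example.com/out.php?i=7&ref=x",
            "/contact.php?id=99",
            "/with_us.php?page=abc",
        };
        Console.WriteLine(string.Join(",", Extract(hrefs)));
    }
}
```

Running Main prints 12,1456,7: the contact link and the non-numeric page value are skipped. With HTML Agility Pack you could also narrow the XPath passed to SelectNodes, e.g. //a[contains(@href,'out.php?i=')], but a regex is still needed to pull the digits out of each href.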