ansaurus

Question

Answer 1

+2 A:

There is no need to use Regex to try to parse HTML when there is the fantastic library called HTML Agility Pack. This library makes light work of finding the links and it will correctly handle special cases where your regular expression will fail. You will get a more robust solution with less effort involved.
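To make the "special cases" point concrete, here is a small sketch that uses only the standard .NET regex classes. The pattern `<a href="([^"]*)">` and the sample links are hypothetical, since the original question's regex isn't shown here. They illustrate how a naive pattern that assumes one exact spelling of a link misses equally valid HTML.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical "naive" pattern: assumes every link is written exactly as <a href="...">
const string NaivePattern = "<a href=\"([^\"]*)\">";

// Four valid ways to write the same kind of link in HTML (sample data, not from the question).
string html =
    "<a href=\"http://example.com/alice/profile\">Alice</a>\n" +            // matched
    "<a class=\"user\" href=\"http://example.com/bob/profile\">Bob</a>\n" + // missed: another attribute comes first
    "<a href='http://example.com/carol/profile'>Carol</a>\n" +              // missed: single quotes
    "<A HREF=\"http://example.com/dave/profile\">Dave</A>";                 // missed: upper-case tag

// Collects the captured href of every match of the naive pattern.
string[] NaiveHrefs(string input) =>
    Regex.Matches(input, NaivePattern)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string[] found = NaiveHrefs(html);
Console.WriteLine($"Found {found.Length} of 4 links"); // prints "Found 1 of 4 links"
```

Each missed case could be patched into the pattern, but every patch is another special case to maintain, which is exactly the work an HTML parser already does for you.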

This example code demonstrating use of the library is written in C#, but hopefully it will help you to build a solution in VB.NET:

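using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("input.html");
foreach (HtmlNode link in doc.DocumentNode.Descendants("a"))
{
    // GetAttributeValue returns the default ("") instead of throwing when an <a> has no href.
    string href = link.GetAttributeValue("href", string.Empty);
    Match match = Regex.Match(href, "^http://(?<Link>.*?)/profile$");
    if (match.Success)
    {
        // The named group "Link" holds the text between "http://" and "/profile".
        Console.WriteLine(match.Groups["Link"].Value);
    }
}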
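If you want to try the matching step on its own without installing the HTML Agility Pack package, the sketch below runs the same pattern over a few made-up href values. The `ExtractProfileLinks` helper and the sample URLs are hypothetical; they stand in for the values the `Descendants("a")` loop would produce.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The same pattern as above: captures everything between "http://" and "/profile".
var profileLink = new Regex("^http://(?<Link>.*?)/profile$");

// Hypothetical helper: returns the "Link" group of each href that matches, skipping the rest.
List<string> ExtractProfileLinks(IEnumerable<string> hrefs)
{
    var results = new List<string>();
    foreach (string href in hrefs)
    {
        Match match = profileLink.Match(href);
        if (match.Success)
        {
            results.Add(match.Groups["Link"].Value);
        }
    }
    return results;
}

// Sample href values (made up for illustration).
var sample = new[]
{
    "http://example.com/alice/profile",  // matches
    "http://example.com/bob/settings",   // wrong suffix
    "https://example.com/carol/profile", // https, not http
    "",                                  // an <a> without an href
};

foreach (string link in ExtractProfileLinks(sample))
{
    Console.WriteLine(link); // prints "example.com/alice"
}
```

The `https://` link is skipped because the pattern hard-codes `http://`; starting the pattern with `^https?://` would accept both schemes.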

Mark Byers 2010-03-15 20:31:58

Thank you very much for your response. I will look into using this instead of regex in my future programs.

vbNewbie 2010-03-16 14:24:01

Answer 2

+1 A:

You may need to add RegexOptions.Singleline. From the docs:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
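A minimal sketch of what this option changes, assuming the text being matched spans a line break. The anchor tag and the pattern below are hypothetical examples, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// A hypothetical anchor tag whose link text is split across two lines.
string html = "<a href=\"http://example.com/alice/profile\">Alice\nSmith</a>";
const string Pattern = "<a href=\"(?<Href>.*?)\">(?<Text>.*?)</a>";

// Default options: '.' does not match '\n', so the link text cannot span the line break.
Match withoutSingleline = Regex.Match(html, Pattern);

// RegexOptions.Singleline: '.' matches every character, including '\n'.
Match withSingleline = Regex.Match(html, Pattern, RegexOptions.Singleline);

Console.WriteLine(withoutSingleline.Success); // False
Console.WriteLine(withSingleline.Success);    // True
// Show the captured newline as the two characters \n so the output stays on one line.
Console.WriteLine(withSingleline.Groups["Text"].Value.Replace("\n", "\\n")); // Alice\nSmith
```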

Adam Ruth 2010-03-15 20:39:39

matching repeated group using regex
