ansaurus

Question

Regex to get the link in href. [asp.net]

Answer 1

A:

The following example searches an input string and prints every href="…" value along with its position in the string. It constructs a compiled Regex object and then uses a Match object to iterate through all the matches. In the pattern, the metacharacter \s matches any whitespace character, and \S matches any non-whitespace character.

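' VB
' Requires: Imports System.Text.RegularExpressions

Sub DumpHrefs(inputString As String)
    ' Group 1 captures either a double-quoted value or an unquoted run of non-whitespace.
    Dim r As New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Walk the matches one at a time, printing each captured value and its index.
    Dim m As Match = r.Match(inputString)
    While m.Success
        Console.WriteLine("Found href " & m.Groups(1).Value _
            & " at " & m.Groups(1).Index.ToString())
        m = m.NextMatch()
    End While
End Sub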

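// C#
// Requires: using System; using System.Text.RegularExpressions;

void DumpHrefs(string inputString)
{
    // Group 1 captures either a double-quoted value or an unquoted run of non-whitespace.
    Regex r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Step through the matches, printing each captured value and its index.
    for (Match m = r.Match(inputString); m.Success; m = m.NextMatch())
    {
        Console.WriteLine("Found href " + m.Groups[1].Value + " at "
            + m.Groups[1].Index);
    }
}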

Romina 2009-09-30 08:02:48

That doesn't work for me; it returns <a href="my link">link

Dejan.S 2009-09-30 08:14:48

And that was from <a href="http://www.link.com">link</a>, so all it did was remove the </a>. I need to get http://www.link.com

Dejan.S 2009-09-30 08:16:45

Answer 2

A:

The second regular expression should be:

href=['"](?<link>[^'"]*)
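A minimal sketch of how this pattern might be used, assuming the goal is the bare URL rather than the whole match. The whole match (Match.Value) still starts with href= and the opening quote; only the named group "link" holds the URL itself. The class and method names below (HrefExtractor, ExtractLinks) are hypothetical and chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HrefExtractor
{
    // Pattern from the answer above: an opening quote of either kind,
    // then everything up to the next quote captured in the group "link".
    private static readonly Regex HrefPattern = new Regex(
        "href=['\"](?<link>[^'\"]*)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the captured URL of every match.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            // m.Value is the whole match, including href= and the quote;
            // m.Groups["link"].Value is only the captured URL.
            links.Add(m.Groups["link"].Value);
        }
        return links;
    }
}
```

Calling HrefExtractor.ExtractLinks("<a href='http://www.link.com'>link</a>") yields a list with the single entry http://www.link.com. Reading m.Value instead would give href='http://www.link.com.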

Mijalko 2009-09-30 08:09:08

It is closer, but I get href='http://www.link.com with that, Mijalko.

Dejan.S 2009-09-30 08:23:09

href='http://www.link.com

Dejan.S 2009-09-30 08:23:56

Well, you get the idea; it is supposed to be http://www.

Dejan.S 2009-09-30 08:24:38

Answer 3

+4 A:

Welcome to your daily installment of Don't Use Regex To Parse HTML. In this edition of Don't Use Regex To Parse HTML, we'll be reminding you not to use regex to parse HTML because HTML cannot reliably be parsed by a regex and dozens of valid HTML constructs will break the naïve regex proposed. We won't be mentioning all the additional invalid ones in common use on the web in Don't Use Regex To Parse HTML today.

Also in Don't Use Regex To Parse HTML, we'll be linking to the Html Agility Pack, a .NET library you can use to parse HTML properly and subsequently extract link URLs reliably in just a couple of lines of code (a very similar example being present on that page).
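As a rough illustration of the "couple of lines" mentioned above, here is a sketch that assumes the HtmlAgilityPack NuGet package is referenced. The class and method names (LinkParser, GetHrefs) are hypothetical; HtmlDocument, LoadHtml, SelectNodes and GetAttributeValue are the library's own.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class LinkParser
{
    // Hypothetical helper: returns the href value of every <a> element that has one.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // XPath: all anchor elements that carry an href attribute.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

Because the parser handles the quoting, the attribute value comes back without any surrounding markup, whichever quote style the page uses.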

We hope you have enjoyed today's Don't Use Regex To Parse HTML, and look forward to seeing you again tomorrow for another exciting edition of Don't Use Regex To Parse HTML, when someone posts another question about using regex to parse HTML. But that's all from Don't Use Regex To Parse HTML for now. Bye!

bobince 2009-09-30 08:36:02

Was this a canned response you had already used somewhere else, or did you write it specifically for this question? (+1)

Paolo Tedesco 2009-09-30 08:38:31

Is it alright to rate answers +1 based on humor (assuming they're correct)? If not, consider me a rebel against the system!

Duroth 2009-09-30 08:40:28

Solved my issue with a regex.

Dejan.S 2009-09-30 09:01:37
