I'm using a regex to extract the href attributes from an HTML document stored in a string. The following code shows how I'm using it in my C# console app.

    Match m = Regex.Match(htmlSourceString, "href=[\\\"\\\'](http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?[\\\"\\\']");

    if (m.Success)
    {
        Console.WriteLine("values = " + m);
    }

However, it only returns one result instead of a list of all the href attributes on the HTML page. I know the pattern works, because when I try RegexOptions.RightToLeft, it returns the last href in the string.

Is there something about my if statement that prevents it from returning all the results?

+15  A: 

The Match method returns only the first occurrence of the pattern; the Matches method returns all occurrences.
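A minimal sketch of the difference, assuming a simplified pattern (not the one from the question) and a hypothetical helper named `HrefFinder.FindAll`. `Regex.Matches` returns a `MatchCollection`, which you enumerate rather than test with `Success`:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HrefFinder
{
    // Simplified pattern for illustration only (an assumption, not the
    // asker's regex): href= followed by any single- or double-quoted value.
    private const string Pattern = "href=[\"'][^\"']*[\"']";

    // Hypothetical helper: collects the text of every match in the input.
    public static List<string> FindAll(string html)
    {
        var results = new List<string>();

        // Matches returns every non-overlapping match, not just the first.
        MatchCollection matches = Regex.Matches(html, Pattern);
        foreach (Match m in matches)
        {
            results.Add(m.Value);
        }
        return results;
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample input with two links.
        string html = "<a href=\"http://example.com/\">one</a> <a href='/two'>two</a>";

        List<string> hrefs = HrefFinder.FindAll(html);
        Console.WriteLine("count = " + hrefs.Count);   // prints "count = 2"
        foreach (string h in hrefs)
        {
            Console.WriteLine("value = " + h);
        }
    }
}
```

If there are no matches, the collection is simply empty, so the foreach body never runs and no separate success check is needed.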

arul
Just wondering, I have to change `Match m` to `MatchCollection m` to use the Regex.Matches() method, but then it says `MatchCollection` doesn't have a definition for `m.Success`. Is there something I'm missing?
Matt S
@Matt S - it will have a `Count` > 0 if there are matches
Simon_Weaver
+2  A: 

If you use Match instead of Match**es**, you need a loop to get all the matches, calling m.NextMatch() at the end of each iteration. For example:

    Match m = Regex.Match(htmlSourceString, "href=[\\\"\\\'](http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?[\\\"\\\']");
    Console.Write("values = ");
    while (m.Success) 
    { 
        Console.Write(m.Value);
        Console.Write(", "); // Delimiter
        m = m.NextMatch();
    }
    Console.WriteLine();
Martin Brown
Thanks a lot, Martin! This is the method I ended up using.
Matt S