using System;
using System.Text.RegularExpressions;

public class MyExample
{
    public static void Main(String[] args)
    {
        string input = "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>";

        // Call Regex.Match
        Match m = Regex.Match(input, "/show_name=(.*?)&amp;show_name_exact=true\">(.*?)</i");

        // Check Match instance
        if (m.Success)
        {
            // Get Group value
            string key = m.Groups[1].Value;
            Console.WriteLine(key);
            // alternate-1
        }
    }
}

I want "The Venture Bros" as output (in this example).

A: 

First, the regex starts with "/show_name", but the target string has "/?show_name", so the first group never gets a chance to capture the text you expect.

This will cause the whole regex to fail.

Richard
+1  A: 

Because of the question mark before show_name. It is in the input but not in the pattern, so there is no match.

Also, you try to match </i but the input doesn't contain this (it contains </li>).

Brian Rasmussen
I tried this: string input = "show_name=The+Venture+Bros... Match m = Regex.Match(input, "/show_name=(.*?) So there is no question mark in the input, but it didn't help. Why?
See updated answer.
Brian Rasmussen
+2  A: 

I think it's because you're trying to use Perl-style slashes at the front and the end. A couple of other answerers have been confused by this already. The way the question is written, the asker is trying to make the match case-insensitive by starting and ending the pattern with / and putting an i on the end, the way you'd do it in Perl.

But I'm pretty sure that .NET regexes don't work that way, and that's what's causing the problem.

Edit: to be more specific, look into RegexOptions. An example I pulled from MSDN (in VB.NET) looks like this:

Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled Or RegexOptions.IgnoreCase)

The key there is the "RegexOptions.IgnoreCase", that'll cause the effect that you were trying for with /pattern/i.
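For the question's C# code, the same idea might look like the sketch below. It is only an illustration: the helper name ExtractLinkText is hypothetical, the pattern is shortened to the tail of the link, and the input assumes the href contains "&amp;" and ends with a plain ">". The pattern deliberately uses an upper-case "</A>" to show that RegexOptions.IgnoreCase, rather than a trailing "/i", is what makes it match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IgnoreCaseSketch
{
    // Hypothetical helper: returns the link text, or null when nothing matches.
    public static string ExtractLinkText(string html)
    {
        // No leading "/" and no trailing "/i": in .NET those would be matched
        // as literal characters. Case-insensitivity is requested via RegexOptions.
        Match m = Regex.Match(html, "show_name_exact=true\">(.*?)</A>", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Assumed form of the question's input string.
        string input = "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>";
        Console.WriteLine(ExtractLinkText(input)); // prints "The Venture Bros"
    }
}
```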

Chad Birch
+1  A: 

Try this:

string input = "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>";

// Call Regex.Match
Match m = Regex.Match(input, "show_name=(.*?)&amp;show_name_exact=true\">(.*?)</a");

// Check Match instance
if (m.Success)
{
    // Get Group value
    string key = m.Groups[2].Value;
    Console.WriteLine(key);
    // alternate-1
}
Canavar
In other words, change the regex modifier from /i to /a, and take the third element of the match array instead of the second ("string key = m.Groups[2].Value"). Some explanation as to why this is better would increase the helpfulness of this answer, too.
Svante
No, that's not what he's doing. Exactly like I said, a lot of people are confused about the regex modifiers, they don't work like that in C#. He's not "changing the regex modifier", he's putting a literal "/a" on the end.
Chad Birch
@Harleqin: code tells more than an explanation. I read your comment twice and I understand what you mean, but the code is obvious: you can see what you need to change.
Canavar
+1  A: 

The correct regex in your case would be

^.*&amp;show_name_exact=true\"\>(.*)</a></p></li>$

Regular expressions are tricky, but you can find a great tutorial at http://www.regular-expressions.info/

rhapsodhy
+1  A: 

/?show_name=(.*)&show_name_exact=true\">(.*)

would work as you expect, I believe. Another thing I notice is that you're reading Groups[1], but I believe you want Groups[2]: there will be three groups, where Groups[0] is the entire match, Groups[1] is the first capture, and Groups[2] is the second.
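To make the group numbering concrete, here is a small sketch that lists every group of a match. It borrows the pattern from another answer in this thread (the one ending in "</a") and assumes the input contains "&amp;" and a plain ">" after the href; the helper name GroupValues is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexSketch
{
    // Hypothetical helper: returns the value of every group, index 0 first.
    public static string[] GroupValues(string input)
    {
        Match m = Regex.Match(input, "show_name=(.*?)&amp;show_name_exact=true\">(.*?)</a");
        if (!m.Success) return new string[0];

        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // Assumed form of the question's input string.
        string input = "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>";
        string[] groups = GroupValues(input);
        for (int i = 0; i < groups.Length; i++)
        {
            // Index 0 is the whole match, 1 is the show name from the URL
            // ("The+Venture+Bros"), 2 is the link text ("The Venture Bros").
            Console.WriteLine("Groups[" + i + "] = " + groups[i]);
        }
    }
}
```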

Gl ;)

Bruno Costa
Yeah, I think that a lot of people get confused and forget that group[0] is the entire string that is matched.
Kibbee
A: 

Ok, let's break this down.

Test Data: "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>"

Original Regex: "/show_name=(.*?)&amp;show_name_exact=true\">(.*?)</i"

Working Regex: "/\?show_name=(.*)&amp;show_name_exact=true\">(.*)</a"

We'll start at the left and work our way to the right, through the regex.

  1. "/show_name" became "/\?show_name": the input has a "?" between the slash and "show_name", so the regex has to match it. An unescaped "?" means that the preceding character or group is optional; with a backslash before it, it matches a literal question mark.

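  2. "(.*?)" became "(.*)": the parentheses denote a group. A "?" directly after "*" does not mean "optional"; it makes the quantifier lazy (match as little as possible), while a plain "*" is greedy (match as much as possible). On this input both forms capture the same text, so this change is not required for the match to succeed.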

  3. "</i" became "</a" this change was made to match your actual text which terminates the anchor with a "</a>" tag.
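As a rough sketch of how these pieces behave in .NET, the snippet below compares a greedy and a lazy group on a made-up input containing two anchors, and checks a URL against the pattern with and without the escaped question mark. The helper name FirstGroup and the two-anchor input are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierSketch
{
    // Hypothetical helper: value of the first capture group, or null on no match.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Made-up input with two anchors, where greedy and lazy differ.
        string twoLinks = "<a>one</a><a>two</a>";
        Console.WriteLine(FirstGroup(twoLinks, "<a>(.*)</a>"));  // greedy: one</a><a>two
        Console.WriteLine(FirstGroup(twoLinks, "<a>(.*?)</a>")); // lazy: one

        string url = "http://tvrss.net/search/?show_name=The+Venture+Bros";
        // Without the question mark the literal text "/show_name=" is not in the URL.
        Console.WriteLine(Regex.IsMatch(url, "/show_name="));    // False
        // "\?" matches the literal "?" that sits between "/" and "show_name".
        Console.WriteLine(Regex.IsMatch(url, @"/\?show_name=")); // True
    }
}
```

On the question's input there is only one "</a", which is why the greedy and lazy versions happen to return the same capture there.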

Suggested Regex: "[\\W]show_name=([^><\"]*)&amp;show_name_exact=true\">([^<]*)<"

(The extra backslashes were added to provide proper C# string escaping.)
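A minimal sketch of the suggested regex in use follows. It assumes the input contains "&amp;" and a plain ">" after the href. It also shows a verbatim string (@"...") as one alternative way to write the same pattern: backslashes are not doubled there, and a double quote is written as "". The class and member names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuggestedRegexSketch
{
    // The suggested pattern as a regular C# string literal.
    public const string Escaped = "[\\W]show_name=([^><\"]*)&amp;show_name_exact=true\">([^<]*)<";

    // The same pattern as a verbatim string literal.
    public const string Verbatim = @"[\W]show_name=([^><""]*)&amp;show_name_exact=true"">([^<]*)<";

    // Hypothetical helper: returns the link text (group 2), or null on no match.
    public static string LinkText(string input)
    {
        Match m = Regex.Match(input, Verbatim);
        return m.Success ? m.Groups[2].Value : null;
    }

    public static void Main()
    {
        // Assumed form of the question's input string.
        string input = "<a href=\"http://tvrss.net/search/?show_name=The+Venture+Bros&amp;show_name_exact=true\">The Venture Bros</a></p></li>";
        Console.WriteLine(Escaped == Verbatim); // True: both literals are the same pattern
        Console.WriteLine(LinkText(input));     // prints "The Venture Bros"
    }
}
```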

A good tool for testing regular expressions in C# is the regex-freetool at code.google.com

Mazrick