I have this simple piece of code in C++:

    #include <iostream>
    #include <string>
    #include <pcrecpp.h>

    using std::string;

    int main(void)
    {
        string text = "http://www.amazon.com";
        string a,b,c,d,e,f;
        pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
        // The sixth capture group (f) holds the hostname.
        if(re.PartialMatch(text, &a,&b,&c,&d,&e,&f))
        {
            std::cout << "match: " << f << "\n";
            // should print "www.amazon.com"
        }else{
            std::cout << "no match. \n";
        }
        return 0;
    }

When I run this, it doesn't find a match. I'm pretty sure that the regex pattern is correct and that my code is what's wrong. If anyone familiar with pcrecpp can take a look at this, I'll be grateful.

EDIT: Thanks to Dingo, it works great.
Another issue I had was that the result was in the sixth capture group, "f".
I edited the code above so you can copy/paste it if you wish.

+1  A: 

Please run cout << re.pattern() << endl; to double-check that all your backslash doubling is done right (and also post the result).

It looks like it should be:

^((\w+):\/\/\/?)?((\w+):?(\w+)?@)?([^\/\?:]+):?(\d+)?(\/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*)

The hostname isn't going to be returned from the first capture group. Why are you using capturing parentheses around, for example, \w+ when you don't want to capture it?
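To illustrate that point, here is a minimal sketch of non-capturing groups, written as (?:...), which group a sub-pattern without consuming a capture slot. It uses std::regex instead of pcrecpp so that it is self-contained, and a deliberately simplified pattern rather than the full URL regex from the question. The helper name extract_host is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the host part of a URL, or "" if nothing matches.
// Simplified pattern (an assumption, not the asker's full regex):
//   (?:\w+://)?   optional scheme, grouped but NOT captured
//   ([^/?:#]+)    host, the only capture group, so it is match[1]
std::string extract_host(const std::string& url) {
    static const std::regex re("^(?:\\w+://)?([^/?:#]+)");
    std::smatch match;
    if (std::regex_search(url, match, re)) {
        return match[1].str();
    }
    return "";
}
```

With this pattern, extract_host("http://www.amazon.com") returns www.amazon.com from the first and only capture. PCRE accepts the same (?:...) syntax, so the idea should carry over to pcrecpp::RE, where a single output argument would then be enough.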

Ben Voigt
+1  A: 

The problem is that your code contains ??(, which is the C++ trigraph for [. You'll need to either disable trigraphs or break the sequence up, for example by splitting the string literal:

pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??" "([^#]+)?#?(\\w*)"); 
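As a small sketch of why this works, the snippet below builds only the tail of the pattern in two ways that keep ??( out of the source text: splitting the literal (adjacent string literals are concatenated by the compiler) and writing the second question mark with the standard \? escape. The function names tail_split and tail_escaped are hypothetical and exist only for illustration.

```cpp
#include <string>

// Option 1: split the literal so the two question marks and the opening
// parenthesis never sit next to each other in the source. The compiler
// joins adjacent string literals into one string.
std::string tail_split() {
    return "\\??" "([^#]+)?#?(\\w*)";
}

// Option 2: write the second question mark as the escape sequence \?,
// which produces a plain '?' but cannot form a trigraph.
std::string tail_escaped() {
    return "\\?\?([^#]+)?#?(\\w*)";
}
```

Both functions return the same 18-character string, \??([^#]+)?#?(\w*), which is the tail the regex engine is meant to see. Trigraphs were removed from the language in C++17, so this only matters for compilers or modes that still process them.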
Dingo