Hello. This is my JavaScript regex pattern:

    url = "http://www.amazon.com/gp";    
    hostname = /^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)/.exec(url) || [];
    // would return "www.amazon.com"
The regex above extracts the hostname from a given URL. I need this line to work with PCRE (C++). As you can see, I already added another '\' to each '\', but it still doesn't work.

What additional changes do I need to make for it to work in PCRE code instead of JavaScript? Or is it not possible, so that I need to build an entirely new pattern for PCRE?

This is a simplified version of my code:

#include <iostream>
#include <string>
#include <pcrecpp.h>

using std::string;

int main(void)
{
    string text = "http://www.amazon.com";
    string hostname;
    pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
    if (re.PartialMatch(text, &hostname))
    {
        std::cout << "match: " << hostname << "\n";
    } else {
        std::cout << "no match. \n";
    }
    return 0;
}

Thanks.

+3  A: 

There's no need to convert it. The only things you have to take care of are the escaping and the / delimiter.
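
To illustrate the escaping point, here is a minimal sketch, assuming a C++11 compiler, that writes the same pattern once with doubled backslashes and once as a raw string literal. The pattern is a cut-down version of the one in the question (scheme and host only), and the function names `escaped_pattern` and `raw_pattern` are made up for this example.

```cpp
#include <string>

// Hypothetical helpers, for illustration only.

// Ordinary string literal: every regex backslash must be doubled,
// because the compiler consumes one level of escaping.
inline std::string escaped_pattern() {
    return "^(\\w+):\\/\\/([^\\/\\?:]+)";
}

// Raw string literal (C++11): the text between R"( and )" reaches
// the regex engine unchanged, so the pattern reads as in JavaScript.
inline std::string raw_pattern() {
    return R"(^(\w+):\/\/([^\/\?:]+))";
}
```

Both functions return the same 22-character string, so either form can be handed to a regex constructor. A raw literal should also keep the `\\??(` run in the full pattern from being read as the `??(` trigraph on compilers and language modes that still enable trigraphs.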

Do note that a regular expression might not be what you want to use here, or at least not directly like this. There are lots of URL parsing libraries that are much better suited to this task, HTParse for example.

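Your C++ code should work, but your regex has a lot of optional groups, so it's hard to be sure which group the hostname will end up in. PartialMatch fills its output arguments with the capture groups in order, so a single argument receives only the first group.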
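
One way to see which group holds the hostname is to list every capture group for a sample URL. The sketch below does that with `std::regex` from the standard library instead of pcrecpp, purely so it has no external dependency; the helper name `capture_groups` is made up for this example, and the pattern is the one from the question written as a raw string literal.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns all capture groups for the question's
// pattern. Index 0 is the whole match, index 1 the first group, etc.
// Uses std::regex (ECMAScript grammar), not pcrecpp.
std::vector<std::string> capture_groups(const std::string& url) {
    // Raw string literal, so the backslashes are not doubled.
    static const std::regex re(
        R"(^((\w+):\/\/\/?)?((\w+):?(\w+)?@)?([^\/\?:]+):?(\d+)?(\/?[^\?#;\|]+)?([;\|])?([^\?#]+)?\??([^#]+)?#?(\w*))");

    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(url, m, re)) {
        // Groups that did not take part in the match come back empty.
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

For `http://www.amazon.com`, index 1 holds `http://` and index 6 holds `www.amazon.com`, which would explain why passing a single output argument yields the scheme rather than the host.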

As hacky as it may be, my edit works for this input:

string text = "http://www.amazon.com";
string tmp;       // throwaway target for the groups we don't need
string hostname;
pcrecpp::RE re("^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)");
// Groups 1-5 are written to tmp; group 6 is the hostname.
if (re.PartialMatch(text, &tmp, &tmp, &tmp, &tmp, &tmp, &hostname))
{
    std::cout << "match: " << hostname << "\n";
} else {
    std::cout << "no match. \n";
}
WoLpH
What should I do about the "/ delimiter"?
shaimagz
@BillyONeal: that's not correct; in the C++ version of PCRE you don't need delimiters. Also, you usually don't have to use / as a delimiter; most other delimiters will work as well.
WoLpH
+1  A: 
"^((\\w+):\\/\\/\\/?)?((\\w+):?(\\w+)?@)?([^\\/\\?:]+):?(\\d+)?(\\/?[^\\?#;\\|]+)?([;\\|])?([^\\?#]+)?\\??([^#]+)?#?(\\w*)"
HaxElit