Hello,

#include <iostream>
#include <string>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

// Returns the first capture group (meant to be the host part of the URL),
// or an empty string if the pattern does not match.
string _getBasehtttp(string url)
{
    regex exrp( "^(?:http://)?([^\\/]+)(.*)$" );
    match_results<string::const_iterator> what;

    if( regex_search( url, what, exrp ) )
    {
        string base( what[1].first, what[1].second );
        return base;
    }
    return "";
}

int main( )
{
    cout << _getBasehtttp("httpasd://www.google.co.in");
}

If I input http://www.google.co.in, I get www.google.co.in returned. But if I input httpasd://www.google.co.in, I get httpasd: back. There should not be any match in that case, so why am I getting one?

A: 

The http:// prefix doesn't match, but since it's optional, that's no problem; the "one or more characters that aren't slashes" part then matches httpasd:, and of course the .* matches everything that follows, from the slashes (included) onwards. This would work the same way with any common regex implementation; there is nothing C++-specific about it!
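
To make the explanation above concrete, here is a minimal sketch that returns the first two capture groups of the question's pattern. It assumes std::regex (C++11) in place of boost::regex so that it has no external dependencies, and split_url is a made-up helper name, not something from the question.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns capture groups 1 and 2 of the question's
// pattern, or two empty strings when the pattern does not match.
// std::regex stands in for boost::regex here (an assumption of this sketch).
std::pair<std::string, std::string> split_url(const std::string& url)
{
    static const std::regex pattern("^(?:http://)?([^\\/]+)(.*)$");
    std::smatch what;
    if (std::regex_search(url, what, pattern))
        return { what[1].str(), what[2].str() };
    return { "", "" };
}
```

For httpasd://www.google.co.in the optional prefix is skipped, so group 1 is httpasd: and group 2 is //www.google.co.in. For http://www.google.co.in the prefix is consumed and group 1 is www.google.co.in.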

Alex Martelli
Can you tell me how I can rewrite the regex?
raj
@raj, that entirely depends on _what_ you're trying to match, and what you're trying **not** to match. I'm glad @Greg was able to divine that (I infer from your acceptance), because my thought-reading abilities are limited;-).
Alex Martelli
A: 

^(?:http://)?([^\\/]+)(.*)$

The ? at the end of (?:http://)? means that bit is optional.
([^\\/]+) captures one or more characters that are not a / (in the C++ string literal the \\ becomes a single \, which simply escapes the /).
(.*) captures everything else up to the end of the line.

Perhaps you're after something more like ^(?:https?://)([^\\/]+)(.*)$, which makes the http:// or https:// prefix mandatory.
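
As a rough sketch of how the stricter pattern behaves, the function below returns the host only when the URL really starts with http:// or https://. It assumes std::regex rather than boost::regex to stay dependency-free, and get_base_http is a hypothetical name chosen for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the host part of the URL, or "" when the
// URL does not start with http:// or https://.
// std::regex is used instead of boost::regex (an assumption of this sketch).
std::string get_base_http(const std::string& url)
{
    // The scheme group is no longer followed by '?', so it must be present.
    static const std::regex pattern("^(?:https?://)([^\\/]+)(.*)$");
    std::smatch what;
    if (std::regex_search(url, what, pattern))
        return what[1].str();
    return "";
}
```

With this pattern httpasd://www.google.co.in yields an empty string. Note that a bare www.google.co.in no longer matches either, since the scheme is now required.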

You might also like to consider the full URL syntax, along the lines of

 file://                                        /C:/temp/app/example.html
 file://     C                             :    /temp/app/example.html
 file://     C                             :    \temp\app\example.html
 http://[email protected]:8080/test/url.htm?view=smart
[method][               server                   ][   path   ][optional]
        [user][         domain             ][port]

Then you're heading for a regex more like

([a-zA-Z][a-zA-Z0-9\\+\\-\\.]*://)?(([^@/\\\\]+@)?([a-zA-Z0-9_'!~\\-,;&=\\.\\$\\*\\(\\)\\+]+)(:\\d*)?)?([/\\\\][^?]*)?(\\?.*)?
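
A sketch of how the capture groups of that longer pattern could be pulled apart is shown below. It assumes std::regex instead of boost::regex, and the UrlParts struct and parse_url function are hypothetical names introduced only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the capture groups of the pattern above.
// Each field keeps its delimiter, because the groups include it.
struct UrlParts {
    std::string scheme;  // group 1, including the trailing "://"
    std::string user;    // group 3, including the trailing "@"
    std::string domain;  // group 4
    std::string port;    // group 5, including the leading ":"
    std::string path;    // group 6
    std::string query;   // group 7, including the leading "?"
};

// Returns the parts of url; all fields stay empty when the whole string
// does not match. std::regex is an assumption of this sketch.
UrlParts parse_url(const std::string& url)
{
    static const std::regex pattern(
        "([a-zA-Z][a-zA-Z0-9\\+\\-\\.]*://)?"
        "(([^@/\\\\]+@)?([a-zA-Z0-9_'!~\\-,;&=\\.\\$\\*\\(\\)\\+]+)(:\\d*)?)?"
        "([/\\\\][^?]*)?(\\?.*)?");
    UrlParts parts;
    std::smatch what;
    // regex_match requires the pattern to cover the entire input.
    if (std::regex_match(url, what, pattern)) {
        parts.scheme = what[1].str();
        parts.user   = what[3].str();
        parts.domain = what[4].str();
        parts.port   = what[5].str();
        parts.path   = what[6].str();
        parts.query  = what[7].str();
    }
    return parts;
}
```

For a made-up input such as http://user@example.com:8080/test/url.htm?view=smart, this gives the scheme http://, the user user@, the domain example.com, the port :8080, the path /test/url.htm and the query ?view=smart. Because every group is optional, the pattern also matches an empty string, so a caller that needs a host should check that domain is not empty.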
Greg Domjan