views:

490

answers:

5

Hi,

In my program, I have a list of "server addresses" in the following format:

host[:port]

The brackets here indicate that the port is optional.

  • host can be a hostname, an IPv4 or IPv6 address (possibly in "bracket-enclosed" notation).
  • port, if present, can be a numeric port number or a service string (like: "http" or "ssh").

If port is present and host is an IPv6 address, host must be in "bracket-enclosed" notation (Example: [::1])

Here are some valid examples:

localhost
localhost:11211
127.0.0.1:http
[::1]:11211
::1
[::1]

And an invalid example:

::1:80 // Invalid: Is this the IPv6 address ::1:80 and a default port, or the IPv6 address ::1 and the port 80 ?
::1:http // This is not ambiguous, but for simplicity's sake, let's consider this forbidden as well.

My goal is to separate such entries in two parts (obviously host and port). I don't care if either the host or port are invalid as long as they don't contain a non-bracket-enclosed : (290.234.34.34.5 is ok for host, it will be rejected in the next process); I just want to separate the two parts, or if there is no port part, to know it somehow.

I tried to do something with std::stringstream but everything I come up with seems hacky and not really elegant.

How would you do this in C++ ?

I don't mind answers in C but C++ is preferred. Any boost solution is welcome as well.

Thank you.

+7  A: 

Have you looked at boost::spirit? It might be overkill for your task, though.

sbi
Didn't know it existed. Thanks. However, as you just said, it seems kind of overkill for my task. If no one else comes up with something more straightforward, I'll surely take a deep look into it.
ereOn
Since the community seems to love this solution, could someone please give me a few guidelines to get started with `boost::spirit` in my specific case ?
ereOn
Bring a towel !
Matthieu M.
@ereOn: Unfortunately, I never had a reason to work with spirit and it never made it to the top of my list of things I would like to play with either, so I cannot give you any advice. AFAIK it's a big and heavy template-meta machinery and might well be overweight for your purpose. Wouldn't regex help here? If not, I'd probably just write a simple parser using string streams. That said, the way that spirit tutorial starts out seems sooo interesting...
sbi
sbi
Don't you know h2g2 ? You should always have a towel with you.
Matthieu M.
@sbi: Urbandictionary defines "Don't forget to bring a towel" as an amazing thing to randomly say. May 25 was towel day in honor of Douglas Adams.
Brian
sbi
Oh, that's just because `ereOn` asked for guidelines to get started with boost::spirit and I remembered my first chaotic tries. It takes a bit to get beyond the `"Hello, world!"` (at least it took a bit for me). You really need to leverage a number of boost libraries to get past the most trivial examples: Phoenix to express the functors, Variant whenever you get unrelated types in a collection, etc... and I spare you the grievous error messages when something is off... I admire the power/cleverness nonetheless :)
Matthieu M.
A: 

If you are getting the host and port as a string (or, in C++, an array of characters), you could get the length of the string, loop over it until you find a single colon by itself, and then split the string into two parts at that location.

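size_t splitpoint = 0;   // stays 0 if no lone ':' is found
for (size_t i = 1; i < string.length(); i++) {   // start at 1 so string[i-1] is valid
     // Guard string[i+1] so the last iteration does not read past the end.
     bool colonAfter = (i + 1 < string.length()) && (string[i+1] == ':');
     if (string[i] == ':' && string[i-1] != ':' && !colonAfter) {
          splitpoint = i;
     }
}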

Just a suggestion; it's kinda deep and I'm sure there is a more efficient way, but I hope this helps. Gale

GEShafer
Michael Mrozek
Thanks for your answer. But this is even more *hacky* than what I came up with ;) And I'm not sure this handles the `IPv6` special cases.
ereOn
@Michael - Yeah, I know you can; however, if you try to do a comparison of string[i-1] at the same time you check if i>0 then you'll throw errors because you can't access string[-1], and I just threw it together =P @ereOn - No problem, just thought I'd give the first thing that popped into my mind
GEShafer
Also the `string[i+1]` when `i`'s value is `string.length() - 1` (last loop) resolves to `string[string.length()]` which I believe is out of bounds.
ereOn
Oh... Yup didn't see that lol it would be out of bounds. See, told ya it could be better.
GEShafer
danio
@GESchafer You can do `if (0
Michael Mrozek
A: 

As mentioned, Boost.Spirit.Qi could handle this.

As mentioned, it's overkill (really).

// Requires <string> and <algorithm> (for std::count).
const std::string line = /**/;

if (line.empty()) return;

std::string host, port;

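if (line[0] == '[')           // IP V6 detected
{
  const size_t pos = line.find(']');
  if (pos == std::string::npos) return;  // Error handling ?
  host = line.substr(1, pos-1);
  // Only read a port when "]:" is present; for a bare "[::1]",
  // substr(pos+2) would start past the end and throw std::out_of_range.
  if (pos + 1 < line.size() && line[pos+1] == ':')
    port = line.substr(pos+2);
}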
else if (std::count(line.begin(), line.end(), ':') > 1) // IP V6 without port
{
  host = line;
}
else                          // IP V4
{
  const size_t pos = line.find(':');
  host = line.substr(0, pos);
  if (pos != std::string::npos)
    port = line.substr(pos+1);
}

I really don't think this warrants a parsing library, it might not gain in readability because of the overloaded use of :.

Now my solution is certainly not flawless, one could for example wonder about its efficiency... but I really think it's sufficient, and at least you'll not lose the next maintainer, because from experience Qi expressions can be anything but clear!
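For readers who want something callable, here is a minimal sketch of how the three cases above might be wrapped in a function. The names `HostPort` and `split_host_port` are hypothetical (they do not appear in the answer), and the sketch makes the same choices as the snippet: brackets are stripped from the host, and an unbracketed string with more than one `:` (including `::1:80`) is treated as a bare IPv6 host rather than rejected.

```cpp
#include <algorithm>
#include <string>

// Hypothetical result type: `ok` is false when the input is rejected.
struct HostPort {
    bool ok;
    std::string host;
    std::string port;  // empty when no port was given
};

// Hypothetical helper covering the same three cases as the snippet above.
HostPort split_host_port(const std::string& line)
{
    HostPort result = { false, "", "" };
    if (line.empty()) return result;

    if (line[0] == '[') {                                 // bracketed IPv6
        const std::string::size_type pos = line.find(']');
        if (pos == std::string::npos) return result;      // e.g. "[::1:22"
        result.host = line.substr(1, pos - 1);
        if (pos + 1 < line.size()) {
            if (line[pos + 1] != ':') return result;      // e.g. "[::1]80"
            result.port = line.substr(pos + 2);
        }
    } else if (std::count(line.begin(), line.end(), ':') > 1) {
        result.host = line;                               // bare IPv6, no port
    } else {                                              // hostname or IPv4
        const std::string::size_type pos = line.find(':');
        result.host = line.substr(0, pos);
        if (pos != std::string::npos) result.port = line.substr(pos + 1);
    }
    result.ok = true;
    return result;
}
```

Calling `split_host_port("[::1]:11211")` yields host `::1` and port `11211`, while `split_host_port("[::1:22")` comes back with `ok == false`. Whether the host or port is actually valid is left to the next processing step, as the question asks.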

Matthieu M.
Thanks ! Probably not optimal, but definitely readable. However, what happens in case I supply the following string: `"[::1:22"` ?
ereOn
`::1:22` would be considered the host: there is no error handling at all here, you could verify that in the first case, there is a closing bracket `assert(pos != std::string::npos)` or whatever you wish :)
Matthieu M.
Does std::string have a function called count()? It gives me errors in VC2008: error C2039: 'count' : is not a member of 'std::basic_string<_Elem,_Traits,_Ax>'
Vite Falcon
I made a function to count the characters in a string and now it gives the wrong result with `"[::1:22"`. I got `Host = ::1:22` and `Port = ::1:22`.
Vite Falcon
@Vite: That should very likely have been `std::count(line.begin(),line.end(),':')`
sbi
Yea. But it still doesn't work. Mainly because it's not trying to eliminate any wrong versions of the 'address'.
Vite Falcon
`@Vite`: Good catch, that's because there is an overflow of `pos`: `std::string::npos + 2 == 1`. Amusing. Anyway, adding the error handling (i.e., stopping if no `]` is found) suppresses the issue. Just need to do the same with the IPV4 part.
Matthieu M.
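The wrap-around described in the comment above can be reproduced in isolation. This is only a sketch of the bracket branch with its `npos` check removed; `unchecked_host` and `unchecked_port` are hypothetical names used for illustration.

```cpp
#include <string>

// Hypothetical: the bracket branch without the "no ']' found" check.
std::string unchecked_host(const std::string& line)
{
    const std::string::size_type pos = line.find(']');  // npos if no ']'
    return line.substr(1, pos - 1);  // a huge count is clamped to the end
}

std::string unchecked_port(const std::string& line)
{
    const std::string::size_type pos = line.find(']');  // npos if no ']'
    return line.substr(pos + 2);     // npos + 2 wraps around to 1
}
```

With the input `"[::1:22"` both functions return `::1:22`, which matches the result reported above: the host branch asks for "everything from index 1", and the port branch does too, because the unsigned arithmetic wraps.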
+4  A: 

Here's a simple class that uses boost::xpressive to do the job of verifying the type of IP address and then you can parse the rest to get the results.

Usage:

const std::string ip_address_str = "127.0.0.1:3282";
IpAddress ip_address = IpAddress::Parse(ip_address_str);
std::cout<<"Input String: "<<ip_address_str<<std::endl;
std::cout<<"Address Type: "<<IpAddress::TypeToString(ip_address.getType())<<std::endl;
if (ip_address.getType() != IpAddress::Unknown)
{
    std::cout<<"Host Address: "<<ip_address.getHostAddress()<<std::endl;
    if (ip_address.getPortNumber() != 0)
    {
        std::cout<<"Port Number: "<<ip_address.getPortNumber()<<std::endl;
    }
}

The header file of the class, IpAddress.h

#pragma once
#ifndef __IpAddress_H__
#define __IpAddress_H__


#include <string>

class IpAddress
{
public:
    enum Type
    {
        Unknown,
        IpV4,
        IpV6
    };
    ~IpAddress(void);

    /**
     * \brief   Gets the host address part of the IP address.
     * \author  Abi
     * \date    02/06/2010
     * \return  The host address part of the IP address.
    **/
    const std::string& getHostAddress() const;

    /**
     * \brief   Gets the port number part of the address if any.
     * \author  Abi
     * \date    02/06/2010
     * \return  The port number.
    **/
    unsigned short getPortNumber() const;

    /**
     * \brief   Gets the type of the IP address.
     * \author  Abi
     * \date    02/06/2010
     * \return  The type.
    **/
    IpAddress::Type getType() const;

    /**
     * \fn  static IpAddress Parse(const std::string& ip_address_str)
     *
     * \brief   Parses a given string to an IP address.
     * \author  Abi
     * \date    02/06/2010
     * \param   ip_address_str  The ip address string to be parsed.
     * \return  Returns the parsed IP address. If the IP address is
     *          invalid then the IpAddress instance returned will have its
     *          type set to IpAddress::Unknown
    **/
    static IpAddress Parse(const std::string& ip_address_str);

    /**
     * \brief   Converts the given type to string.
     * \author  Abi
     * \date    02/06/2010
     * \param   address_type    Type of the address to be converted to string.
     * \return  String form of the given address type.
    **/
    static std::string TypeToString(IpAddress::Type address_type);
private:
    IpAddress(void);

    Type m_type;
    std::string m_hostAddress;
    unsigned short m_portNumber;
};

#endif // __IpAddress_H__

The source file for the class, IpAddress.cpp

#include "IpAddress.h"
#include <cstdlib>   // atoi
#include <boost/xpressive/xpressive.hpp>

namespace bxp = boost::xpressive;

static const std::string RegExIpV4_IpFormatHost = "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$";
static const std::string RegExIpV4_StringHost = "^[A-Za-z0-9]+(\\:[0-9]+)?$";

IpAddress::IpAddress(void)
:m_type(Unknown)
,m_portNumber(0)
{
}

IpAddress::~IpAddress(void)
{
}

IpAddress IpAddress::Parse( const std::string& ip_address_str )
{
    IpAddress ipaddress;
    bxp::sregex ip_regex = bxp::sregex::compile(RegExIpV4_IpFormatHost);
    bxp::sregex str_regex = bxp::sregex::compile(RegExIpV4_StringHost);
    bxp::smatch match;
    if (bxp::regex_match(ip_address_str, match, ip_regex) || bxp::regex_match(ip_address_str, match, str_regex))
    {
        ipaddress.m_type = IpV4;
        // Anything before the last ':' (if any) is the host address
        std::string::size_type colon_index = ip_address_str.find_last_of(':');
        if (std::string::npos == colon_index)
        {
            ipaddress.m_portNumber = 0;
            ipaddress.m_hostAddress = ip_address_str;
        }else{
            ipaddress.m_hostAddress = ip_address_str.substr(0, colon_index);
            ipaddress.m_portNumber = atoi(ip_address_str.substr(colon_index+1).c_str());
        }
    }
    return ipaddress;
}

std::string IpAddress::TypeToString( Type address_type )
{
    std::string result = "Unknown";
    switch(address_type)
    {
    case IpV4:
        result = "IP Address Version 4";
        break;
    case IpV6:
        result = "IP Address Version 6";
        break;
    }
    return result;
}

const std::string& IpAddress::getHostAddress() const
{
    return m_hostAddress;
}

unsigned short IpAddress::getPortNumber() const
{
    return m_portNumber;
}

IpAddress::Type IpAddress::getType() const
{
    return m_type;
}

I have only set the rules for IPv4 because I don't know the proper format for IPv6. But I'm pretty sure it's not hard to implement it. Boost Xpressive is just a template-based solution and hence does not require any .lib files to be compiled into your exe, which I believe is a plus.

By the way just to break down the format of regex in a nutshell...
^ = start of string
$ = end of string
[] = a group of letters or digits that can appear
[0-9] = any single-digit between 0 and 9
[0-9]+ = one or more digits between 0 and 9
'.' has a special meaning in regex (it matches any character), but since an IP address contains literal dots we need to specify that we want a '.' between digits by using '\.'. And since a C++ string literal needs an escape sequence for '\', we have to write "\\."
? = optional component

So, in short, "^[0-9]+$" is a regex that matches an integer.
"^[0-9]+\.$" matches an integer that ends with a '.'
"^[0-9]+\.[0-9]?$" matches either an integer that ends with a '.' or a decimal with a single digit after the '.'.
For an integer or a real number, the regex would be "^[0-9]+(\.[0-9]*)?$".
The regex for an integer of 2 or 3 digits is "^[0-9]{2,3}$".
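These small patterns are easy to try out. The sketch below assumes C++11 `std::regex` (with its default ECMAScript grammar) instead of boost::xpressive, purely to keep the example dependency-free; `matches` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
// Note the doubled backslashes callers must use in C++ string literals.
bool matches(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```

For example, `matches("12.", "^[0-9]+\\.$")` is true, `matches("12.34", "^[0-9]+\\.[0-9]?$")` is false because `?` allows at most one digit after the dot, and `matches("1234", "^[0-9]{2,3}$")` is false because the input has four digits.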

Now to break down the format of the ip address:

"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$"

This is synonymous to: "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(\:[0-9]{1,5})?$", which means:

[start of string][1-3 digits].[1-3 digits].[1-3 digits].[1 or more digits]<:[1-5 digits]>[end of string]
Where, [] are mandatory and <> are optional

The second RegEx is simpler than this. It's just an alphanumeric value followed by an optional colon and port number.

By the way, if you would like to test out RegEx you can use this site.

Edit: I failed to notice that you optionally had http instead of port number. For that you can change the expression to:

"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:([0-9]{1,5}|http|ftp|smtp))?$"

This accepts formats like:
127.0.0.1
127.0.0.1:3282
127.0.0.1:http
217.0.0.1:ftp
18.123.2.1:smtp
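To check the extended expression against the inputs listed above, a small sketch follows. It assumes C++11 `std::regex` in place of boost::xpressive, and `is_ipv4_endpoint` is a hypothetical helper name; the pattern itself is copied unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper; the pattern is the one quoted above, unchanged.
bool is_ipv4_endpoint(const std::string& text)
{
    static const std::regex pattern(
        "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+"
        "(\\:([0-9]{1,5}|http|ftp|smtp))?$");
    return std::regex_match(text, pattern);
}
```

Two things are worth noticing when trying it: `127.0.0.1:ssh` is rejected because only the three service names in the alternation are allowed, and `999.999.999.999` is accepted because the pattern checks the shape of the address, not the range of each number.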

Vite Falcon
When people have a problem they say: I know, I'll use a regular expression. Now they have 2 problems.
Matthieu M.
LOL. It's not too hard to figure it out. I learned it in less than 2 hours. It's not like he doesn't know the format or doesn't already have a solution. If I'm right, he already has a solution using std::stringstream and he wants an elegant one. I'll add a breakdown of regex in the post.
Vite Falcon
Bloated. Regular expressions have always worked just fine for me. Carping by all my lazy coworkers aside.
Jay
Bloated? You mean RegEx or the performance? If it is performance, maybe this might change your mind? http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/boost_xpressive/appendices/perf.html
Vite Falcon
+2  A: 
// Requires <string> and <algorithm> (for std::find).
std::string host, port;
std::string example("[::1]:22");

if (example[0] == '[')
{
    // Bracketed IPv6: the host is the text between '[' and ']'.
    std::string::iterator splitEnd =
        std::find(example.begin() + 1, example.end(), ']');
    host.assign(example.begin() + 1, splitEnd);
    if (splitEnd != example.end()) splitEnd++;
    // Skip the ':' itself so it does not end up in the port.
    if (splitEnd != example.end() && *splitEnd == ':')
        port.assign(splitEnd + 1, example.end());
}
else
{
    // base() points one element past the last ':' (or at begin() if none).
    std::string::iterator splitPoint =
        std::find(example.rbegin(), example.rend(), ':').base();
    if (splitPoint == example.begin())
        host = example;
    else
    {
        host.assign(example.begin(), splitPoint - 1);  // stop before the ':'
        port.assign(splitPoint, example.end());
    }
}
Billy ONeal
Matthieu M.
@Matthieu M.: You're right, that should have been a !=.
Billy ONeal
I still don't understand how it could be ':', don't you mean (*(splitEnd++) == ':') ? (though there would be a risk of undefined behavior again).
Matthieu M.
@Matthieu M.: Good point....
Billy ONeal
I may appear... critical... but I am afraid you're still slightly off. Whenever you assign to port, you forget to increment the iterator, thus the first character of port will always be `:` if any. Is this intentional ?
Matthieu M.
@Matthieu M.: You are correct in that I completely botched the ipv6 addresses. But the ipv4 ones are correct. The `base()` member of reverse iterators is one element forward of that reverse iterator.
Billy ONeal
Ah yes you're right! I forgot this detail of implementation!
Matthieu M.