I'm having trouble writing a regular expression that matches valid IPv6 addresses, including those in their compressed form (with "::" or leading zeros omitted from each byte pair).

Can someone suggest a regular expression that would fulfill the requirement?

I'm considering expanding each byte pair and matching the result with a simpler regex.
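A minimal sketch of that expand-then-match idea in Python. The function names `expand_ipv6` and `is_ipv6` are made up for illustration, and the sketch assumes plain hexadecimal addresses only: it does not handle an embedded IPv4 tail (such as `::ffff:1.2.3.4`) or zone IDs.

```python
import re

# Matches only the fully expanded form: eight groups of exactly four hex digits.
FULL_FORM = re.compile(r"[0-9a-f]{4}(:[0-9a-f]{4}){7}")

def expand_ipv6(addr):
    """Expand "::" and omitted leading zeros; return None if "::" is misused.

    The result is not validated here; FULL_FORM does that afterwards.
    """
    if addr.count("::") > 1:
        return None  # "::" may appear at most once
    if "::" in addr:
        head, tail = addr.split("::")
        head_groups = head.split(":") if head else []
        tail_groups = tail.split(":") if tail else []
        missing = 8 - len(head_groups) - len(tail_groups)
        if missing < 1:
            return None  # "::" must stand for at least one group
        groups = head_groups + ["0"] * missing + tail_groups
    else:
        groups = addr.split(":")
    # Restore omitted leading zeros. Empty or over-long groups are left
    # unchanged so that the simple regex rejects them.
    return ":".join(g.rjust(4, "0") if g else g for g in groups)

def is_ipv6(addr):
    expanded = expand_ipv6(addr.lower())
    return expanded is not None and FULL_FORM.fullmatch(expanded) is not None
```

The expansion step only normalizes the shape of the address; every actual validity decision is left to the one short regex, which keeps the pattern easy to read.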

+6  A: 

From here,

(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,6}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,5}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,4}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,3}\Z)|
(\A([0-9a-f]{1,4}:){1,5}(:[0-9a-f]{1,4}){1,2}\Z)|
(\A([0-9a-f]{1,4}:){1,6}(:[0-9a-f]{1,4}){1,1}\Z)|
(\A(([0-9a-f]{1,4}:){1,7}|:):\Z)|
(\A:(:[0-9a-f]{1,4}){1,7}\Z)|
(\A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A(([0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)|
(\A([0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,3}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,2}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,1}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A(([0-9a-f]{1,4}:){1,5}|:):(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)|
(\A:(:[0-9a-f]{1,4}){1,5}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)
Factor Mystic
A regular expression like this should be a "code smell" suggesting that regular expressions may not be the best-suited solution here. (Although I guess the OP did ask for it...)
Thanatos
+11  A: 

If I may skirt your question, do consider using your networking library's notion of Address to parse and check for errors.

I imagine that at some point you'll want to do something with these addresses, so why not just go straight to the source and make sure that your networking library will understand the address? This is better than just hoping whatever regex is about to be posted here will match your implementation's concept of the address.

In Java we have InetAddress. In .NET we have IPAddress. In .NET, you even have TryParse on the IPAddress class to do this test for you!

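using System.Net;
using System.Net.Sockets;

// Returns true only if addr parses as an IP address of the IPv6 family.
bool IsIP6(string addr) {
    IPAddress ip;
    if (IPAddress.TryParse(addr, out ip)) {
        return ip.AddressFamily == AddressFamily.InterNetworkV6;
    }
    else {
        return false;
    }
}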
Frank Krueger
That's an excellent suggestion. Unfortunately I'm doing this on an embedded platform and am having some trouble cross-compiling Python with IPv6 enabled. I also thought it'd be neat to see whether there was a solution :)
Readonly
+3  A: 

I'd have to strongly second the answer from Frank Krueger.

Whilst you say you need a regular expression to match an IPv6 address, I'm assuming what you really need is to be able to check if a given string is a valid IPv6 address. There is a subtle but important distinction here.

There is more than one way to check if a given string is a valid IPv6 address and regular expression matching is only one solution.

Use an existing library if you can. The library will have fewer bugs and its use will result in less code for you to maintain.

The regular expression suggested by Factor Mystic is long and complex. It most likely works, but you should also consider how you'd cope if it unexpectedly fails. The point I'm trying to make here is that if you can't form a required regular expression yourself you won't be able to easily debug it.

If you have no suitable library it may be better to write your own IPv6 validation routine that doesn't depend on regular expressions. If you write it you understand it and if you understand it you can add comments to explain it so that others can also understand and subsequently maintain it.
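As a rough illustration of what such a hand-written routine could look like, here is a small Python sketch that uses no regular expressions. The names `is_hex_group` and `is_valid_ipv6` are hypothetical, and the sketch assumes hexadecimal-only addresses (no embedded IPv4 tail, no zone ID).

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def is_hex_group(group):
    # A group is one to four hexadecimal digits.
    return 1 <= len(group) <= 4 and all(c in HEX_DIGITS for c in group)

def is_valid_ipv6(text):
    if not isinstance(text, str):
        raise TypeError("expected a string")
    # Split around "::"; more than one "::" is never allowed.
    parts = text.split("::")
    if len(parts) > 2:
        return False
    # Each side of "::" is a (possibly empty) list of colon-separated groups.
    groups = [part.split(":") if part else [] for part in parts]
    if not all(is_hex_group(g) for side in groups for g in side):
        return False
    total = sum(len(side) for side in groups)
    if len(parts) == 2:
        return total <= 7  # "::" replaces at least one group
    return total == 8      # no "::", so all eight groups must be present
```

Each rule sits on its own commented line, so a failing case can be traced to a specific check.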

Act with caution when using a regular expression whose functionality you can't explain to someone else.

Jon Cram
A: 

If you use Perl, try Net::IPv6Addr or NetAddr::IP.

Brad Gilbert
+6  A: 

It sounds like you may be using Python. If so, you can use something like this:

import socket

def check_ipv6(n):
    try:
        socket.inet_pton(socket.AF_INET6, n)
        return True
    except socket.error:
        return False

print(check_ipv6('::1')) # True
print(check_ipv6('foo')) # False
print(check_ipv6(5))     # TypeError exception
print(check_ipv6(None))  # TypeError exception

I don't think you need IPv6 compiled into Python to get inet_pton.

(Updated to forward unknown exceptions, such as TypeError)
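If `inet_pton` turns out to be unavailable, a possible alternative is the standard library's `ipaddress` module, which parses addresses in pure Python. This is only a sketch: it assumes Python 3.3 or later, which may not hold on the embedded platform in question. The explicit type check is there because `ipaddress.IPv6Address` also accepts integers, which would otherwise count as valid.

```python
import ipaddress

def check_ipv6(text):
    # IPv6Address also accepts ints and bytes, so insist on a string
    # to mirror the TypeError behaviour of the socket-based version.
    if not isinstance(text, str):
        raise TypeError("expected a string")
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False
```

Note that this accepts whatever `ipaddress` accepts, including addresses with an embedded IPv4 tail such as `::ffff:1.2.3.4`.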

Joe Hildebrand
You should specify the exception type in the `except` clause. Otherwise, `except` will catch everything and may mask unrelated errors. The type here should be `socket.error`.
Ayman Hourieh
A) inet_pton doesn't throw other exceptions, unless the docs are wrong, and B) even if it did, what else would you return but False?
Joe Hildebrand
Re: other errors... if the user passes in a non-string, TypeError gets eaten. Clearly a list isn't an ipv6, but I'd probably want to have it carp that I was passing in the wrong type.
Gregg Lind
A: 

The regex allows leading zeros in the IPv4 parts.

Some Unix and Mac distros interpret such segments as octal.

I suggest using 25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d as an IPv4 segment.
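A quick Python check of the stricter segment pattern, as a sketch. The names `OCTET`, `IPV4` and `is_strict_ipv4` are made up for this example, and `re.ASCII` is added so that `\d` matches only the digits 0-9.

```python
import re

# One IPv4 segment: 0-255 with no leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Four segments separated by dots.
IPV4 = re.compile(OCTET + r"(?:\." + OCTET + r"){3}", re.ASCII)

def is_strict_ipv4(text):
    return IPV4.fullmatch(text) is not None

print(is_strict_ipv4("192.168.0.1"))  # True
print(is_strict_ipv4("010.0.0.1"))    # False: leading zero rejected
print(is_strict_ipv4("256.1.1.1"))    # False: out of range
```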

Aeron
+2  A: 

The following will validate IPv4, IPv6 (full and compressed), and IPv6v4 (full and compressed) addresses:

'/^(?:(?>(?>([a-f0-9]{1,4})(?>:(?1)){7})|(?>(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?))|(?>(?>(?>(?1)(?>:(?1)){5}:)|(?>(?!(?:.*[a-f0-9]:){6,})((?1)(?>:(?1)){0,4})?::(?>(?3):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD'
MichaelRushton
A: 

In Java, you can use the class sun.net.util.IPAddressUtil (an internal JDK class, not part of the supported public API):

IPAddressUtil.isIPv6LiteralAddress(iPaddress);