I need to match an IPv6 address that may or may not have a mask. Unfortunately, I can't just use a library to parse the string.

The mask part is easy enough; in this case:

(?:\/\d{1,3})?$/
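One caveat with the mask pattern: `\d{1,3}` also accepts values such as 999, while an IPv6 prefix length cannot exceed 128. A minimal sketch of a range check follows; the helper name `mask_is_valid` is made up for illustration, and it looks only at the mask, not at the address in front of it.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text has no mask, or a mask in 0..128.
sub mask_is_valid {
    my ($text) = @_;
    return 1 if $text !~ m{/};                 # no mask at all is fine
    return 0 unless $text =~ m{/(\d{1,3})\z};  # same shape as the pattern above
    return $1 <= 128 ? 1 : 0;                  # numeric range check
}

for my $candidate ('beef::/64', 'beef::/129', 'beef::') {
    printf "%-12s %s\n", $candidate, mask_is_valid($candidate) ? 'ok' : 'bad mask';
}
```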

The hard part is the different formats of an IPv6 address. It needs to match ::beef, beef::, beef::beef, etc.

An update: I'm almost there:

/^(\:\:([a-f0-9]{1,4}\:){0,6}?[a-f0-9]{0,4}|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){0,6}?\:\:|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){1,6}?\:\:([a-f0-9]{1,4}\:){1,6}?[a-f0-9]{1,4})(\/\d{1,3})?$/i

In this case, I am restricted to using Perl's regex.

+1  A: 

Try this:

^([0-9a-fA-F]{4}|0)(\:([0-9a-fA-F]{4}|0)){7}$

From Regular Expression Library: IPv6 address

You should also read this: A Regular Expression for IPv6 Addresses

Rubens Farias
This fails to match 2001:db8:85a3:0:0:8a2e:370:7334, 2001:db8:85a3::8a2e:370:7334, 2001:0db8:0000:0000:0000::1428:57ab, ::ffff:c000:280, and many more.
Schwern
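To see the limitation concretely, here is a small sketch that runs the pattern from the answer above against the addresses from the comment, plus one fully written-out address. The sub name `matches_full_form` is made up for illustration. Only the last sample, where every group is spelled out with four hex digits, should match.

```perl
use strict;
use warnings;

# The pattern from the answer above, unchanged.
my $full_form = qr/^([0-9a-fA-F]{4}|0)(\:([0-9a-fA-F]{4}|0)){7}$/;

# Hypothetical helper wrapping the match.
sub matches_full_form {
    my ($addr) = @_;
    return $addr =~ $full_form ? 1 : 0;
}

my @samples = (
    '2001:db8:85a3:0:0:8a2e:370:7334',
    '2001:db8:85a3::8a2e:370:7334',
    '2001:0db8:0000:0000:0000::1428:57ab',
    '::ffff:c000:280',
    '2001:0db8:85a3:0000:0000:8a2e:0370:7334',   # every group written in full
);

for my $addr (@samples) {
    printf "%-42s %s\n", $addr, matches_full_form($addr) ? 'match' : 'no match';
}
```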
+5  A: 

I'm not an IPv6 expert, but please trust me when I tell you that matching (let alone validating) IPv6 addresses is not easy with a very simple regex such as the one you suggest. There are many shorthands and various conventions for combining the address with a port, just to name an example. One such shorthand is that you can write 0:0:0:0:0:0:0:1 as ::1, but there are more. If you read German, I would suggest looking at the slides of Steffen Ullrich's talk at the 11th German Perl Workshop.
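To illustrate the `::` shorthand, here is a minimal sketch that expands a compressed address back into eight groups. The function name `expand_ipv6` is made up for illustration, and the sketch assumes the input is already a well-formed address with at most one `::`, no mask, and no embedded IPv4 part; it does no validation of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "::" into the zero groups it stands for.
# Assumes a well-formed address; this is not a validator.
sub expand_ipv6 {
    my ($addr) = @_;
    my @groups;
    if ($addr =~ /::/) {
        my ($head, $tail) = split /::/, $addr, 2;
        my @head = split /:/, $head;          # groups before "::" (may be empty)
        my @tail = split /:/, $tail;          # groups after "::" (may be empty)
        my $missing = 8 - @head - @tail;      # how many zero groups "::" replaces
        @groups = (@head, ('0') x $missing, @tail);
    }
    else {
        @groups = split /:/, $addr;
    }
    # Normalise each group by dropping leading zeros.
    return join ':', map { sprintf '%x', hex $_ } @groups;
}

print expand_ipv6('::1'), "\n";
print expand_ipv6('beef::beef'), "\n";
```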

You say you can't use a library, but if you're going to reinvent the whole complexity of the library, then you might as well just import it verbatim into your project.

tsee
+9  A: 

What do you mean you can't just use a library? How about a module? Regexp::IPv6 will give you what you need.

innaM
+4  A: 

This contains a patch to Regexp::Common demonstrating a complete, accurate, tested IPv6 regex. It's a straight translation of the IPv6 grammar. Regexp::IPv6 is also accurate.

More importantly, it contains a test suite. Running it with your regex shows you're still a ways off: 10 out of 19 missed and 1 false positive out of 12. IPv6 contains a lot of special shorthands, making it very easy to get subtly wrong.

Best place to read up on what goes into an IPv6 address is RFC 3986 section 3.2.2.
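For a sense of what a translation of that grammar looks like, here is a sketch built from the `IPv6address`, `h16`, `ls32` and `IPv4address` rules in RFC 3986 section 3.2.2. It is not the Regexp::Common patch or the Regexp::IPv6 source; the variable names and the `is_ipv6_with_optional_mask` helper are made up for illustration, and the optional mask with its 0..128 check is an addition for the question's use case, not part of the RFC grammar.

```perl
use strict;
use warnings;

# Building blocks, named after the rules in RFC 3986 section 3.2.2.
my $h16       = qr/[0-9A-Fa-f]{1,4}/;
my $dec_octet = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/;
my $ipv4      = qr/${dec_octet}(?:\.${dec_octet}){3}/;
my $ls32      = qr/(?:${h16}:${h16}|${ipv4})/;

# One alternative per line of the IPv6address rule.
my $ipv6 = qr/(?:
                                        (?:${h16}:){6} ${ls32}
    |                                :: (?:${h16}:){5} ${ls32}
    | (?:                  ${h16} )? :: (?:${h16}:){4} ${ls32}
    | (?: (?:${h16}:){0,1} ${h16} )? :: (?:${h16}:){3} ${ls32}
    | (?: (?:${h16}:){0,2} ${h16} )? :: (?:${h16}:){2} ${ls32}
    | (?: (?:${h16}:){0,3} ${h16} )? ::    ${h16}:     ${ls32}
    | (?: (?:${h16}:){0,4} ${h16} )? ::                ${ls32}
    | (?: (?:${h16}:){0,5} ${h16} )? ::                ${h16}
    | (?: (?:${h16}:){0,6} ${h16} )? ::
)/x;

# Hypothetical helper: address with an optional /0 .. /128 mask.
sub is_ipv6_with_optional_mask {
    my ($text) = @_;
    return 0 unless $text =~ m!\A${ipv6}(?:/(\d{1,3}))?\z!;
    return 1 unless defined $1;       # no mask given
    return $1 <= 128 ? 1 : 0;         # mask must fit in 128 bits
}

for my $candidate ('::beef', 'beef::', 'beef::beef/64', '1::2::3') {
    printf "%-16s %s\n", $candidate,
        is_ipv6_with_optional_mask($candidate) ? 'valid' : 'invalid';
}
```

Keeping each grammar rule in its own `qr//` object makes the pattern easy to compare against the RFC line by line, which is most of what makes this approach hard to get subtly wrong.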

Schwern