On POSIX systems you can use inet_pton and inet_ntop in combination to do canonicalization. You will still have to do your own CIDR parsing. Fortunately, I believe the only valid CIDR syntax for IPv6 is the /number_of_bits notation, so that's fairly easy.
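Here is a rough sketch of that round trip in Python. The name canonicalize_ipv6 is my own invention, not a library call, and the prefix handling assumes that /number_of_bits really is the only form you need to accept.

```python
import socket

def canonicalize_ipv6(text):
    """Hypothetical helper: return (canonical_address, prefix_len or None)."""
    addr, slash, prefix = text.partition("/")
    prefix_len = None
    if slash:
        # Hand-rolled CIDR parsing: only a plain decimal bit count is accepted.
        if not (prefix.isascii() and prefix.isdigit()):
            raise ValueError("prefix length must be a decimal number")
        prefix_len = int(prefix)
        if prefix_len > 128:
            raise ValueError("prefix length must be between 0 and 128")
    # inet_pton parses the text into 16 raw bytes (OSError if it is invalid);
    # inet_ntop prints those bytes back in the platform's canonical form.
    packed = socket.inet_pton(socket.AF_INET6, addr)
    return socket.inet_ntop(socket.AF_INET6, packed), prefix_len
```

For example, canonicalize_ipv6("2001:0DB8:0000:0000:0000:0000:0000:0001/64") should give ("2001:db8::1", 64) on a typical POSIX system.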
The other issue you will run into is the lack of support for interface specifications. For link-local addresses, you will see things like %eth0 on the end to specify which link they are local to. getaddrinfo will parse that, but inet_pton won't.
One strategy you could go for is using getaddrinfo to parse and inet_ntop to canonicalize.
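A minimal sketch of that strategy in Python follows. The name parse_scoped_ipv6 is hypothetical. One wrinkle is that Python's getaddrinfo hands back the address as text rather than as a raw sockaddr, so I strip any %scope suffix it may include and then run the inet_pton/inet_ntop round trip on what is left.

```python
import socket

def parse_scoped_ipv6(text):
    """Hypothetical helper: return (canonical_address, scope_id)."""
    # AI_NUMERICHOST makes getaddrinfo a pure parser: no DNS lookups.
    infos = socket.getaddrinfo(
        text, None,
        family=socket.AF_INET6,
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,
    )
    host, _port, _flowinfo, scope_id = infos[0][4]
    # Some Python versions keep "%scope" in the returned text; drop it,
    # since the scope is already available as a number in scope_id.
    host = host.partition("%")[0]
    packed = socket.inet_pton(socket.AF_INET6, host)
    return socket.inet_ntop(socket.AF_INET6, packed), scope_id
```

With a numeric scope such as "fe80::1%1" this should return the address plus scope id 1. An interface name like %eth0 only parses if that interface actually exists on the machine.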
getaddrinfo is available on Windows. inet_pton and inet_ntop aren't on versions before Vista. Fortunately, it isn't too hard to write code to produce a canonical form IPv6 address. It will require two passes though, because the rule for zero compression is that the longest run of zero groups gets compressed and, when two runs tie, the first one wins. Also, the IPv4 form (e.g. ::127.0.0.1) is only used for ::IPv4 or ::ffff:IPv4.
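To show what I mean by two passes, here is a sketch of such a formatter in Python. The name format_ipv6 is made up. I am also assuming two things that go beyond what I said above: a lone zero group is left uncompressed (which is what RFC 5952 recommends), and the ::IPv4 form is only chosen when the first six groups are zero and the seventh is not, so that ::1 does not turn into ::0.0.0.1. Real inet_ntop implementations differ slightly on that last point.

```python
def format_ipv6(packed):
    """Hypothetical formatter: 16 raw bytes -> canonical IPv6 text."""
    if len(packed) != 16:
        raise ValueError("an IPv6 address is exactly 16 bytes")
    groups = [int.from_bytes(packed[i:i + 2], "big") for i in range(0, 16, 2)]

    # Pass 1: find the longest run of zero groups. The strict ">" means
    # that on a tie the earliest run is kept.
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i, group in enumerate(groups):
        if group == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len < 2:
        best_start = -1  # assumption: never compress a single zero group

    # Decide whether the last 32 bits are printed as dotted-quad IPv4.
    mapped = groups[:5] == [0] * 5 and groups[5] == 0xFFFF
    compat = groups[:6] == [0] * 6 and groups[6] != 0  # assumed heuristic
    limit = 6 if (mapped or compat) else 8

    # Pass 2: emit the hex groups, replacing the chosen run with "::".
    def hexes(chunk):
        return ":".join(format(g, "x") for g in chunk)

    if best_start == -1:
        text = hexes(groups[:limit])
    else:
        head = groups[:best_start]
        tail = groups[best_start + best_len:limit]
        text = hexes(head) + "::" + hexes(tail)

    if limit == 6:
        dotted = ".".join(str(b) for b in packed[12:])
        text += dotted if text.endswith(":") else ":" + dotted
    return text
```

As a check on the tie rule, the bytes of 2001:db8:0:0:1:0:0:1 come out as 2001:db8::1:0:0:1, with the first of the two equal-length zero runs compressed.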
I have no Windows machine to test with, but from the documentation it appears that Python on Windows supports inet_pton and inet_ntop in its socket module.
Writing your own routine for producing a canonical form might not be a bad idea, since even if your canonical form isn't the same as everybody else's, as long as it's valid other people can parse it. But under no circumstances would I write my own routine to parse IPv6 addresses.
My advice above is good for Python, C, and C++. I know little or nothing about how to solve this problem in Java or JavaScript.
EDIT: I have been examining getaddrinfo and its counterpart, getnameinfo. These are in almost all ways better than inet_pton and inet_ntop. They are thread safe, and you can pass them options (AI_NUMERICHOST in getaddrinfo's case, and NI_NUMERICHOST in getnameinfo's case) to keep them from doing any kind of DNS queries. Their interface is a little complex and reminds me of an ugly Windows interface in some respects, but it's fairly easy to figure out what options to pass to get what you want. I heartily recommend them both.
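For what it's worth, here is roughly how the pair looks through Python's socket module. This is only a sketch, and canonicalize_numeric is a name I made up. Both flags are set so that neither call should touch DNS.

```python
import socket

def canonicalize_numeric(text):
    """Hypothetical helper: canonical text for a numeric IPv4/IPv6 address."""
    # AI_NUMERICHOST: reject anything that is not a literal address
    # (raises socket.gaierror) instead of resolving it as a hostname.
    infos = socket.getaddrinfo(
        text, None,
        family=socket.AF_UNSPEC,
        type=socket.SOCK_STREAM,
        flags=socket.AI_NUMERICHOST,
    )
    sockaddr = infos[0][4]
    # NI_NUMERICHOST / NI_NUMERICSERV: print the address and port
    # numerically rather than doing reverse lookups.
    host, _service = socket.getnameinfo(
        sockaddr, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    )
    return host
```

Unlike the inet_ntop route, getnameinfo may keep a %scope suffix on link-local addresses, which could be what you want if the interface matters.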