I'm wondering whether Python's regular expression engine can compare numeric values. Matching the general shape of an IP address is easy, but each group of one to three digits must not exceed 255, and that's where I'm a bit stumped.

+4  A: 

You need to constrain the digits allowed in each position of an octet. A three-digit octet may start with 0, 1, or 2. If it starts with 2, the second digit is limited to 0-5 (otherwise 0-9), and if it starts with 25, the third digit is limited to 0-5 (otherwise 0-9).

I found this annotated example at http://www.regular-expressions.info/regexbuddy/ipaccurate.html :

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
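As a rough sketch of how that pattern might be used from Python (the names OCTET, IPV4_RE, find_ips and is_ip are made up for illustration): findall pulls candidate addresses out of surrounding text, while fullmatch checks that a whole string is a single address.

```python
import re

# One octet, 0-255, exactly as in the pattern quoted above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")

def find_ips(text):
    """Return every substring of text that looks like a dotted-quad address."""
    return IPV4_RE.findall(text)

def is_ip(candidate):
    """Return True only if the whole string is one dotted-quad address."""
    return IPV4_RE.fullmatch(candidate) is not None
```

Note that searching '1.2.3.4.5' still finds '1.2.3.4' inside it, so use the fullmatch variant when validating a complete string.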
Alex JL
I thought about doing something like that, very neat with an explanation though :) Thanks.
dutt
A good example of when *not* to use regexes.
Glenn Maynard
@Glenn Maynard if you already have the IP and need to check if it's valid, I wouldn't use a regex. If you need to extract potential IPs from blocks of other text, a regex would be useful.
Alex JL
It's a little bit of both, I need to use a regex for this, otherwise I need to add things to the framework and bladiblabla.
dutt
+1  A: 

Regex is for pattern matching, but to check for a valid IP, you need to check for the range (i.e. 0 <= n <= 255).

You could use a regex to check the range, but that's a bit overkill. I think you're better off matching the basic pattern and then checking the range of each number.

For example, use the following pattern to match an IP:

([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})

Then check whether each number is within range.
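A minimal sketch of that two-step approach, assuming the whole string should be one address (DOTTED_QUAD and is_valid_ipv4 are illustrative names, not from any library):

```python
import re

# Shape check only: four groups of one to three digits separated by dots.
DOTTED_QUAD = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

def is_valid_ipv4(text):
    match = DOTTED_QUAD.fullmatch(text)
    if match is None:
        return False
    # Range check: the groups are digits only, so int() cannot fail here.
    return all(int(group) <= 255 for group in match.groups())
```

The lower bound needs no test because the pattern admits no minus sign.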

William
A: 

IP addresses can also be checked with split as follows,

all(map((lambda x: 0<=x<=255),map(int,ip.split('.')))) and len(ip.split("."))==4

For me that's a little more readable than a regex.

Sujoy
huh, kind of a neat solution too. thanks :)
dutt
@sujoy: -2 **FAIL** on '1. 2 .3.4' and **FAIL** on '1.B.2.3' @dutt: be suspicious of anything with 2 * map() and excess parentheses and test before use.
John Machin
@sujoy: The "readable" expression can be reduced to the equivalent `all(map(lambda x: 0<=int(x)<=255,ip.split('.'))) and len(ip.split("."))==4` by removing a map() call and the redundant parentheses around the lambda expression (but still fails, of course)
John Machin
@John Machin, I accept that I didn't test for the possibility of spaces (my bad). But '1.B.2.3' test doesn't fail. I get a value error. `ValueError: invalid literal for int() with base 10: 'B'`
Sujoy
@Sujoy: crashes instead of returning False. That's not a fail???
John Machin
+5  A: 

You can check a 4-octet IP address easily without regexes at all. Here's a tested working method:

>>> def valid_ip(ip):
...    parts = ip.split('.')
...    return (
...        len(parts) == 4
...        and all(part.isdigit() for part in parts)
...        and all(0 <= int(part) <= 255 for part in parts)
...        )
...
>>> valid_ip('1.2.3.4')
True
>>> valid_ip('1.2.3.4.5')
False
>>> valid_ip('1.2.   3   .4.5')
False
>>> valid_ip('1.256.3.4.5')
False
>>> valid_ip('1.B.3.4')
False
>>>
John Machin
+4  A: 

No need for regular expressions here. Some background:

>>> import socket
>>> socket.inet_aton('255.255.255.255')
'\xff\xff\xff\xff'
>>> socket.inet_aton('255.255.255.256')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
error: illegal IP address string passed to inet_aton
>>> socket.inet_aton('my name is nobody')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
error: illegal IP address string passed to inet_aton

So:

import socket

def ip_address_is_valid(address):
    try: socket.inet_aton(address)
    except socket.error: return False
    else: return True

Note that addresses like '127.1' could be acceptable on your machine (there are systems, including MS Windows and Linux, where missing octets are interpreted as zero, so '127.1' is equivalent to '127.0.0.1', and '10.1.4' is equivalent to '10.1.0.4'). Should you require that there are always 4 octets, change the last line from:

else: return True

into:

else: return address.count('.') == 3
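Put together, the four-octet variant might look like the sketch below (the function name ip_address_is_valid_strict is made up here). It still relies on the platform's inet_aton, so what counts as a valid octet may vary between systems.

```python
import socket

def ip_address_is_valid_strict(address):
    try:
        # Raises socket.error when the platform rejects the string.
        socket.inet_aton(address)
    except socket.error:
        return False
    else:
        # Additionally insist on the dotted form with four parts.
        return address.count('.') == 3
```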
ΤΖΩΤΖΙΟΥ
I thought that zero-filled was only the case with IPv6 addresses and the :: notation? It is frequently the case that when writing IP addresses with a CIDR mask that only the octets touched by the mask are written (10/8, 172.16/12, 192.168/16, 192.168.127/24), with a "zero-fill" on the remaining octets. But eliding zero-octets in the middle I've never seen with IPv4.
Vatine
@Vatine: I assume you: a) downvoted my answer because you thought I was stating something mistaken, while I was stating something you didn't know b) you've never pinged 127.1 yourself; please, do try it. Not only I am mistaken, but the creator of your IP stack is mistaken too. We apologize.
ΤΖΩΤΖΙΟΥ
@ΤΖΩΤΖΙΟΥ: My ping expands 127.1 into 127.0.0.1 but it was concocted in the Dark Tower of Redmond where adherence to standards is sometimes a little casual so that proves nuffin :-)
John Machin
@John: of course it proves nuffin :) *My* ping —which was concocted in the Evil Forces of \*nix— together with *your* ping just assert that 127.1 is *acceptable* …
ΤΖΩΤΖΙΟΥ
Just because the implied empty fields are accepted by some systems does not mean they are correct. That is like saying a single 32-bit integer expressed in decimal is valid, or that hex should be accepted (because some interfaces accept hex for the quad values).
benc
@benc: no-one said “correct”; I said “acceptable”. I don't know which systems **don't** accept '127.1'; your “some known” might be “all known systems” for what it's worth. The question was about the validity of an IPv4 address, and the *validity* is defined by the software that uses the IPv4 address (whether it follows standards or not). If the only application mentioned is Python, then I report what is valid for Python, at least. Python **and** GNU ping **and** Windows ping **and** Firefox accept '127.0.0.1', '127.1', '2130706433', '0x7f000001' as equivalent. What was your point exactly?
ΤΖΩΤΖΙΟΥ
@benc: I'm asking what was your point, because I don't see one, since my answer already has a clause to verify that the input `address` is a dotted address with four octets, as the question implies that it's needed.
ΤΖΩΤΖΙΟΥ
Just tried pinging it, this is what the prompt came back with: "Name (127.1) is not a valid IP Address."
Vatine
In most situations, people ask for an implementation of an IPv4 address validator in a specific language. They rarely ask: "What kinds of address magic work in this particular language?"
benc
ΤΖΩΤΖΙΟΥ: Also, I should mention that often this magic is platform specific, so it might work in one place and not another. And the data might flow to other places that don't work. 2130706433 might work in Firefox, but fail in a proxy it points to.
benc
@ΤΖΩΤΖΙΟΥ: So far, I have tested it in IOS and ExtremeOS, neither understand 127.1 as a valid IP address and recommending non-universal parsing results is Not Recommended.
Vatine
@Vatine: so I assume you're a Mac OS/X user: it's a fact that, although OS/X is based on BSD, the code for parsing IP addresses has been rewritten, so maybe even Python on your system won't accept '127.1'. If you're not solely using Mac OS/X (especially since you've been a netadm), I'm interested in other systems too. However, I'd like to state here what irritated me most about your response: no-one knows everything, I'm sure you'll agree; given that, I strongly believe that dismissing something outside one's experience as false is arrogant and narrow-minded.
ΤΖΩΤΖΙΟΥ
@Vatine: oh, come on! Are you intentionally putting words in my mouth? I did **not** recommend non-universal parsing results, and I never said it's **correct** (@benc); I said “acceptable” because Python (in the scope of the question) accepts alternative IPv4 addresses on MS Windows and on Linux, and that is a **fact**. And my answer **already** says: “should you require that there are always four octets”, so I got you covered. How is that answer not useful or incorrect?
ΤΖΩΤΖΙΟΥ
I tried to get something of value out of this otherwise pointless discussion, and modified the text of my answer to make it unambiguous.
ΤΖΩΤΖΙΟΥ
@benc: your objections would be valid if I *suggested* that one writes '127.1' instead of '127.0.0.1'. I don't know the scope and the intended use of IPv4 addresses by the OP, I only knew Python. Although my initial answer was correct, it was **not** unambiguous; I modified the text of the answer accordingly to acknowledge that fact. I welcome any further objections to what I *wrote*, but I can't respond to any objections to what one may *think* that I *meant*.
ΤΖΩΤΖΙΟΥ