tags:

views:

109

answers:

8

What is the fastest way to check if a string matches a certain pattern? Is regex the best way?

For example, I have a bunch of strings and want to check each one to see if they are a valid IP address (valid in this case meaning correct format), is the fastest way to do this using regex? Or is there something faster with like string formatting or something.

Something like this is what I have been doing so far:

for st in strs:
    if re.match('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', st) != None:
       print 'IP!'
A: 

You can use regular expressions: http://www.regular-expressions.info/python.html

theycallmemorty
OP has been using a regex.
kchau
He already is doing that.
Mark Byers
+10  A: 

It looks like you are trying to validate IP addresses. A regular expression is probably not the best tool for this.

If you want to accept all valid IP addresses (including some addresses that you probably didn't even know were valid) then you can use IPy (Source):

from IPy import IP
IP('127.0.0.1')

If the IP address is invalid it will throw an exception.

Or you could use socket (Source):

import socket
try:
    socket.inet_aton(addr)
    # legal
except socket.error:
    # Not legal

If you really want to only match IPv4 with 4 decimal parts then you can split on dot and test that each part is an integer between 0 and 255.

def validate_ip(s):
    a = s.split('.')
    if len(a) != 4:
        return False
    for x in a:
        if not x.isdigit():
            return False
        i = int(x)
        if i < 0 or i > 255:
            return False
    return True

Note that your regular expression doesn't do this extra check. It would accept 999.999.999.999 as a valid address.

Mark Byers
Hmmm this is perfect. Yea I did not think about integers greater than 255.
Tommy Morene
Not all IP address are in decimal.
Ashish
Accepting this for the IPy. I ended up using IPy partly because of @Alex's IPv6 point.
Tommy Morene
+1  A: 

Your regular expression doesn't check for the end of the string, so it would match:

123.45.67.89abc123boogabooga

To fix this, use:

'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'

(note the $ at the end).

Finally, in Python the usual style is to use is not None instead of != None.

Greg Hewgill
A: 

You can make it a little faster by compiling it:

expression = re.compile('^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
for st in strs:
    if expression.match(st):
       print 'IP!'
Matt Williamson
+1  A: 

you should precompile the regexp, if you use it repeatedly

re_ip = re.compile('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
# note the terminating $ to really match only the IPs

then use

if re_ip.match(st):
    print '!IP'

but.. is e.g. '111.222.333.444' really the IP?

i'd look at netaddr or ipaddr libraries whether they can be used to match IPs

mykhal
+6  A: 

I'm normally the one of the very few Python experts who steadfastly defends regular expressions (they have quite a bad reputation in the Python community), but this is not one of those cases -- accepting (say) '333.444.555.666' as an "IP address" is really bad, and if you need to do more checks after matching the RE, much of the point of using a RE is lost anyway. So, I second @Mark's recommendations heartily: IPy for generality and elegance (including support of IPv6 if you want!), string operations and int checks if you only need IPv4 (but, think twice about that limitation, and then think one more -- IPv6's time has way come!-):

def isgoodipv4(s):
    pieces = s.split('.')
    if len(pieces) != 4: return False
    try: return all(0<=int(p)<256 for p in pieces)
    except ValueError: return False

I'd far rather do that than a convoluted RE to match only numbers between 0 and 256!-)

Alex Martelli
+1 for use of `a<=x<b` and other things that make it a bit cleaner than my attempt.
Mark Byers
+1  A: 

If you are validating IP address I would suggest the following:

import socket

try:
    socket.inet_aton(addr)
    return True
except socket.error:
    return False

If you just want to check if it is in the right format then you would want to do it for all legal bases (not just base 10 numbering).

Also, are the IP address IPv4 only (and none are IPv6) then you could just look up what valid address are and use split() (to get individual components of the IP) and int() (to type-caste for comparison). A quick reference to valid IPv4 rules is here.

Ashish
A: 

One more validation without re:

def validip(ip):
    return ip.count('.') == 3 and  all(0<=int(num)<256 for num in ip.rstrip().split('.'))

for i in ('123.233.42.12','3234.23.453.353','-2.23.24.234','1.2.3.4'):
    print i,validip(i)
Tony Veijalainen