My Friends,

I want to extract a simple IP address from a string (actually a one-line HTML document) using Python, but after two hours I still haven't come up with a good solution.

>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

-- '165.91.15.131' is what I want!

I tried using a regular expression, but so far I can only get the first number.

>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']

In fact, I don't feel I have a firm grasp of regular expressions; the code above was found elsewhere on the web and modified.

I'd appreciate your input and ideas!

+5  A: 

Remove your capturing group:

ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )

Result:

['165.91.15.131']
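
Why this works: when a pattern contains exactly one capturing group, re.findall returns the text of that group rather than the whole match. A non-capturing group (?:...) does not count. The short sketch below compares the two patterns from this thread; the variable names are only illustrative.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# One capturing group: findall returns only what that group matched.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# No capturing group: findall returns the whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# group(0) of a match object is always the whole match,
# whether or not the pattern has capturing groups.
m = re.search(r'([0-9]+)(?:\.[0-9]+){3}', s)
whole = m.group(0) if m else None

print(with_group)     # ['165']
print(without_group)  # ['165.91.15.131']
print(whole)          # 165.91.15.131
```

So if you need to keep the capturing group for some other reason, re.search(...).group(0) is an alternative to removing it.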

Notes:

  • If you are parsing HTML it might be a good idea to look at BeautifulSoup.
  • Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
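
As a rough sketch of the second note: the pattern below uses {1,3} as the partial fix, then hands each candidate to the standard-library ipaddress module (Python 3.3+) to reject out-of-range parts. The helper name find_ipv4 is made up for this example, and this is one possible approach rather than the only way to validate.

```python
import ipaddress
import re

# Partial fix from the note above: at most three digits per part.
CANDIDATE = re.compile(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}')

def find_ipv4(text):
    """Return the candidate matches that ipaddress accepts as IPv4.

    find_ipv4 is a hypothetical helper name used for illustration.
    """
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. a part greater than 255
        found.append(candidate)
    return found

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")
print(find_ipv4(s))  # ['165.91.15.131']
```

Note that the regex alone still picks up '0.00.999.999' out of '0.00.999.9999'; it is the ipaddress check that discards it.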
Mark Byers
Thanks so much, Mark. This is it!
GoJian
+2  A: 

You can use the following regex to capture only valid IP addresses:

re.findall(r'\b25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\b',s)

returns

['165', '91', '15', '131']
Snehal
Cool. This is a good idea.
GoJian
Technically, this doesn't match valid IP addresses but valid octets. There can be any number of them, so the count might need to be checked in a separate step.
calmh
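
The reason the regex above returns separate octets is that | has the lowest precedence in a regular expression, so the pattern reads as a long list of independent alternatives instead of four octets joined by dots. A possible variant, sketched below, wraps each octet in a non-capturing group; the names OCTET and IPV4 are only illustrative.

```python
import re

# One octet in the range 0-255, grouped so the "|" stays inside it.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets separated by literal dots, with a word boundary at each end.
IPV4 = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# No capturing groups are used, so findall returns whole matches.
addresses = IPV4.findall(s)
print(addresses)  # ['165.91.15.131']
```

This still finds a dotted quad inside a longer dotted string, so depending on the input an extra check on the surrounding characters may be needed.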