ansaurus

Question

Answer 1

+1 A:

The basic approach would be:

Use urllib2 to download the contents of the page
Use a regular expression to extract IPv4-like addresses
Validate each match according to the numeric constraints on each octet
Print out the list of matches

Please provide a clearer indication of what specific part you are having trouble with, along with evidence to show what it is you've tried thus far.

Rob 2009-04-30 21:01:45
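The four steps in the answer above can be sketched in Python 3, where urllib.request replaces urllib2. This is a minimal sketch, not the answerer's code. The names fetch_page, find_candidates, is_valid_ip_port and list_ips are hypothetical, and decoding the page as UTF-8 is an assumption.

```python
import re
import urllib.request

# Step 2: a deliberately loose "a.b.c.d:port" pattern; octet ranges are checked in step 3.
CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d+\b")


def fetch_page(url):
    """Step 1: download the page and decode it to text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def find_candidates(text):
    """Step 2: return every substring shaped like an IPv4 address with a port."""
    return CANDIDATE.findall(text)


def is_valid_ip_port(candidate):
    """Step 3: accept only candidates whose four octets are each in 0-255."""
    address, _, _port = candidate.partition(":")
    return all(0 <= int(octet) <= 255 for octet in address.split("."))


def list_ips(text):
    """Steps 2 and 3 combined: the candidates that pass the octet check."""
    return [c for c in find_candidates(text) if is_valid_ip_port(c)]


# Step 4 (requires network access; the URL is a placeholder):
# for match in list_ips(fetch_page("http://example.com/")):
#     print(match)
```

Keeping the regex loose and doing the range check in plain Python keeps the pattern readable; the longer single-regex alternative appears in a later answer.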

Right. The only part I can't do is the regular expression one.

2009-04-30 21:03:45

If someone shows me that, I will be fine.

2009-04-30 21:04:13

Answer 2

A:

Try:

import re, urllib2
re.compile(r"\d?\d?\d\.\d?\d?\d\.\d?\d?\d\.\d?\d?\d:\d+").findall(urllib2.urlopen(url).read())

Matthew Wilkes 2009-04-30 21:12:25

Needs to be coupled with something to constrain the acceptable octet values.

Rob 2009-04-30 22:03:38
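One way to add the octet constraint Rob mentions is to keep a loose pattern and let the standard-library ipaddress module reject out-of-range addresses. This is a Python 3 sketch that swaps in ipaddress for a hand-written range check. The helper name valid_matches is hypothetical. Python 3.9.5 and later also reject octets with leading zeros such as 004, so results can differ between versions for such input.

```python
import ipaddress
import re

# Loose pattern: four dot-separated digit groups plus a port; it does not check 0-255.
LOOSE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+\b")


def valid_matches(text):
    """Return loose matches whose address part is a valid IPv4 address."""
    results = []
    for match in LOOSE.findall(text):
        address = match.split(":")[0]
        try:
            ipaddress.IPv4Address(address)  # raises ValueError for octets above 255
        except ValueError:
            continue
        results.append(match)
    return results
```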

Answer 3

+3 A:

Quoting das: "Right. The only part I can't do is the regular expression one." and "If someone shows me that, I will be fine."

import re

ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
junk = " 1.1.1.1:123 2.2.2.2:321 312.123.1.12:123 "
print ip.findall(junk)

# outputs ['1.1.1.1:123', '2.2.2.2:321']

Here is a complete example:

import re, urllib2

f = urllib2.urlopen("http://www.samair.ru/proxy/ip-address-01.htm")
junk = f.read()

ip = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b")
print ip.findall(junk)

# ['114.30.47.10:80', '118.228.148.83:80', '119.70.40.101:8080',
#  '12.47.164.114:8888', '121.17.161.114:3128', '122.152.183.103:80',
#  '122.224.171.91:3128', '123.234.32.27:8080', '124.107.85.115:80',
#  '124.247.222.66:6588', '125.76.228.201:808', '128.112.139.75:3128',
#  '128.208.004.197:3128', '128.233.252.11:3124', '128.233.252.12:3124']

Unknown 2009-04-30 21:14:43
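Under Python 3 the complete example above needs two changes: urllib2 became urllib.request, and read() returns bytes, which must be decoded before a str pattern can search it (otherwise re raises TypeError). A sketch, assuming the page decodes as UTF-8; find_ip_ports and find_ip_ports_at are hypothetical helper names.

```python
import re
import urllib.request

# Same pattern as the answer above, split across two raw strings for readability.
IP_PORT = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?):\d{1,6}\b"
)


def find_ip_ports(html):
    """Return every ip:port match in an already-decoded HTML string."""
    return IP_PORT.findall(html)


def find_ip_ports_at(url):
    """Download url and scan it; read() returns bytes, so decode first (UTF-8 assumed)."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return find_ip_ports(html)


# Example (requires network access):
# print(find_ip_ports_at("http://www.samair.ru/proxy/ip-address-01.htm"))
```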

Gross! But well done nonetheless. Is that tested?

Ross 2009-04-30 21:17:05

Yes it is. Try it yourself.

Unknown 2009-04-30 21:17:46
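The octet part of the pattern, 25[0-5]|2[0-4]\d|[01]?\d\d?, splits 0-255 into 250-255, 200-249 and 0-199 (the last allowing leading zeros). Because there are only a thousand one-to-three-digit numbers, it can be tested exhaustively. A Python 3 sketch (re.fullmatch needs Python 3.4 or later); octet_ok is a hypothetical helper:

```python
import re

# The octet alternation from the answer above.
OCTET = re.compile(r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)")


def octet_ok(s):
    """True if the whole string s is accepted as a single octet."""
    return OCTET.fullmatch(s) is not None


# Every unpadded number 0..999: accepted exactly when it is at most 255.
accepted = [n for n in range(1000) if octet_ok(str(n))]
print(accepted[0], accepted[-1], len(accepted))  # 0 255 256
```

Zero-padded octets such as "004" are also accepted, which is why the samair.ru output above contains '128.208.004.197:3128'.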

In [9]: ip.search("255x255x255x255:12")
Out[9]: <_sre.SRE_Match object at 0x103c650>

llimllib 2009-04-30 21:25:28

@llimllib:
>>> ip.match("255x255x255x255:12")
>>>

Unknown 2009-04-30 21:26:36

Well, now that you've escaped your periods, it works. Nice work!

llimllib 2009-04-30 22:04:40
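The fix llimllib credits is the escaping. In a regular expression, an unescaped . matches any character except a newline, so "255x255x255x255:12" slips through, while \. matches only a literal period. A small Python 3 check, using a simplified \d{1,3} octet for brevity (an assumption, not the pattern from the answer):

```python
import re

text = "255x255x255x255:12"

# Unescaped dots: '.' matches any character, including 'x'.
unescaped = re.compile(r"\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}:\d+")
# Escaped dots: '\.' matches only a literal period.
escaped = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+")

print(unescaped.search(text) is not None)  # True
print(escaped.search(text) is not None)    # False
```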

Answer 4

A:

Not to turn this into a who's-the-better-regex-author war, but...

(?:\d{1,3}\.){3}\d{1,3}:\d{1,6}

Ross 2009-04-30 21:15:45
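A caveat when a pattern like this is used with Python's re.findall, as the other answers do: if the leading group is written as a plain capturing ( ... ), findall returns the group's last captured text rather than the whole match. A non-capturing (?: ... ) group returns the full address. A Python 3 sketch with made-up sample text:

```python
import re

text = "proxies: 1.2.3.4:80 and 10.0.0.1:8080"

capturing = re.compile(r"(\d{1,3}\.){3}\d{1,3}:\d{1,6}")
non_capturing = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}:\d{1,6}")

# With one capturing group, findall returns that group's last repetition per match.
print(capturing.findall(text))      # ['3.', '0.']
# With no capturing groups, findall returns the whole match.
print(non_capturing.findall(text))  # ['1.2.3.4:80', '10.0.0.1:8080']
```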

You can't have IPs with zeroes?

llimllib 2009-04-30 21:43:38

Needs to be coupled with something to constrain the acceptable octet values.

Rob 2009-04-30 22:03:55

"\d{1,3}" simply means 1-3 digits. A couple of people have already submitted more "correct" regexes, so it's a moot point.

Ross 2009-05-01 14:09:45

Wow, silly comment on my part. I apologize.

llimllib 2009-05-01 22:51:52

Find all IPs on an HTML Page
