Hi Everyone,
I am new to Python (I don't have any programming training either), so please keep that in mind as I ask my question.
I am trying to search a retrieved webpage and find all links that match a specified pattern. I have done this successfully in other scripts, but this time I am getting an error that says:
raise error, v # invalid expression
sre_constants.error: multiple repeat
I have to admit I do not know why, but again, I am new to Python and regular expressions. However, even when I don't use a pattern and instead use a specific link (just to test the matching), I don't seem to get any matches (nothing is sent to the window when I print match.group(0)). The link I tested is commented out below.
Any ideas? It usually is easier for me to learn by example, but any advice you can give is greatly appreciated!
Brock
import urllib2
from BeautifulSoup import BeautifulSoup
import re
url = "http://forums.epicgames.com/archive/index.php?f-356-p-164.html"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
pattern = r'<a href="http://forums.epicgames.com/archive/index.php?t-([0-9]+).html">(.?+)</a> <i>((.?+) replies)'
#pattern = r'href="http://forums.epicgames.com/archive/index.php?t-622233.html">Gears of War 2: Horde Gameplay</a> <i>(20 replies)'
for match in re.finditer(pattern, page, re.S):
    print match.group(0)
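
For reference, here is a minimal sketch of how the pattern might be corrected. It uses only the re module and runs against a sample string built from the test link commented out above, not the live page. The real page markup is an assumption and may differ (the leading `<a ` and the closing `</i>` in the sample are guesses). Three things appear to be going wrong in the original pattern. First, `.?+` stacks two quantifiers on one character, which is what "multiple repeat" is reporting on this Python version; the non-greedy form is `.+?`. Second, the `?` and `.` in the URL are regex metacharacters, so `php?t-` means "ph, an optional p, then t-" and never matches a literal question mark, which would explain why even the literal test link finds nothing. Third, the parentheses around the reply count are treated as a group unless they are escaped. The names sample, prefix and matches are made up for illustration.

```python
import re

# Sample markup built from the test link in the post. The surrounding
# tags are an assumption about what the real page looks like.
sample = ('<a href="http://forums.epicgames.com/archive/index.php?t-622233.html">'
          'Gears of War 2: Horde Gameplay</a> <i>(20 replies)</i>')

# re.escape() backslash-escapes the metacharacters in the fixed part of
# the URL, so '.' and '?' are matched as literal text.
prefix = re.escape('http://forums.epicgames.com/archive/index.php?t-')

# Group 1: thread id, group 2: link text (non-greedy), group 3: reply count.
# The literal parentheses around the reply count are escaped with backslashes.
pattern = (r'<a href="' + prefix + r'([0-9]+)\.html">'
           r'(.+?)</a> <i>\(([0-9]+) replies\)')

# Collect the captured groups for every match in the sample text.
matches = [m.groups() for m in re.finditer(pattern, sample, re.S)]

for thread_id, title, replies in matches:
    print(thread_id + " | " + title + " | " + replies)
```

In the original script the same pattern would be passed to re.finditer(pattern, page, re.S), and match.group(1), match.group(2) and match.group(3) would give the thread id, title and reply count for each link.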