+5  Q: 

Python and "re"

A tutorial I have on regex in Python explains how to use the re module. I wanted to grab the URL out of an A tag, so, knowing regex, I wrote the correct expression, tested it in my regex testing app of choice, and confirmed that it worked. When I placed it into Python, it failed.

After much head-scratching I found the issue: it automatically expects your pattern to be at the start of the string. I have found a fix, but I would like to know how to change:

regex = ".*(a_regex_of_pure_awesomeness)"

into

regex = "a_regex_of_pure_awesomeness"

Okay, it's a standard URL regex, but I wanted to avoid any confusion about what I wanted to get rid of, and possibly pretend to be funny.

+1  A: 

Are you using the re.match() or the re.search() method? My understanding is that re.match() assumes a "^" at the beginning of your expression and will only match at the beginning of the text, while re.search() acts more like Perl regular expressions and will only match at the beginning of the text if you include a "^" at the beginning of your expression. Hope that helps.
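A quick way to check that understanding, using a made-up two-line string: without flags, re.match(p, s) and re.search("^" + p, s) agree. The one caveat the Python docs point out is re.MULTILINE, where "^" in re.search() also matches after every newline while re.match() still only tries the very start of the string.

```python
import re

# Illustrative input: "url" appears only at the start of the second line.
text = "first line\nurl on the second line"

# No flags: re.match behaves like re.search with a leading "^".
print(re.match("url", text))            # None
print(re.search("^url", text))          # None
print(re.search("url", text).start())   # 11

# With re.MULTILINE, "^" also matches right after each "\n",
# but re.match still only tries position 0.
print(re.search("^url", text, re.MULTILINE).start())  # 11
print(re.match("url", text, re.MULTILINE))            # None
```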

+16  A: 

In Python, there's a distinction between "match" and "search"; match only looks for the pattern at the start of the string, and search looks for the pattern starting at any location within the string.

Python regex docs
Matching vs searching
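The ".*(...)" workaround from the question is not quite the same as switching to re.search(). The sketch below uses a deliberately simple, hypothetical URL pattern standing in for a_regex_of_pure_awesomeness. Because ".*" is greedy, it backtracks from the end of the line, so the group captures the last URL rather than the first; and because "." does not match a newline by default, the workaround misses a URL that appears after a line break.

```python
import re

# Hypothetical, simplified URL pattern (not a complete URL regex).
URL = r'https?://[^"\s]+'

html = '<a href="http://example.com/one">1</a> <a href="http://example.com/two">2</a>'

# The question's workaround: re.match anchors at position 0, then the greedy
# ".*" backtracks from the end, so the group captures the LAST URL.
workaround = re.match(".*(" + URL + ")", html).group(1)

# re.search scans left to right and returns the FIRST match.
searched = re.search(URL, html).group(0)

# "." does not match "\n" by default, so the workaround fails here,
# while re.search still finds the URL on the second line.
multiline = "intro\n" + html
missed = re.match(".*(" + URL + ")", multiline)
found = re.search(URL, multiline).group(0)

print(workaround)  # http://example.com/two
print(searched)    # http://example.com/one
print(missed)      # None
print(found)       # http://example.com/one
```

If every URL is wanted rather than just the first, re.findall(URL, html) returns them all as a list of strings.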

zweiterlinde
+3  A: 
>>> import re
>>> pattern = re.compile("url")
>>> string = "   url"
>>> pattern.match(string)
>>> pattern.search(string)
<_sre.SRE_Match object at 0xb7f7a6e8>
Aaron Maenpaa
+1  A: 

You are probably being tripped up by the difference between the re.search and re.match methods.

mmaibaum
+4  A: 
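# Beautiful Soup 4 (pip install beautifulsoup4), Python 3
from bs4 import BeautifulSoup

# your_html: the HTML string you want to search
soup = BeautifulSoup(your_html, "html.parser")
for a in soup.find_all('a', href=True):
    # do something with `a` w/ href attribute
    print(a['href'])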
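If installing BeautifulSoup is not an option, the standard library's html.parser module can collect the same href values. A minimal sketch; the class name HrefCollector and the sample markup are made up for illustration:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of
        # (name, value) pairs, with value None for a bare attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


parser = HrefCollector()
parser.feed('<p><a href="http://example.com/one">1</a> <A HREF="/two">2</A> <a name="x">3</a></p>')
parser.close()
print(parser.hrefs)  # ['http://example.com/one', '/two']
```

Unlike a regex, the parser copes with attribute order, quoting style and upper-case tags without any extra effort in the pattern.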
J.F. Sebastian