
I am using BeautifulSoup to find all <a> tags by their href attribute.

links = myhtml.findAll('a', href=re.compile('????'))

I need to find all links that have 'abc123' in the href text.

I need help with the regex; see the ???? in my code snippet.

+1  A: 

Using "abc123" as the pattern should give you what you want.

If that doesn't work, then BS is probably using re.match, in which case you would want ".*abc123.*".
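The difference this answer hinges on is between `re.search`, which finds the pattern anywhere in the string, and `re.match`, which is anchored at the start of the string. A minimal sketch using only the standard `re` module; the href value is a made-up example, not from the question:

```python
import re

# Hypothetical href value, for illustration only.
href = "http://example.com/page?id=abc123"

plain = re.compile("abc123")
wildcard = re.compile(".*abc123.*")

# search() scans the whole string, so the plain pattern is found mid-URL.
print(bool(plain.search(href)))    # True
# match() only tries at position 0, so the plain pattern fails on this URL.
print(bool(plain.match(href)))     # False
# Padding with .* lets match() succeed despite being anchored at the start.
print(bool(wildcard.match(href)))  # True
```

So the ".*abc123.*" form only matters if the matching is done with match(); with search() the plain "abc123" is enough.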

aaronasterling
But I need any href that contains 'abc123' anywhere in the URL.
Blankman
I've updated my answer.
aaronasterling
+1  A: 

If you want all the links whose href contains the literal string 'abc123', you can simply write:

links = myhtml.findAll('a', href=re.compile('abc123'))
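To see what the compiled pattern does to each href, here is a stdlib-only sketch. It is a stand-in, not BeautifulSoup: it collects the href of every <a> tag with `html.parser` and keeps the values the pattern finds via search(). As far as I know, BeautifulSoup likewise tests a compiled regex against each attribute value with search(). The `HrefCollector` class and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical sample markup.
html = """
<a href="/abc123/index.html">one</a>
<a href="/other/page.html">two</a>
<a href="http://example.com/?q=abc123">three</a>
"""

pattern = re.compile("abc123")
collector = HrefCollector()
collector.feed(html)

# Keep hrefs where 'abc123' appears anywhere, not just at the start.
matching = [h for h in collector.hrefs if pattern.search(h)]
print(matching)  # ['/abc123/index.html', 'http://example.com/?q=abc123']
```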
Rui Vieira
+2  A: 

If 'abc123' is literally what you want to search for, anywhere in the href, then re.compile('abc123') as suggested by other answers is correct. If the actual string you want to match contains punctuation, e.g. 'abc123.com', then use instead

re.compile(re.escape('abc123.com'))

The re.escape call "escapes" any punctuation so that it's taken literally, just like alphanumerics are. Without it, some punctuation is interpreted specially by the RE engine: for example, the dot ('.') in the above example would be taken as "any single character whatsoever", so re.compile('abc123.com') would match, e.g., 'abc123zcom' (and many other strings of a similar nature).
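A quick check of the dot problem, using only the standard `re` module; the two URLs are made-up examples:

```python
import re

loose = re.compile("abc123.com")              # '.' means "any single character"
strict = re.compile(re.escape("abc123.com"))  # '.' is matched literally

for href in ["http://abc123.com/", "http://abc123zcom/"]:
    print(href, bool(loose.search(href)), bool(strict.search(href)))
# http://abc123.com/ True True
# http://abc123zcom/ True False
```

Only the escaped pattern rejects 'abc123zcom'.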

Alex Martelli