ansaurus

Question

Python regex to find any link that contains the text 'abc123'

Answer 1

+1 A:

"abc123" should give you what you want

If that doesn't work, then BS (BeautifulSoup) is probably using re.match, in which case you would want ".*abc123.*"
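
The difference between the two cases can be checked with the standard re module alone. This is a minimal sketch, assuming a made-up href value (sample_href is hypothetical); it says nothing about which function BeautifulSoup actually calls internally:

```python
import re

# Hypothetical href value, used only for illustration.
sample_href = "http://example.com/path/abc123/page.html"

pattern = re.compile("abc123")

# search() scans the whole string, so a bare 'abc123' is enough.
found_by_search = pattern.search(sample_href) is not None

# match() only tries at the start of the string, so the bare pattern fails...
found_by_match = pattern.match(sample_href) is not None

# ...and needs a leading '.*' to skip whatever precedes 'abc123'.
found_by_padded_match = re.match(".*abc123.*", sample_href) is not None

print(found_by_search, found_by_match, found_by_padded_match)
# True False True
```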

aaronasterling 2010-08-07 01:53:43

But it is any href that contains 'abc123' anywhere in the href URL.

Blankman 2010-08-07 01:54:58

I've updated my answer.

aaronasterling 2010-08-07 01:58:16

Answer 2

+1 A:

If you want all the links whose href contains 'abc123', you can simply write:

links = myhtml.findAll('a', href=re.compile('abc123'))
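
To try the same filter without BeautifulSoup installed, here is a rough standard-library stand-in. It swaps html.parser in for the findAll call above and is not how BeautifulSoup works internally; HrefCollector and sample_html are hypothetical names made up for this sketch:

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical markup, used only for illustration.
sample_html = """
<a href="http://example.com/abc123/index.html">one</a>
<a href="http://example.com/other">two</a>
<a href="/files/xabc123y.zip">three</a>
<a name="no-href">four</a>
"""

pattern = re.compile("abc123")

collector = HrefCollector()
collector.feed(sample_html)

# Keep only the hrefs in which the pattern occurs anywhere.
matching = [h for h in collector.hrefs if pattern.search(h)]
print(matching)
# ['http://example.com/abc123/index.html', '/files/xabc123y.zip']
```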

Rui Vieira 2010-08-07 01:55:45

Answer 3

+2 A:

If 'abc123' is literally what you want to search for, anywhere in the href, then re.compile('abc123') as suggested by other answers is correct. If the actual string you want to match contains punctuation, e.g. 'abc123.com', then use instead

re.compile(re.escape('abc123.com'))

The re.escape part will "escape" any punctuation so that it's taken literally, just like alphanumerics are. Without it, the RE engine gives some punctuation a special meaning: for example, the dot ('.') in the above example would be taken as "any single character" (by default, anything except a newline), so re.compile('abc123.com') would also match, e.g., 'abc123zcom' and many other strings of a similar nature.
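
A short sketch of that difference, using only the re module; the two test strings (real and lookalike) are hypothetical values chosen for illustration:

```python
import re

literal = "abc123.com"

unescaped = re.compile(literal)            # '.' matches any character
escaped = re.compile(re.escape(literal))   # '.' matches only a literal dot

# Hypothetical href values, used only for illustration.
real = "http://abc123.com/page"
lookalike = "http://abc123zcom/page"

print(re.escape(literal))                        # abc123\.com
print(unescaped.search(lookalike) is not None)   # True: false positive
print(escaped.search(lookalike) is not None)     # False
print(escaped.search(real) is not None)          # True
```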

Alex Martelli 2010-08-07 02:26:12
