I am using Beautiful Soup to find all links (a tags with an href attribute):
links = myhtml.findAll('a', href=re.compile('????'))
I need to find all links that have 'abc123' anywhere in the href value.
I need help with the regex; see the ???? in my code snippet.
re.compile("abc123")
should give you what you want.
If that doesn't work, then BS is probably using re.match rather than re.search, in which case you would want ".*abc123.*".
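For what it's worth, the Beautiful Soup 4 documentation says a compiled regular expression passed as a filter is applied with its search() method, so the plain "abc123" pattern should be enough there. The difference between search() and match() can be seen with the standard re module alone; the href below is a made-up example:

```python
import re

# Hypothetical href value, used only to illustrate the difference.
href = "http://example.com/page/abc123/index.html"

# search() scans the whole string, so a bare pattern finds the substring anywhere.
print(bool(re.compile("abc123").search(href)))

# match() only tries at position 0, so the bare pattern fails here...
print(bool(re.compile("abc123").match(href)))

# ...while padding with .* lets match() reach the substring.
print(bool(re.compile(".*abc123.*").match(href)))
```

This prints True, False, True.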
If you want all the links whose href contains the literal string 'abc123', you can simply put:
links = myhtml.findAll('a', href=re.compile('abc123'))
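To see what that filter selects without installing Beautiful Soup, here is a rough stand-in that uses the standard library's html.parser to collect every href and then keeps those where the compiled pattern's search() succeeds. This is only a sketch, not how Beautiful Soup is implemented; the HrefCollector class and the sample markup are hypothetical:

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical markup standing in for `myhtml`.
SAMPLE = """
<a href="/items/abc123">one</a>
<a href="/items/xyz789">two</a>
<a href="http://example.com/?id=abc123">three</a>
<a name="anchor-without-href">four</a>
"""

pattern = re.compile("abc123")
collector = HrefCollector()
collector.feed(SAMPLE)

# Keep hrefs where the pattern occurs anywhere (search, not match).
matching = [h for h in collector.hrefs if pattern.search(h)]
print(matching)
```

The a tag without an href is never collected, and of the three hrefs only the two containing 'abc123' survive the filter.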
If 'abc123' is literally what you want to search for, anywhere in the href, then re.compile('abc123') as suggested by other answers is correct. If the actual string you want to match contains punctuation, e.g. 'abc123.com', then use this instead:
re.compile(re.escape('abc123.com'))
The re.escape part "escapes" any characters that have a special meaning in regular expressions so that they are taken literally, just like alphanumerics are. Without it, some punctuation gets interpreted in various ways by the RE engine: for example, the dot ('.') in the above example would be taken as "any single character whatsoever", so re.compile('abc123.com') would match, e.g., 'abc123zcom' (and many other strings of a similar nature).
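A quick check with the standard re module shows the difference; the test strings are made up for illustration:

```python
import re

# Without escaping, '.' matches any single character.
unescaped = re.compile("abc123.com")
# With escaping, '.' only matches a literal dot.
escaped = re.compile(re.escape("abc123.com"))

print(re.escape("abc123.com"))  # abc123\.com

print(bool(unescaped.search("abc123zcom")))          # True (the dot matched 'z')
print(bool(escaped.search("abc123zcom")))            # False
print(bool(escaped.search("http://abc123.com/x")))   # True
```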