I want to find the text between a pair of <a> tags that link to a given site.
Here's the re string that I'm using to find the content:
r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>([^<]*))</a>''' % our_url
With our_url set to stackoverflow.com, the resulting pattern is:
r'''(<a([^<>]*)href=("|')(http://)?(www\.)?stackoverflow.com([^'"]*)("|')([^<>]*)>([^<]*))</a>'''
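A minimal sketch of how that template gets filled in and applied. It assumes Python's re module and a made-up HTML snippet; the html string and the link_text name are illustrative only. The ninth capture group, ([^<]*), is the one that holds the link text:

```python
import re

# Template from the question; %s is filled in with the target site.
LINK_TEMPLATE = r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>([^<]*))</a>'''

our_url = "stackoverflow.com"
pattern = re.compile(LINK_TEMPLATE % our_url)

# Hypothetical sample HTML containing one plain link to the site.
html = '<p>See <a href="http://stackoverflow.com/questions">this site</a> for help.</p>'

m = pattern.search(html)
# Group 9 is the final ([^<]*) capture: the text between <a ...> and </a>.
link_text = m.group(9)
print(link_text)
```

Because our_url is substituted as-is, the "." in stackoverflow.com matches any character; passing re.escape(our_url) instead would make it match a literal dot.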
This works for most links, but it fails to match a link that has other tags inside it, such as <a href="http://stackoverflow.com/"><b>text</b></a>. I tried changing the final part of the regex from:
([^<]*))</a>'''
to:
(.*))</a>'''
But that captured everything after the link, up to the last </a> it could reach, which I don't want. Are there any suggestions for how to solve this?
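A minimal reproduction of both behaviours, under the same assumptions as before: Python's re module and a hypothetical one-line HTML string in which the first link's text contains a <b> tag and a second link to another site follows it. The names original and greedy are illustrative only:

```python
import re

# Original template: the link text is captured by ([^<]*).
LINK_TEMPLATE = r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>([^<]*))</a>'''
# Variant with the final ([^<]*) swapped for (.*), as tried above.
GREEDY_TEMPLATE = r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>(.*))</a>'''

our_url = "stackoverflow.com"

# Hypothetical sample: a link whose text contains <b>, then a second link.
html = ('<a href="http://stackoverflow.com/q/1"><b>bold</b> text</a> and '
        '<a href="http://example.com/">other</a>')

original = re.search(LINK_TEMPLATE % our_url, html)
greedy = re.search(GREEDY_TEMPLATE % our_url, html)

# ([^<]*) cannot step past the "<b>", so the original pattern finds no match.
print(original)
# (.*) is greedy, so group 9 runs on to the last "</a>" on the line.
print(greedy.group(9))
```

Since "." does not match a newline unless re.DOTALL is set, the greedy version stops at the last </a> on the same line; when the page's HTML sits on one line, that is effectively the rest of the page.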