I'm trying to extract the attributes of an anchor tag (<a>). So far I have this expression:
(?<name>\b\w+\b)\s*=\s*("(?<value>[^"]*)"|'(?<value>[^']*)'|(?<value>[^"'<> \s]+)\s*)+
which works for strings like
<a href="test.html" class="xyz">
and (single quotes)
<a href='test.html' class="xyz">
but not for strings without quotes:
<a href=test.html class=xyz>
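To make the failure concrete, here is a minimal reproduction sketch. It assumes Python's `re` module, which may not be the engine the original code runs on, so other engines could behave differently. Python rejects duplicate group names, so the three `value` groups are renamed `dq`, `sq` and `bare`; those names and the `extract` helper are hypothetical and exist only for this sketch.

```python
import re

# Port of the expression above. The only change is the group naming:
# dq = double-quoted value, sq = single-quoted value, bare = unquoted value.
ATTR_RE = re.compile(
    r"""(?P<name>\b\w+\b)\s*=\s*("(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<bare>[^"'<> \s]+)\s*)+"""
)


def extract(tag):
    """Return the (name, value) pairs that ATTR_RE finds in tag."""
    pairs = []
    for m in ATTR_RE.finditer(tag):
        # Take the first value group that participated in the match.
        candidates = (m.group("dq"), m.group("sq"), m.group("bare"))
        value = next(v for v in candidates if v is not None)
        pairs.append((m.group("name"), value))
    return pairs


print(extract('<a href="test.html" class="xyz">'))
print(extract("<a href='test.html' class=\"xyz\">"))
print(extract("<a href=test.html class=xyz>"))
```

Under these assumptions the two quoted inputs give one pair per attribute, while the unquoted input gives a single match for `href` whose value is `class=xyz`: the unquoted alternative also accepts `=`, and the trailing `+` lets the group repeat across the whitespace, so `class=xyz` is consumed as a second value of `href`.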
How can I modify my regex to make it work with unquoted attributes? Or is there a better way to do this?
Thanks!
Update: Thanks for all the good comments and advice so far. There is one thing I didn't mention: sadly, I have to patch code that I didn't write myself, and there is no time or money to rewrite it from the ground up.