Use [] to match character sets:
$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";
Try this:
/<a(?:\s+(?!name)[^"'>]+(?:"[^"]*"|'[^']*')?)*\s+name=("[^"]*"|'[^']*')\s*>/im
Here you just have to strip the surrounding quotes:
substr($match[1], 1, -1)
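As a rough sketch of how the two steps fit together, the pattern can be fed to preg_match_all and the quotes stripped from each capture. The sample markup and the variable names ($pattern, $html, $names) are made up for illustration; the pattern itself is the one above, written as a nowdoc so the quotes need no PHP-level escaping.

```php
<?php
// Pattern from the answer above, kept verbatim inside a nowdoc.
$pattern = <<<'REGEX'
/<a(?:\s+(?!name)[^"'>]+(?:"[^"]*"|'[^']*')?)*\s+name=("[^"]*"|'[^']*')\s*>/im
REGEX;

// Hypothetical sample markup, for illustration only.
$html = '<a href="page.html" name="top">Top</a> <a name=\'end\'>End</a>';

preg_match_all($pattern, $html, $matches);

// Group 1 still carries its quotes, so strip one character from each end.
$names = array_map(function ($quoted) {
    return substr($quoted, 1, -1);
}, $matches[1]);

print_r($names);
```

Note that the pattern expects name to be the last attribute before the closing >, so anchors with attributes after name are not picked up by this sketch.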
But using a real parser like DOMDocument would certainly be better than this regular expression approach.
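For comparison, a minimal sketch of the DOMDocument route might look like this. The sample markup and variable names are assumptions for illustration; the parser does not care about attribute order, quote style, or line breaks.

```php
<?php
// Hypothetical sample markup, for illustration only.
$html = '<p><a name="intro" id="intro">Intro</a> '
      . '<a href="#intro">jump</a> '
      . '<a class="x" name=\'end\'>End</a></p>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$names = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Skip plain links that carry no name attribute.
    if ($anchor->hasAttribute('name')) {
        $names[] = $anchor->getAttribute('name');
    }
}

print_r($names);
```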
The regex in James' comment is a very popular but wrong way to match strings. It's wrong because it doesn't allow the string delimiter to be escaped. Given that the string delimiter is ' or ", the following regex works:
$regex = '([\'"])(.*?)(.{0,2})(?<![^\\\]\\\)(\1)';
\1 is the starting delimiter, \2 is the contents (minus the last 2 characters), \3 is the last 2 characters, and \4 is the ending delimiter. This regex allows delimiters to be escaped, as long as the escape character is \ and the escape character itself hasn't been escaped. For example:
'Valid'
'Valid \' String'
'Invalid ' String'
'Invalid \\' String'
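To see what "valid" and "invalid" mean here, the four strings can be run through the regex. This is only a sketch: the ~ delimiters are my addition (preg_* functions need delimiters, and the pattern as posted has none), and the variable names are made up. A string counts as valid when the first match covers the whole line; for the invalid ones the match stops at the first unescaped quote.

```php
<?php
// The answer's pattern, wrapped in ~ delimiters (an assumption) and kept in a
// nowdoc so the backslashes reach PCRE exactly as written.
$regex = <<<'REGEX'
~(['"])(.*?)(.{0,2})(?<![^\\]\\)(\1)~
REGEX;

// The four example strings from the answer, one per line.
$samples = <<<'TXT'
'Valid'
'Valid \' String'
'Invalid ' String'
'Invalid \\' String'
TXT;

$matched = [];
foreach (preg_split('/\R/', $samples) as $line) {
    preg_match($regex, $line, $m);
    $matched[] = $m[0];
    // Report whether the first match spans the entire line.
    $verdict = ($m[0] === $line) ? 'whole line' : 'stops early';
    echo $line, ' => ', $m[0], ' (', $verdict, ")\n";
}
```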
Your current solution won't match anchors with other attributes following 'name' (e.g. <a name="foo" id="foo">).
Try:
$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';
This will extract the contents of the 'name' attribute into the back reference $1.
The \s* will also allow for line breaks between attributes.
You don't need to finish off with the rest of the 'a' tag, as the negated character class [^"']+ cannot match past the closing quote.
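A quick way to check this is to run the pattern over an anchor with a trailing attribute and one with a line break inside the tag. The sample markup and variable names below are assumptions for illustration; the pattern is stored in a nowdoc so its quotes need no PHP-level escaping.

```php
<?php
// The pattern from this answer, kept in a nowdoc to avoid quote escaping.
$regex = <<<'REGEX'
%<a\s+\S*\s*name=["']([^"']+)["']%i
REGEX;

// Hypothetical sample markup: one anchor with an attribute after name,
// one with an attribute before it and a line break inside the tag.
$html = <<<'HTML'
<a name="foo" id="foo">Foo</a>
<a href="#top"
   name='bar'>Bar</a>
HTML;

preg_match_all($regex, $html, $matches);

// $matches[1] holds the first capture group, i.e. the name values.
print_r($matches[1]);
```

Because \S* covers at most one whitespace-free attribute, this sketch only handles a single attribute in front of name.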