A:

Use a character class ([...]) to match either quote character:

$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";
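A minimal sketch of applying this pattern with preg_match, run against two hypothetical single-attribute anchors (one per quote style). It also shows a caveat: the greedy (.*) only behaves on simple tags like these, and over-captures when another quoted attribute follows name.

```php
<?php
// James Emerton's pattern; ['\"] is a character class matching either quote.
$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";

// Hypothetical sample anchors, one for each quote style.
$samples = [
    "<a name='top'>Top</a>",
    '<a name="bottom">Bottom</a>',
];

$names = [];
foreach ($samples as $tag) {
    if (preg_match($p, $tag, $m)) {
        $names[] = $m[1]; // group 1 holds the attribute value
    }
}
print_r($names);

// Caveat: when another quoted attribute follows name, the greedy (.*)
// runs on to the last quote before ">" and over-captures.
preg_match($p, '<a name="foo" id="bar">x</a>', $m);
echo $m[1], "\n"; // prints: foo" id="bar
```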
James Emerton
A:

Try this:

/<a(?:\s+(?!name)[^"'>]+(?:"[^"]*"|'[^']*')?)*\s+name=("[^"]*"|'[^']*')\s*>/im

The capture group still contains the surrounding quotes, so you just have to strip them:

substr($match[1], 1, -1)
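A sketch of how the pattern and the substr() call fit together with preg_match_all, assuming a hypothetical input string. Inside a PHP single-quoted string the inner ' characters have to be written as \'; the resulting regex is the same as the one above.

```php
<?php
// Gumbo's pattern in a PHP single-quoted string (inner ' escaped as \').
$regex = '/<a(?:\s+(?!name)[^"\'>]+(?:"[^"]*"|\'[^\']*\')?)*\s+name=("[^"]*"|\'[^\']*\')\s*>/im';

// Hypothetical input: one anchor with an attribute before name, one with single quotes.
$html = '<a href="#x" name="first">One</a> <a name=\'second\'>Two</a>';

$names = [];
if (preg_match_all($regex, $html, $matches)) {
    foreach ($matches[1] as $quoted) {
        // Group 1 still includes the quotes, so drop the first and last character.
        $names[] = substr($quoted, 1, -1);
    }
}
print_r($names);
```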

But using a real parser like DOMDocument would certainly be better than this regular expression approach.

Gumbo
Excellent! Works like a charm! Thank you very much!
jerrygarciuh
Better to use PHP's built-in DOMDocument together with SimpleXML or DOMXPath (which one depends on the use case).
Jet
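For comparison, a minimal sketch of the parser approach suggested above, using DOMDocument with DOMXPath. It assumes the dom extension is available (it is enabled in most PHP builds) and uses a hypothetical HTML fragment. Attribute order, extra attributes, and quote style no longer matter because the parser handles them.

```php
<?php
// Requires PHP's dom extension (enabled in most builds).
$html = '<p><a name="foo" id="foo">A</a> <a href="#foo">link</a> <a name=\'bar\'>B</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
// Select every <a> element that carries a name attribute.
foreach ($xpath->query('//a[@name]') as $anchor) {
    $names[] = $anchor->getAttribute('name');
}
print_r($names);
```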
A:

James' answer uses a very popular but wrong regex for matching strings. It's wrong because it doesn't allow the string delimiter to be escaped. Given that the string delimiter is ' or ", the following regex works:

$regex = '/([\'"])(.*?)(.{0,2})(?<![^\\\\]\\\\)(\1)/';

\1 is the opening delimiter, \2 is the contents minus the last two characters, \3 is those last (up to) two characters, and \4 is the closing delimiter. This regex allows delimiters to be escaped as long as the escape character is \ and the escape character itself hasn't been escaped. For example:

'Valid'
'Valid \' String'
'Invalid ' String'
'Invalid \\' String'
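A sketch that runs the pattern over the four examples, assuming "/" delimiters are wrapped around it (preg_* functions require a delimiter). For the two "Invalid" lines the match stops at the first unescaped quote, i.e. they are not a single quoted string.

```php
<?php
// Shawn Biddle's pattern with "/" delimiters; "\\\\" in a single-quoted
// PHP string yields the regex escape \\ (a literal backslash).
$regex = '/([\'"])(.*?)(.{0,2})(?<![^\\\\]\\\\)(\1)/';

// The four examples from the answer, written as PHP double-quoted strings.
$inputs = [
    "'Valid'",
    "'Valid \\' String'",
    "'Invalid ' String'",
    "'Invalid \\\\' String'",
];

$results = [];
foreach ($inputs as $input) {
    if (preg_match($regex, $input, $m)) {
        // $m[0] is the first complete quoted string; for the "Invalid"
        // inputs it ends before the rest of the line.
        $results[] = $m[0];
    }
}
print_r($results);
```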
Shawn Biddle
A:

Your current solution won't match anchors with other attributes following 'name' (e.g. <a name="foo" id="foo">).

Try:

$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';

This will extract the contents of the 'name' attribute into capture group 1 ($1 in a replacement string, $matches[1] after preg_match).
The \s* will also allow for line breaks between attributes.
You don't need to finish off with the rest of the 'a' tag: the negated character class [^"']+ can't consume a quote, so it stops at the closing quote of the attribute value.
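A sketch of the pattern applied with preg_match_all to hypothetical markup: one anchor with an attribute after name, and one with a line break before name. The single quotes inside the pattern are escaped so the PHP literal parses.

```php
<?php
// pelms' pattern with the inner single quotes escaped so the literal parses.
$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';

// Hypothetical markup: an attribute after name, and a line break before name.
$html = <<<'HTML'
<a name="foo" id="foo">Foo</a>
<a href="#top"
   name='bar'>Bar</a>
HTML;

preg_match_all($regex, $html, $matches);
print_r($matches[1]); // the captured name values
```

As written, \S* cannot cross whitespace, so the pattern tolerates at most one attribute (with no spaces in its value) before name.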

pelms
Hi pelms, thank you for the response. I gave your regex a try. Had to escape the single quotes: $regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i'; I find I am losing the first character of each name string. Any thoughts? JG
jerrygarciuh