ansaurus

Question

Regex Question: Matching this pattern with hard or soft quotes

Answer 1

+1 A:

Use [] to match character sets:

$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";
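A minimal sketch of how this pattern might be used with preg_match, assuming each anchor sits on its own line. The sample strings and the $samples and $names variables are made up for illustration.

```php
<?php
// Pattern from the answer above; ['\"] accepts either quote style.
$p = "%<a.*\s+name=['\"](.*)['\"]\s*>(?:.*)</a>%im";

// Hypothetical sample inputs, one anchor each.
$samples = [
    '<a name="top">Top</a>',
    "<a name='bottom'>Bottom</a>",
];

$names = [];
foreach ($samples as $html) {
    if (preg_match($p, $html, $match)) {
        $names[] = $match[1]; // captured attribute value, without the quotes
    }
}
print_r($names);
```

Note that the greedy .* parts can run past the intended anchor when several anchors share one line, so this is probably only safe for simple, one-anchor-per-line input.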

James Emerton 2009-05-23 17:21:45

Answer 2

+1 A:

Try this:

/<a(?:\s+(?!name)[^"'>]+(?:"[^"]*"|'[^']*')?)*\s+name=("[^"]*"|'[^']*')\s*>/im

Here you just have to strip the surrounding quotes:

substr($match[1], 1, -1)
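Put together, the two steps might look like the sketch below. The sample $html string and the $pattern and $name variables are assumptions for illustration; the pattern is the one above, written as a single-quoted PHP string.

```php
<?php
// Pattern from the answer. Inside a single-quoted PHP string the literal
// single quotes have to be escaped as \'.
$pattern = '/<a(?:\s+(?!name)[^"\'>]+(?:"[^"]*"|\'[^\']*\')?)*\s+name=("[^"]*"|\'[^\']*\')\s*>/im';

// Hypothetical sample input with another attribute before 'name'.
$html = '<a href="#" name="intro">';

$name = null;
if (preg_match($pattern, $html, $match)) {
    // $match[1] still carries the surrounding quotes, so strip them.
    $name = substr($match[1], 1, -1);
}
var_dump($name);
```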

But using a real parser like DOMDocument would certainly be better than this regular expression approach.
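For comparison, a rough sketch of the parser-based route using DOMDocument and DOMXPath. The anchorNames() helper and the sample markup are hypothetical, and the libxml error handling is just one way to keep warnings about fragment input quiet.

```php
<?php
// Hypothetical helper: collect the 'name' attribute of every <a> element.
function anchorNames(string $html): array
{
    $doc = new DOMDocument();

    // Suppress libxml warnings about imperfect or partial HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $names = [];
    // Select only anchors that actually have a name attribute.
    foreach ($xpath->query('//a[@name]') as $anchor) {
        $names[] = $anchor->getAttribute('name');
    }
    return $names;
}

// Hypothetical sample markup mixing both quote styles.
$html = '<p><a name="intro" id="intro">Intro</a> <a href="#x">link</a> <a name=\'end\'>End</a></p>';
print_r(anchorNames($html));
```

The parser handles both quote styles and any attribute order, so no quote stripping is needed afterwards.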

Gumbo 2009-05-23 17:22:47

Excellent! Works like a charm! Thank you very much!

jerrygarciuh 2009-05-23 17:32:16

Better to use PHP's built-in DOMDocument + SimpleXML or DOMXPath (it depends...)

Jet 2009-05-23 20:51:23

Answer 3

+1 A:

James' answer uses a very popular but wrong regex for string matching. It's wrong because it doesn't allow for escaping of the string delimiter. Given that the string delimiter is ' or ", the following regex works:

$regex = '([\'"])(.*?)(.{0,2})(?<![^\\\]\\\)(\1)';

\1 is the starting delimiter, \2 is the contents (minus up to 2 characters), \3 is the last 2 characters (or fewer), and \4 is the ending delimiter. This regex allows for escaping of delimiters as long as the escape character is \ and the escape character itself hasn't been escaped. For example:

'Valid'
'Valid \' String'
'Invalid ' String'
'Invalid \\' String'
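One way to check these four cases is sketched below. It assumes / as the regex delimiter and treats an input as valid only when the match covers the whole input; the isQuotedString() helper and that validity rule are illustrative assumptions, not part of the answer.

```php
<?php
// The answer's pattern wrapped in / delimiters. In a single-quoted PHP
// string, \\\\ yields two backslashes, which the regex engine reads as
// one literal backslash.
$regex = '/([\'"])(.*?)(.{0,2})(?<![^\\\\]\\\\)(\1)/';

// Hypothetical helper: true when the whole input is one quoted string.
function isQuotedString(string $regex, string $input): bool
{
    return preg_match($regex, $input, $m) === 1 && $m[0] === $input;
}

// The four samples from the answer, written as PHP double-quoted strings.
$samples = [
    "'Valid'",
    "'Valid \\' String'",
    "'Invalid ' String'",
    "'Invalid \\\\' String'",
];
foreach ($samples as $s) {
    echo $s, ' => ', isQuotedString($regex, $s) ? 'valid' : 'invalid', "\n";
}
```

For the two invalid samples the regex still matches, but only up to the first unescaped quote, which is why the sketch compares the match against the full input.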

Shawn Biddle 2009-05-23 19:12:56

Answer 4

+1 A:

Your current solution won't match anchors with other attributes following 'name' (e.g. <a name="foo" id="foo">).

Try:

$regex = '%<a\s+\S*\s*name=["']([^"']+)["']%i';

This will extract the contents of the 'name' attribute into the back reference $1.
The \s* will also allow for line breaks between attributes.
You don't need to finish off with the rest of the 'a' tag, as the negated character class [^"']+ stops at the closing quote.
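A small sketch of this pattern in use, with the inner single quotes escaped so that it is a valid single-quoted PHP string. The sample $html and the $name variable are made up for illustration.

```php
<?php
// The answer's pattern; \' keeps the single-quoted PHP string intact.
$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';

// Hypothetical sample: 'name' is followed by another attribute.
$html = '<a name="foo" id="foo">Foo</a>';

// $match[1] holds the attribute value without its quotes.
$name = preg_match($regex, $html, $match) ? $match[1] : null;
var_dump($name);
```

Since \S* skips at most one run of non-whitespace characters, this likely copes with only a single whitespace-free attribute in front of 'name'.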

pelms 2009-05-24 02:38:21

Hi pelms,

Thank you for the response. I gave your regex a try. Had to escape the single quotes.

$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';

I find I am losing the first character of each name string. Any thoughts?

JG

jerrygarciuh 2009-05-31 17:35:56

Feh, sorry for the munged formatting.

$regex = '%<a\s+\S*\s*name=["\']([^"\']+)["\']%i';

jerrygarciuh 2009-05-31 17:37:46
