If I had a div in HTML that had class="blah user_foo", what's the Match() regex to get the 'foo' bit?
If the class attribute is the only attribute of the wanted div elements and the class value always has the mentioned structure (a fixed blah followed by user_foobar), you could use this regular expression:
<div\s+class\s*=\s*"blah user_([^"\s]+)
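As a quick way to try the pattern above, here is a minimal sketch in Python. The question does not say which language Match() belongs to, so Python's re module is an assumption here, and the helper name user_from_simple_div is made up for illustration.

```python
import re

# The simple pattern from above: it assumes class is the div's only attribute.
SIMPLE_RE = re.compile(r'<div\s+class\s*=\s*"blah user_([^"\s]+)')


def user_from_simple_div(html):
    """Return the text after user_ (hypothetical helper), or None if nothing matches."""
    match = SIMPLE_RE.search(html)
    return match.group(1) if match else None


print(user_from_simple_div('<div class="blah user_foo">hi</div>'))         # foo
print(user_from_simple_div('<div id="a" class="blah user_foo">hi</div>'))  # None
```

The second call returns None because another attribute sits between the tag name and class, which is exactly the case the simple pattern does not cover.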
Otherwise try this regular expression:
<div\s+(?:[^>"']+|"[^"]*"|'[^']*')*\bclass\s*=\s*"blah user_([^"\s]+)
The expression (?:[^>"']+|"[^"]*"|'[^']*')* also takes into account that a plain > is allowed inside a quoted attribute value.
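To illustrate the point about a plain > inside an attribute value, here is a small sketch, again assuming Python's re module (the question does not name a language). The helper name user_from_div is hypothetical.

```python
import re

# The more general pattern from above: other attributes may precede class,
# and quoted attribute values may contain a plain '>'.
GENERAL_RE = re.compile(
    r'''<div\s+(?:[^>"']+|"[^"]*"|'[^']*')*\bclass\s*=\s*"blah user_([^"\s]+)'''
)


def user_from_div(html):
    """Return the text after user_ (hypothetical helper), or None if nothing matches."""
    match = GENERAL_RE.search(html)
    return match.group(1) if match else None


# The quoted id value contains '>', yet the match continues past it.
print(user_from_div('<div id="a>b" class="blah user_foo extra">'))    # foo
# The class attribute belongs to a different tag, so nothing is returned.
print(user_from_div('<div id="x"><span class="blah user_foo">'))      # None
```

Note that the nested quantifiers in this pattern can backtrack heavily on long tags that contain no matching class attribute, so it is best tried on short inputs first.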
Edit: Optimized the regular expression with a negative look-ahead assertion to reduce backtracking:
<div\s+(?:(?:(?!class)[^>"']+(?:"[^"]*"|'[^']*')?)\s+)*class\s*=\s*"blah user_([^"\s]+)
/<div\s(?:[^>'"]*|".*?"|'.*?')*?\bclass\s*=\s*"blah user_(.*?)"/i
The above handles whitespace and other things (such as other attributes) before the class specification. It doesn't handle the case where class='single-quoted-something'; you could do that with a backreference. It also doesn't handle malformed HTML.
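One way the backreference idea could look, sketched in Python (an assumed language, since the question does not name one): the opening quote is captured in group 1 and \1 requires the same character to close the value, so 'foo' moves to group 2. This is an adaptation rather than the exact expression above: the skipped parts use [^>'"]+, "[^"]*" and '[^']*' instead of the starred and lazy forms, and the helper name user_any_quote is made up.

```python
import re

# Group 1 captures the opening quote; \1 demands the same quote at the end.
# Adapted from the expression above, not identical to it.
QUOTED_RE = re.compile(
    r'''<div\s(?:[^>'"]+|"[^"]*"|'[^']*')*?\bclass\s*=\s*(["'])blah user_(.*?)\1''',
    re.IGNORECASE,
)


def user_any_quote(html):
    """Return the text after user_ (hypothetical helper), or None if nothing matches."""
    match = QUOTED_RE.search(html)
    return match.group(2) if match else None


print(user_any_quote("<div id='x' class='blah user_foo'>"))  # foo
print(user_any_quote('<DIV CLASS="blah user_bar">'))         # bar
```

Because the capture is (.*?), a value such as class="blah user_foo extra" yields 'foo extra'; swapping in a character class that excludes whitespace and quotes would stop at the first space instead.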
I'm not sure which language uses Match(), but it will probably look something like this:
<div[^>]+class="blah user_([^"]+)"
Depending on the language, 'foo' may be stored in \1, $1, or something else entirely.
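For instance, if the language turned out to be Python (purely an assumption, the question does not say), the captured part would be read from the match object, and \1 would only appear inside replacement templates. A minimal sketch using the expression above:

```python
import re

# The expression from above; group 1 holds the part after user_.
DIV_RE = re.compile(r'<div[^>]+class="blah user_([^"]+)"')

match = DIV_RE.search('<div id="x" class="blah user_foo">')
if match:
    print(match.group(1))            # foo
    print(match.expand(r'user=\1'))  # user=foo
```

Here group(1) plays the role that $1 plays in Perl-style languages, and expand() shows the \1 form used in a template.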