tags:

views:

192

answers:

3
Q: 

Regex help

If I have a div in HTML with class="blah user_foo", what's the Match() regex to get the 'foo' part?

+2  A: 

If class is the only attribute of the div elements you want, and its value always has the structure you describe (a fixed blah followed by user_foo), you could use this regular expression:

<div\s+class\s*=\s*"blah user_([^"\s]+)
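A minimal sketch of how this pattern could be applied, assuming Python's re module (the question does not name a language, and the helper name user_suffix is made up for illustration):

```python
import re

# Pattern from the answer above: class must be the first attribute after <div.
SIMPLE = re.compile(r'<div\s+class\s*=\s*"blah user_([^"\s]+)')


def user_suffix(html):
    """Return the text after 'user_' in the class value, or None if no match."""
    match = SIMPLE.search(html)
    return match.group(1) if match else None


print(user_suffix('<div class="blah user_foo">'))         # foo
print(user_suffix('<div id="x" class="blah user_foo">'))  # None
```

The second call returns None because another attribute precedes class, which is the case the more general pattern is meant to cover.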

Otherwise try this regular expression:

<div\s+(?:[^>"']+|"[^"]*"|'[^']*')*\bclass\s*=\s*"blah user_([^"\s]+)

The expression (?:[^>"']+|"[^"]*"|'[^']*')* skips over any attributes that precede class, and it also takes into account that a plain > is allowed inside a quoted attribute value.
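To illustrate why the quoted alternatives matter, here is a small sketch, again assuming Python's re module. NAIVE is a hypothetical simpler pattern added only for comparison; it stops scanning at the first > it meets.

```python
import re

# Pattern from the answer above: quoted values are consumed as whole units.
QUOTE_AWARE = re.compile(
    r'''<div\s+(?:[^>"']+|"[^"]*"|'[^']*')*\bclass\s*=\s*"blah user_([^"\s]+)'''
)

# Hypothetical comparison pattern: treats any > as the end of the tag.
NAIVE = re.compile(r'<div[^>]+class\s*=\s*"blah user_([^"\s]+)')

# The title value contains a plain > character.
html = '<div title="a > b" class="blah user_foo">'

quote_aware_match = QUOTE_AWARE.search(html)
naive_match = NAIVE.search(html)

print(quote_aware_match.group(1))  # foo
print(naive_match)                 # None
```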

Edit: Optimized the regular expression with a negative look-ahead assertion to reduce backtracking:

<div\s+(?:(?:(?!class)[^>"']+(?:"[^"]*"|'[^']*')?)\s+)*class\s*=\s*"blah user_([^"\s]+)
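As a quick sanity check (a sketch assuming Python's re module, with sample inputs invented for illustration), the look-ahead version and the earlier general version can be run side by side. This only shows that they agree on these samples; it does not measure backtracking.

```python
import re

# Earlier general pattern from this answer.
GENERAL = re.compile(
    r'''<div\s+(?:[^>"']+|"[^"]*"|'[^']*')*\bclass\s*=\s*"blah user_([^"\s]+)'''
)

# Look-ahead variant: each skipped attribute must not start with "class".
LOOKAHEAD = re.compile(
    r'''<div\s+(?:(?:(?!class)[^>"']+(?:"[^"]*"|'[^']*')?)\s+)*'''
    r'''class\s*=\s*"blah user_([^"\s]+)'''
)

samples = [
    '<div class="blah user_foo">',
    '<div id="x" class="blah user_foo">',
    "<div id='x' class=\"blah user_foo\">",
    '<div title="a > b" class="blah user_foo">',
]

# Collect the captured group from both patterns for every sample.
results = [
    (GENERAL.search(s).group(1), LOOKAHEAD.search(s).group(1)) for s in samples
]
print(results)
```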
Gumbo
+1  A: 
/<div\s(?:[^>'"]*|".*?"|'.*?')*?\bclass\s*=\s*"blah user_(.*?)"/i

The above handles whitespace and any other attributes that come before the class specification.

It doesn't handle the case where class='single-quoted-something'; you could do that with a backreference. It also doesn't handle malformed HTML.
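One way the backreference idea could look, as a sketch assuming Python's re module. The pattern is adapted from the one above rather than copied: the opening quote is captured in group 1 and must be matched again by \1, so the wanted text moves to group 2. The name class_suffix is hypothetical.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
EITHER_QUOTE = re.compile(
    r"""<div\s(?:[^>'"]+|"[^"]*"|'[^']*')*?\bclass\s*=\s*(["'])blah user_(.*?)\1""",
    re.IGNORECASE,
)


def class_suffix(html):
    """Return the text after 'user_' for either quoting style, else None."""
    match = EITHER_QUOTE.search(html)
    return match.group(2) if match else None


print(class_suffix('<div class="blah user_bar">'))         # bar
print(class_suffix("<div id='x' class='blah user_foo'>"))  # foo
```

Like the original, this still assumes well-formed markup; a value opened with one quote type and closed with the other will not match.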

Daniel LeCheminant
+1  A: 

I'm not sure which language uses Match(), but it will probably look something like this:

<div[^>]+class="blah user_([^"]+)"

Depending on the language, 'foo' may be stored in \1, or $1 or something else entirely.
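For example, in Python (an assumption, since the question does not say which language is in use) the captured text is read with group(1) on the match object, while \1 is the form used inside a replacement string. The data-user attribute below is invented purely for illustration.

```python
import re

# Pattern from the answer above.
pattern = re.compile(r'<div[^>]+class="blah user_([^"]+)"')

html = '<div id="main" class="blah user_foo">'

# Reading the capture from a match object.
match = pattern.search(html)
captured = match.group(1) if match else None
print(captured)  # foo

# Referring to the same capture as \1 in a replacement string.
rewritten = pattern.sub(r'<div data-user="\1"', html)
print(rewritten)  # <div data-user="foo">
```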

Stephan202
Attribute values may contain plain `>` characters.
Gumbo
I see your point. Your solution does address this issue. I'll vote it up.
Stephan202