views:

119

answers:

4

I have a regex which reads:

@"<img\s*[^>]*>(?:\s*?</img>)?"

Can someone please explain this part: (?:\s*?</img>)?

What is that?

+9  A: 

Match, but don't capture, any amount of whitespace followed by a closing image tag, zero or one times:

(?: = match but don't capture

\s*? = any amount of whitespace, as little as possible (lazy, i.e. not greedy)

</img> = close image tag

)? = zero or one times
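The same breakdown can be written into the pattern itself. This is a minimal sketch in Python rather than .NET, assuming the two engines treat this particular syntax alike; IMG_VERBOSE is a name made up for the example, and re.VERBOSE is only used so that each piece can carry its own comment.

```python
import re

# The pattern from the question, spread over several lines.
# With re.VERBOSE, whitespace and "#" comments inside the pattern are ignored.
IMG_VERBOSE = re.compile(
    r"""
    <img\s*[^>]*>   # opening tag: "<img", optional whitespace, anything up to ">"
    (?:             # start a group that matches but does not capture
        \s*?        # any amount of whitespace, as little as possible (lazy)
        </img>      # a literal closing image tag
    )?              # the whole group is optional: zero or one times
    """,
    re.VERBOSE,
)

# Both forms are accepted, because the closing-tag group is optional.
print(bool(IMG_VERBOSE.fullmatch("<img>")))
print(bool(IMG_VERBOSE.fullmatch("<img> </img>")))
```

Because the group is optional, the opening tag on its own is enough for a match; the closing tag is only consumed when nothing but whitespace separates it from the opening tag.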

:)

Luke Schafer
+1  A: 

(?:\s*?) selects any whitespace, if it exists, after the image tag. The ?: at the beginning tells the regex engine not to capture that group (meaning it won't be returned as a separate entry in the matches array).
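To see what "not captured" means in practice, here is a small sketch in Python (an assumption; the question's pattern is a .NET string, but this syntax behaves the same way in Python's re module). CAPTURING is a hypothetical variant of the pattern written only for comparison, and groups_for is a helper name invented for this example.

```python
import re

# Pattern from the question: the trailing group is non-capturing.
NON_CAPTURING = re.compile(r"<img\s*[^>]*>(?:\s*?</img>)?")
# Hypothetical variant for comparison: the same pattern with a capturing group.
CAPTURING = re.compile(r"<img\s*[^>]*>(\s*?</img>)?")

def groups_for(pattern, text):
    """Return (whole match, tuple of captured groups) for the first match."""
    m = pattern.search(text)
    return (m.group(0), m.groups()) if m else None

text = "<img> </img>"
print(groups_for(NON_CAPTURING, text))  # whole match, empty tuple of groups
print(groups_for(CAPTURING, text))      # whole match, plus the group's text
```

Both patterns match the same text; the only difference is whether the closing-tag part is also stored as a numbered group.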

brianreavis
A: 

A non-capturing group: any number of whitespace characters, followed by a closing img tag.

benPearce
A: 

The entire expression will capture any <img> tags that have corresponding </img> tags (but it won't capture the close tags). It doesn't capture the close tags because the (?:) syntax means "match but don't capture".

Some restrictions that are part of this regex:

  1. The \s* in the opening tag is redundant because [^>]* will match whitespace too
  2. Only whitespace is allowed between the opening and closing tags

Some examples:

  • <img> will not match
  • <img></img> will match, but only capture <img>
  • <img attr="123"></img> will match, but only capture <img attr="123">
  • <imgabc></img> will not match
  • <img> </img> will match, but only capture <img>
  • <img>ab</img> will not match

I highly recommend the Regular Expression Designer, available for free at www.radsoftware.com.au, for testing regexes.

Tatham Oddie
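Wrong: the ? after the final group makes that group optional (zero or one times), so things like a bare <img> will match. When </img> is present it is included in the overall match; (?:...) only means it is not stored as a separate group.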
Luke Schafer
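The six example strings listed above can be checked directly rather than argued about. A minimal sketch in Python, assuming its re module handles this pattern the same way the .NET engine does (the syntax used is common to both); matched_text is a name made up for this check.

```python
import re

# The pattern from the question, as a Python raw string.
IMG = re.compile(r"<img\s*[^>]*>(?:\s*?</img>)?")

# The example strings from the answer above.
samples = [
    "<img>",
    "<img></img>",
    '<img attr="123"></img>',
    "<imgabc></img>",
    "<img> </img>",
    "<img>ab</img>",
]

def matched_text(text):
    """Return the text of the first match in `text`, or None if nothing matches."""
    m = IMG.search(text)
    return m.group(0) if m else None

for s in samples:
    print(repr(s), "->", repr(matched_text(s)))
```

Run as written, every sample produces a match. A bare <img> matches on its own, <imgabc></img> matches in full because [^>]* accepts the extra letters, and <img>ab</img> matches only the leading <img>. Wherever </img> follows the opening tag with nothing but whitespace in between, it is part of the matched text.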