I am updating some code that I didn't write and part of it is a regex as follows:

\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]

I understand that .*? does a non-greedy match of everything in the second register.

What does ?:\s* in the first and third registers do?

Update: As requested, language is C# on .NET 3.5

+9  A: 

The syntax (?:...) is a non-capturing group: a way of putting parentheses around a subexpression without separately extracting that part of the string.

The author wanted to capture the (.*?) part in the middle, and didn't want the optional whitespace inside the opening and closing tags to get in the way. Now you can use \1 or $1 (or whatever the appropriate method is in your particular language) to refer to the domain name, instead of having group 1 hold the whitespace from the opening tag.
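To make the group numbering concrete, here is a minimal sketch. It uses Python's `re` module purely for illustration (the question is about C#, but .NET's `Regex` class treats `(?:...)`, `\s*` and `.*?` the same way). The `capturing` pattern and the sample string are hypothetical, made up only for the comparison.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]")

# Hypothetical variant for comparison: plain (capturing) parentheses
# around the whitespace instead of (?:...).
capturing = re.compile(r"\[url(\s*)\]www\.(.*?)\[/url(\s*)\]")

# Made-up input with whitespace inside both tags.
text = "[url  ]www.example.com[/url ]"

m = pattern.search(text)
n = capturing.search(text)

# With (?:...), the only numbered group is the part after "www."
print(m.groups())  # ('example.com',)

# With plain parentheses, the whitespace takes groups 1 and 3,
# pushing the interesting part to group 2.
print(n.groups())  # ('  ', 'example.com', ' ')
```

Both patterns match exactly the same input; the only difference is which pieces end up in numbered groups.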

VoteyDisciple
Right... and (?:) is preferable to () whenever you don't need to refer to the captured subexpression elsewhere (such as in a backreference or in the match output): it conveys more of your intention, and (at least potentially) makes processing more efficient.
harpo
So - `(?:\s*)` matches zero or more whitespace characters, without putting it into the backreferences - which is strange because `\s*` does the exact same thing, only doesn't look as confusing ;)
gnarf
@gnarf: `(?:)` is useful with alternation, i.e. `(?:foo|bar)` matches either "foo" or "bar" without capture.
Greg Hewgill
never said it wasn't greg - just pointing out that it was a bit pointless on `\s*`
gnarf
Point taken. :)
Greg Hewgill
+4  A: 

?: makes the parentheses non-capturing: they still group, but what they match is not stored. In that regex, you'll only pull out one piece of information, $1, which contains the middle (.*?) expression.
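Since $1 is the only thing captured either way, the `(?:...)` wrapper around `\s*` can be dropped without changing the result, as pointed out in the comments on the first answer. A quick check, again sketched in Python's `re` for illustration rather than C#; the `simplified` pattern and the sample strings are assumptions, not part of the original code:

```python
import re

# Pattern from the question.
original = re.compile(r"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]")

# Assumed simplification: the same pattern without the non-capturing wrappers.
simplified = re.compile(r"\[url\s*\]www\.(.*?)\[/url\s*\]")

# Made-up sample inputs.
samples = [
    "[url]www.foo.com[/url]",
    "[url  ]www.foo.com[/url ]",
    "[url]foo.com[/url]",
]

def groups_or_none(regex, text):
    """Return the tuple of captured groups, or None when there is no match."""
    m = regex.search(text)
    return m.groups() if m else None

results = [(groups_or_none(original, s), groups_or_none(simplified, s))
           for s in samples]

for s, (a, b) in zip(samples, results):
    # Print each sample, both results, and whether they agree.
    print(s, a, b, a == b)
```

On these samples the two patterns agree on every input, including the one that does not match.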

Stefan Kendall
+1  A: 

Hello,

You may find this Regular Expressions Cheat Sheet very helpful (hopefully). I spent ages trying to learn Regex with no luck. And once I read this cheat-sheet - I immediately understood what I previously failed to learn.

http://krijnhoetmer.nl/stuff/regex/cheat-sheet/

baeltazor
I would upvote this, but funny enough, it doesn't actually answer the OP's question.
musicfreak
I have 99 problems, but a Regex is no longer one of them.
jscharf
It didn't answer *that* question but serendipitously it answered the next question that I was going to ask so +1 for seeing into the future.
Guy
thank you Guy. I had a massive mental block when I was about to answer your question. I'm glad it helped in the end...
baeltazor
+1  A: 

What does ?:\s* in the first and third registers do?

It's matching zero or more whitespace characters, without capturing them.

The regex author intends to allow trailing whitespace inside the square-bracket tags, capturing everything that follows the "www." (presumably the remaining DNS labels), like so:

[url]www.foo.com[/url]     # foo.com
[url  ]www.foo.com[/url  ] # same
[url  ]www.foo.com[/url]   # same
[url]www.foo.com[/url  ]   # same

Note that the regex also matches:

[url]www.[/url]      # empty string!

and fails to match

[url]stackoverflow.com[/url]  # no match, bummer
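
The cases above can be checked with a short script. This is only a sketch in Python's `re` module (the question targets C#, where `Regex.Match` with the same pattern should behave the same way for these inputs); the helper name `captured` is made up for the example.

```python
import re

# Pattern from the question.
pattern = re.compile(r"\[url(?:\s*)\]www\.(.*?)\[/url(?:\s*)\]")

def captured(text):
    """Return the text captured by group 1, or None when there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

samples = [
    "[url]www.foo.com[/url]",
    "[url  ]www.foo.com[/url  ]",
    "[url  ]www.foo.com[/url]",
    "[url]www.foo.com[/url  ]",
    "[url]www.[/url]",               # matches, but captures an empty string
    "[url]stackoverflow.com[/url]",  # no "www." prefix, so no match
]

for s in samples:
    print(repr(s), "->", repr(captured(s)))
```

The first four inputs yield 'foo.com', the fifth yields '' and the last yields None.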
pilcrow
Thanks for the examples - much appreciated +1
Guy