I have the following regex: a?\W*?b and the input string ,.! ,b
When I search for a match I get ,.! ,b rather than just b, as I expected. Why is that? How can I modify the regex to get what I need?
Thank you for your help.


For example, the following might work: (a\W*)?b

To know better what might solve your problem, you should include more examples.

Eamon Nerbonne
Actually the original regex is 'a?\W*?b?\W*?c?' and I want every match to be free of non-alphabet (\W) symbols at both the beginning and the end.
Could you clarify this (how about some examples?) and put it into your original question instead of somewhere down in the comments where it's harder to find?
Tim Pietzcker

Your regexp matches the entire string like this:

  1. a, zero or one repetitions ("" in this case)
  2. Any character that is not alphanumeric, any number of repetitions, as few as possible (",.! ," in this case)
  3. b

In your case the regexp matches the entire string, and will therefore not find just the b (a search does not return several overlapping matches of the same part of the string).

If you search in a string like ',.! ,db' it will find the b.
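
The two cases above can be checked with a short sketch. It uses Python's re module as a stand-in for whatever engine the question uses; this is an assumption, but the constructs involved (?, *?, \W) behave the same way in the common backtracking flavors.

```python
import re

pattern = re.compile(r"a?\W*?b")

# No word character before the b: the match starts at index 0 and
# absorbs all the punctuation on its way to the b.
first = pattern.search(",.! ,b")

# A word character ('d') sits directly before the b: no attempt that
# starts earlier can reach the b, so the match is the b alone.
second = pattern.search(",.! ,db")

print(repr(first.group()), first.span())    # ',.! ,b' (0, 6)
print(repr(second.group()), second.span())  # 'b' (6, 7)
```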

Tor Livar
I can tell you're using Expresso, I have exactly the same description on my screen ;)
Thomas Levesque
Yepp - thought it was good descriptions, so I didn't bother rewriting :-)
Tor Livar

The a? says "I want either zero or one instance of a" - this is satisfied as there are zero instances, and it is followed by

\W* which says "I want zero or more non-word characters", which is satisfied by the punctuation and space characters, and finally

b says "match a letter b", which it does. So your whole string satisfies the regex.

It helps if you give more examples of possible inputs before anyone suggests a possible solution.

Your explanation is ignoring the lazy quantifier `*?`... Why doesn't it work here?
Thomas Levesque
\W*? is lazy, so it should include as few symbols as possible. Here the fewest number of symbols is 0.
@Thomas - there is already a good answer from Tim.

A lazy quantifier doesn't help here for what you want. Let's see what's happening.

The regex engine starts at the beginning of the string. It first tries to match a. It can't, but that's no problem since the a is optional.

Then, there is a lazy \W*? so the regex engine skips it but remembers the current position.

It then tries to match b. It can't, so it backtracks and successfully matches the , with \W*?. It then goes on to try and match b (because of the lazy quantifier). It still can't and backtracks again. This repeats a few times until finally the regex engine has arrived at the b. Now the match is complete - the regex engine declares success.

So the regex works as specified - just not as intended. Now the question is: What exactly do you want the regex to do?
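
A small sketch of the "earliest start wins" behaviour described above, using Python's re module as a stand-in (an assumption; the walkthrough itself is not tied to a particular engine). Pattern.match(text, pos) is used only to probe what each start position would yield on its own.

```python
import re

text = ",.! ,b"
lazy = re.compile(r"a?\W*?b")
greedy = re.compile(r"a?\W*b")

# match(text, pos) only attempts a match that starts exactly at pos.
# Every start position from 0 to 5 can reach the b, so all of them succeed.
candidates = [lazy.match(text, pos).group() for pos in range(len(text))]

# search() tries start positions from left to right and stops at the first
# success, so the later, shorter candidates are never considered.
lazy_result = lazy.search(text).group()
greedy_result = greedy.search(text).group()

print(candidates)
print(repr(lazy_result), repr(greedy_result))
```

Both the lazy and the greedy version return the full string here: laziness only affects how far the quantifier extends to the right, not where the match begins.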

For example, if what you really want is:

Match b alone, unless it's preceded by a and some non-word characters, in which case match everything from a to b, then use
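
The pattern itself is not shown above. One candidate that fits the description, offered purely as an assumption and similar to the (a\W*)?b suggested earlier, is (?:a\W*)?b. A quick check of that hypothetical pattern in Python:

```python
import re

# Hypothetical pattern: the a and the separators after it are optional
# as a unit, so separators can only be matched when an a precedes them.
pattern = re.compile(r"(?:a\W*)?b")

b_only = pattern.search(",.! ,b").group()
a_through_b = pattern.search("a,.! ,b").group()

print(repr(b_only))       # 'b'
print(repr(a_through_b))  # 'a,.! ,b'
```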

Tim Pietzcker
What should I do if I have the regex 'a?\W*?b?\W*?c?\W*?d?' and I want every non-empty match to start with a letter and end with a letter too?
A regex that does what you just wrote could be `^([a-z].*[a-z]$|)$`. Match either something that starts and ends with a letter, or match the empty string. Use `RegexOptions.IgnoreCase` if you want uppercase letters to match, too. And use `\p{L}` instead of `[a-z]` if you also want to allow non-ASCII letters.
Tim Pietzcker
+1, very good explanation... I learned something about regex today :)
Thomas Levesque

A lazy expression is only lazy from the right, i.e. it will be as short as possible by removing characters on the right, but it will not remove characters on the left.

To make the match start later, you need a greedy expression before it that swallows the characters that you don't want to match.

Alternatively, as Tim showed, you can make the match start later by only matching the first character and the following separators if the first character exists.
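
A minimal sketch of the first approach, assuming a greedy \W* prefix plus a capturing group is acceptable (the exact pattern is an illustration, not something given in this answer). Python's re module is used as a stand-in engine.

```python
import re

# Hypothetical pattern: the greedy \W* in front swallows leading separators,
# and the parentheses capture only the part of interest.
pattern = re.compile(r"\W*(a?\W*?b)")

only_b = pattern.search(",.! ,b")
a_to_b = pattern.search("a,.! ,b")

# The overall match still starts at index 0; group 1 is what starts later.
print(repr(only_b.group(0)), repr(only_b.group(1)))
print(repr(a_to_b.group(1)))
```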


Your example doesn't show why the a? is part of your regex, but to match only b in a string that looks like ,.! ,b you can use a lookbehind like this: (?<=\W*?)b. Note that (?=...) would be a lookahead; a lookbehind is written (?<=...), and a variable-length one like this requires a flavor that supports it, such as .NET.

This matches a b that is preceded by zero or more non-word characters (as few as possible).

If you only want to match, say, a and b in a string such as a,.! ,b, you'll have to use capturing groups: (a?)\W*?(b), where group 1 will hold the a if present and group 2 the b.
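
The capturing-group variant can be tried out as follows. This is a sketch in Python's re module, used here as a stand-in engine; the group numbering works the same way in .NET.

```python
import re

pattern = re.compile(r"(a?)\W*?(b)")

with_a = pattern.search("a,.! ,b")
without_a = pattern.search(",.! ,b")

# Group 1 holds the a when present, otherwise an empty string;
# group 2 always holds the b.
print(with_a.groups())
print(without_a.groups())
```

The overall match (group 0) still includes the separators; only the groups isolate the letters.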


It's a mistake to speak of a regex as being greedy or non-greedy. You can use non-greedy quantifiers throughout the regex, but it will still try to start matching at the earliest opportunity, as you discovered. Similarly, a regex that uses only greedy quantifiers isn't guaranteed to return the longest possible match. For example,

Regex.Match("foo bar", @"\w+ (?:b|bar)")

...returns foo b, because alternation settles for the first alternative that works, even if a later one would result in a longer match. (Note that I'm talking about Perl-derived regex flavors like .NET's; some flavors, like awk and egrep, do indeed hold out for the longest possible match. But, since those flavors don't have non-greedy quantifiers, greedy isn't just the default mode, it's the only mode.)

In short, there's no such thing as a greedy or non-greedy regex, only greedy or non-greedy quantifiers.
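
The same behaviour can be reproduced outside .NET. The sketch below uses Python's re module, which is also a Perl-derived backtracking flavor; the reordered second pattern is an added illustration, not part of the example above.

```python
import re

# Alternation takes the first alternative that lets the match succeed.
short_first = re.search(r"\w+ (?:b|bar)", "foo bar").group()

# Listing the longer alternative first yields the longer match.
long_first = re.search(r"\w+ (?:bar|b)", "foo bar").group()

print(repr(short_first))  # 'foo b'
print(repr(long_first))   # 'foo bar'
```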

Alan Moore