I need to write a regex that matches strings like "abc", "ab", "ac", "bc", "a", "b", "c". Order is important and it shouldn't match multiple appearances of the same part.

a?b?c? almost does the trick, except that it also matches the empty string. Is there any way to prevent it from matching empty strings, or a different way to write a regex for this task?

A: 

You can write down all combinations (which is a pain), or write one alternative per letter that must be present (ab?c?|a?bc?|a?b?c), which is somewhat less of a pain.

sepp2k
Yeah, I know that I can do that but that's what I want to avoid here.
Alan Mendelevich
+1  A: 

To do this with pure regex you're going to have to expand it into all of its possibilities:

ab?c?|a?bc?|a?b?c
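As a quick sanity check of the expanded form, here is a minimal sketch using Python's re module (an assumption; the thread itself does not name a language for this part). The helper name is_valid and the sample lists are made up for illustration. Each alternative makes one letter mandatory, which is why the empty string is rejected.

```python
import re

# One alternative per mandatory letter: a required, b required, c required.
EXPANDED = re.compile(r"ab?c?|a?bc?|a?b?c")

def is_valid(s: str) -> bool:
    # fullmatch succeeds only if some alternative consumes the whole string
    return EXPANDED.fullmatch(s) is not None

valid = ["abc", "ab", "ac", "bc", "a", "b", "c"]
invalid = ["", "ba", "aa", "abcc", "cb"]

print([s for s in valid if is_valid(s)])
print([s for s in invalid if is_valid(s)])
```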

If you have lookaheads you can make sure the string is non-empty. Or you can verify the string has a length of at least one before passing it to the expression, depending on your choice of language.

For example a .NET lookahead might look like this:

^(?=[abc])a?b?c?$

Or you could just test your string's length before matching it:

if (YourString.Length >= 1) {
   // matching code goes here, using the expression a?b?c?
}
Welbog
(?=[abc])a?b?c? or, even better, (?=(a|b|c))a?b?c? (because my real a, b, c aren't letters but longer constructs) seems to be working perfectly. Thanks!
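A rough sketch of how the unanchored lookahead version behaves when searching inside a longer string, again in Python's re module as an assumption. The helpers build_pattern and find_all and the tokens "foo", "bar", "baz" are hypothetical stand-ins for the "longer constructs"; they are treated as literal strings here, which the real constructs may not be. Non-capturing groups are used so the lookahead does not add a capture group.

```python
import re

def build_pattern(parts):
    # Treat each part as a literal (assumption); wrap in a non-capturing group.
    groups = [f"(?:{re.escape(p)})" for p in parts]
    # The lookahead requires that at least one part starts at this position.
    lookahead = "(?=" + "|".join(groups) + ")"
    # Then every part is optional, in the given order.
    return re.compile(lookahead + "".join(g + "?" for g in groups))

def find_all(parts, text):
    return [m.group(0) for m in build_pattern(parts).finditer(text)]

print(find_all(["a", "b", "c"], "zxabczc"))                  # ['abc', 'c']
print(find_all(["a", "b", "c"], "zxvnm"))                    # []
print(find_all(["foo", "bar", "baz"], "xx foobaz yy bar"))   # ['foobaz', 'bar']
```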
Alan Mendelevich
+3  A: 
^(?=.)a?b?c?$

The lookahead (?=.) checks that there is at least one character; the rest of the expression then matches your regex.

Thexa4
Thanks. This works perfectly, with the precondition that it's the complete string and not a substring. If I remove ^ and $, it matches as many times as there are other symbols. Anyway, this is very close and I will give you an upvote, but Welbog has a closer solution. Thanks!
Alan Mendelevich
A: 

It is pointless to try to pack all functionality of all problems you ever have into one single regexp. The best solution is to write the obvious regex, and add a check against zero length. You should only get extra clever with regexps when you absolutely have to - for instance if you have to interact with an unchangeable API that accepts only one regexp and nothing more.
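For whole-string validation, the "obvious regex plus a length check" approach could look like the sketch below. It uses Python's re module as an assumption, and the name is_valid is invented for illustration.

```python
import re

# The obvious pattern; on its own it also accepts the empty string.
PLAIN = re.compile(r"a?b?c?")

def is_valid(s: str) -> bool:
    # Reject the empty string first, then require the pattern to cover all of s.
    return len(s) > 0 and PLAIN.fullmatch(s) is not None

for s in ["abc", "ac", "c", "", "ba", "abcc"]:
    print(repr(s), is_valid(s))
```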

Kilian Foth
The problem is not only zero length, but that it basically matches all the "nothingness" between the non-matching symbols. So for a string like "zxvnm" there will be 5 matches.
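The effect described in this comment can be reproduced with a short sketch in Python's re module (an assumption; the exact number of empty matches may differ between regex engines, so none is claimed here). It also shows that discarding zero-length matches after the search is one way to apply the length check to substring matching; the function name non_empty_matches is hypothetical.

```python
import re

PLAIN = re.compile(r"a?b?c?")

def non_empty_matches(text: str):
    # Keep only matches that actually consumed at least one character.
    return [m.group(0) for m in PLAIN.finditer(text) if m.group(0)]

# Every match in a string with no a, b or c is an empty one.
raw = [m.group(0) for m in PLAIN.finditer("zxvnm")]
print(raw)

print(non_empty_matches("zxvnm"))
print(non_empty_matches("zxabczc"))
```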
Alan Mendelevich