tags:

views:

134

answers:

1

I am having trouble building a regular expression with the set of strings over {a, b, c} that is an odd length with exactly one a. Here is my best attempt so far:

(bb|bc|cb|cc)*a(bb|bc|cb|cc)*

This does good for even b and c on either side of the a, but does not account for a odd b and c combination on either side of the a.

Any hints?

+3  A: 

Your string will be a prefix followed by a followed by a suffix.

Both prefix and suffix can be zero length. If not, they have to be either both even or both uneven. This means you have two main cases.

EVENPREFIX a EVENSUFFIX | UNEVENPREFIX a UNEVENSUFFIX

Try this (incomplete and wrong):

([bc][bc])*a([bc][bc])*|([bc][bc][bc])*a([bc][bc][bc])*

There is still one uneven case missing: a single [bc]:

(([bc][bc])*a([bc][bc])*)|([bc]([bc][bc])*a[bc]([bc][bc])*)

According to http://www.fileformat.info/tool/regex.htm, this matches

  • a
  • cac
  • ccabb

I expect it matches the rest too...

The left side guarantees even (or empty) sequences of b or c. The right side is either a single b or c followed by a multiple of two (so that it stays uneven).

Kobi came up with this refinement of the above:

([bc][bc])*(a|[bc]a[bc])([bc][bc])*

How does this work?

The first group is guaranteed to be even. The second group is guaranteed to be uneven with a single a inside. The third group is guaranteed to be be even. Thus, the whole is guaranteed to be uneven.

Daren Thomas
You can simplify that to `([bc][bc])*(a|[bc]a[bc])([bc][bc])*` - apply alternation only where you need it.
Kobi
The square brackets will not work because that means match exactly once. The string `a` by itself should be a valid string since it is odd, your regex does not account for a by itself.
typoknig
@Kobi Right. Thank you for pointing that out. This is a different train of thought, though...
Daren Thomas
@typoknig - wrong, because it's in a group with an asterisk. Also, don't forget `^...$`.
Kobi
@Kobi, yours is dead on. Now I just need to try and understand it how why :)
typoknig
@typoknig Of course it does: zero occurrences of `([bc][bc])` followed by `a` followed by zero occurrences of `([bc][bc])` *is* `a` by itself.
Daren Thomas
Oh, actually, the first regex is wrong, it doesn't match `ca`, or `ccccca`, etc.
Kobi
I couldn't get @Daren's original solutions (either of the two) to match a single `a`.
typoknig
`ca` is invalid (even). so is `ccccca`. Aren't those not part of the language?
Daren Thomas
@typoknig - this might be a case of grouping and how tightly the | operator binds...
Daren Thomas
Oh, right, of course. My bad. Didn't think it through...
Kobi
After looking at it for a few minutes I get it now, the first and last group of `[bc]` catches every `b` or `c` occurring in an even amount, while the middle group takes care of the single `a` and any odd `b` and `c` that may occur on either side of the `a`. Thanks a lot for your help guys!
typoknig