views:

553

answers:

5

I'm working on some homework for my compiler class and I have the following problem:

Write a regular expression for all strings of a's and b's that contain an odd number of a's or an odd number of b's (or both).

After a lot of whiteboard work I came up with the following solution:

(aa|bb)* (ab|ba|a|b) ((aa|bb)* (ab|ba) (aa|bb)* (ab|ba) (aa|bb)*)*

However, Is this is the most simplified I can get it? I've considered constructing the DFA trying to minimize the number of states there to see if it would help me simplify but I figured I would ask the regex gurus on SO first.

+2  A: 

I'm afraid that I don't believe your regex as written is correct. Consider the string:

aba

We have a couple choices for matches, but the fact that it's odd-length means we must match a lone a at the front, so:

(a)(ba)

But, sadly, it's impossible for your second main grouping there to match (ba).

When dealing with a constraint like this, I found it easier to start from the core constraint and go from there. In this case, your constraint is "odd," so start with

a(aa)*

to force an odd number of a's and go from there. :)

Greg D
@Greg D, that is true. Let me think on it for a second.
Simucal
+4  A: 

This should work:

b* a b* (a b* a b*)* |  a* b a* (b a* b a*)*
sepp2k
I was writing something similar :) To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's. There's a slight error here though, as first term needs b* at the end, and second option needs a* at the end. Otherwise, abbba will not be accepted.
Walt W
@sepp2k, this is working in all my test cases. Can you describe your thought process when you were making that? It is far simpler than the path I was going down.
Simucal
Nobody said it can't be ambiguous. Walt is correct, it isn't finished, but all the important bits are there. :)
Greg D
@Walt W: You're absolutely right. Fixed. @Simucal: Walt explained it pretty well.
sepp2k
@sepp2k, yea, we submitted out comments at the same time.
Simucal
@sepp2k: Err, I was a little wrong, you need the b* / a* at the end to be inside the parenthesis, or else you can't have b's between two groups of two a's (like aaabbaa). I posted an answer with that fix..
Walt W
I think this might fail on: ABBBABB.
Simucal
Simulcal: Refer to my last comment / answer.
Walt W
Ok, now it should be fixed for good.
sepp2k
+5  A: 

Take Greg D's recommendation of starting with a(aa)*, and going from there. Sepp2k almost has it right, but the real consideration is that you don't care about the other letter. What I mean is, when you are looking at the "odd number of a's" constraint, you don't care at all about what b's are in your string. Thus, stick b*'s anywhere you can :)

Sepp2k's answer is almost right, but this one is correct:

b* a b* (a b* a b* )* | a* b a* (b a* b a* )*

To elaborate, this regex figures out all strings with an odd number of a's (first section), and OR's those strings with any strings containing an odd number of b's.

Walt W
@Walt W, I'm running this one through its paces but I think you are correct.
Simucal
A: 

I think you need to approach the problem differently.

You are trying to match anything that doesn't have even numbers of both a and b.

It would probably be easier to start with something that matches even numbers of a and b. All you have to do at that point would be to add something on the end that matches the smallest string that you actually want to match.

Brad Gilbert
A: 

Greg D's answer is simple, but using it we can generate the same string through multiple paths. For example, there are two ways to generate strings containing both odd number of a's and b's. Can this be simplified further?

Ashwin