tags:

views:

369

answers:

4

How would you write a regular expression to define all strings of 0's and 1's that, as a binary number, represent an integer that is multiple of 3.

Some valid binary numbers would be:

11
110
1001
1100
1111
A: 

I don't think you would. I can't believe in any language using a regular expression could ever be the best way to do this.

Dave Webb
i know its not the best way. I know it can be done but I just cant figure out how. It involves drawing the automata and eliminating middle states.
Jaelebi
@Dave Webb, you can definitely do this. Actually, this is a pretty common sort of exercise in a CS Theory course, which is why I'm hesitant to answer this question.
BobbyShaftoe
do you know how this can be done? any hints?
Jaelebi
@Dave Webb The answer is (1(01*0)*1)*0*
Jaelebi
@unknowh yahoo, not *quite* right, that won't work for 17 * 3 = 51 (110011). You need to allow for repetitions at more levels.
paxdiablo
+1  A: 

http://www.cs.princeton.edu/introcs/73fsa/ 'nuff said (First Google hit to the proper question).

Vinko Vrsalovic
+5  A: 

Using the DFA in http://www.cs.princeton.edu/introcs/73fsa/ we can make a regular expression the following way, where A, B, C represent the states of the DFA.

A = 1B + 0A
B = 1A + 0C
C = 1C + 0B

C = 1*0B // Eliminate recursion

B = 1A + 0(1*0B)
B = 01*0B + 1A
B = (01*0)*1A // Eliminate recursion

A = 1(01*0)*1A + 0A
A = (1(01*0)*1 + 0)A
A = (1(01*0)*1 + 0)* // Eliminate recursion

Resulting in a PCRE regex like:

/^(1(01*0)*1|0)+$/

Perl test/example:

use strict;

for(qw(
11
110
1001
1100
1111
0
1
10
111
)){
    print "$_ (", eval "0b$_", ") ";
    print /^(1(01*0)*1|0)+$/? "matched": "didnt match";
    print "\n";
}

Outputs:

11 (3) matched
110 (6) matched
1001 (9) matched
1100 (12) matched
1111 (15) matched
0 (0) matched
1 (1) didnt match
10 (2) didnt match
111 (7) didnt match
Qtax
+1. Now this is great. I had no idea you could create a regular expression that easy from a DFA.
Lieven
+3  A: 

This has been asked a couple of times so I'm going to try for the definitive answer.

When you divide a number by three, there are only three possible remainders (0, 1 and 2). What you're aiming at is to ensure the remainder is 0, hence a multiple of three.

This can be done by an automata with the three states:

  • ST0, multiple of 3 (0, 3, 6, 9, ....).
  • ST1, multiple of 3 plus 1 (1, 4, 7, 10, ...).
  • ST2, multiple of 3 plus 2 (2, 5, 8, 11, ...).

Now think of any non-negative number (that's our domain) and multiply it by two (tack a binary zero on to it). The transitions for that are:

ST0 -> ST0 (3n * 2 = 3 * 2n, still a multiple of three).
ST1 -> ST2 ((3n+1) * 2 = 3*2n + 2, a multiple plus 2).
ST2 -> ST1 ((3n+2) * 2 = 3*2n + 4 = 3*(2n+1) + 1, a multiple plus 1).

Now think of any non-negative number and multiply it by two then add one (tack a binary one on to it). The transitions for that are:

ST0 -> ST1 (3n * 2 + 1 = 3*2n + 1, a multiple plus 1).
ST1 -> ST0 ((3n+1) * 2 + 1 = 3*2n + 2 + 1 = 3*(2n+1), a multiple).
ST2 -> ST2 ((3n+2) * 2 + 1 = 3*2n + 4 + 1 = 3*(2n+1) + 2, a multiple plus 2).

This idea is that, at the end, you need to finish up in state ST0. However, given that there can be an arbitrary number of sub-expressions (and sub-sub-expressions), it does not lend itslef easily to reduction to a regular expression.

What you have to do is allow for any of the transition sequences that can get from ST0 to ST0 then just repeat them:

These boil down to the two RE sequences:

ST0 --> ST0                                      :  0+
    [0]
ST0 --> ST1 (--> ST2 (--> ST2)* --> ST1)* --> ST0:  1(01*0)*1
    [1]     ([0]     ([1]    )* [0]    )* [1]

or the regex:

(0+|1(01*0)*1)+
paxdiablo