Assuming that each string represents a binary number in big-endian form (with bits numbered from zero, where bit #0 is the last character of the string), you can determine whether all the odd bits (i.e., bit #1, bit #3, bit #5, etc.) are set by testing whether the string matches this regular expression:
^[01]?(?:1[01])*$
To understand this, first let's simplify by assuming that we already know that all characters are either 0 or 1. Given that (and ignoring non-capturing trickiness), we could have written (in expanded form):
^ .? (1.)* $
A BB CCCCC D <- the key
That's an anchored match of the whole string (A and D) against one of the following: the empty string; any single digit (B); any number of repetitions of a 1 followed by anything (C, which puts the 1 in the “odd” position); or a single digit followed by an even number of digits with a 1 in each “odd” position (B followed by C). I've just converted that basic form to the more efficient and exact representation before it by restricting the alphabet of symbols ([01] in place of .) and using non-capturing parentheses ((?:…) instead of (…)).
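As a sanity check on the reasoning above, the regular expression can be compared against a direct bit-by-bit test. This is only a sketch: the proc name oddBitsSet is invented for illustration, it assumes the input contains only 0 and 1 characters, and it assumes Tcl 8.6 (for the %b format conversion).

```tcl
# Hypothetical helper (not part of the original answer): test the odd bits directly.
# Bit #0 is the LAST character, so reverse the string to index by bit number.
proc oddBitsSet {s} {
    set r [string reverse $s]
    for {set i 1} {$i < [string length $r]} {incr i 2} {
        if {[string index $r $i] ne "1"} {
            return 0
        }
    }
    return 1
}

# Compare the direct test with the RE on every bit string of length 1 to 8.
set disagreements 0
foreach len {1 2 3 4 5 6 7 8} {
    for {set n 0} {$n < (1 << $len)} {incr n} {
        # Zero-padded binary representation of n, exactly $len characters wide.
        set s [format "%0${len}b" $n]
        if {[oddBitsSet $s] != [regexp {^[01]?(?:1[01])*$} $s]} {
            incr disagreements
        }
    }
}
puts "disagreements: $disagreements"
```

If the two tests agree on every string, the count printed is zero.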
If you number the bits from one instead (so that the last character is bit #1), you need this RE instead:
^1?(?:[01]1)*$
For little-endian bit strings, you need to "reverse the RE" (or use string reverse on the string to be matched and use one of the other matchers). The "reversed RE" for the little-endian bit#0-first form is:
^(?:[01]1)*[01]?$
For little-endian bit#1-first:
^(?:1[01])*1?$
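The relationship between the big-endian and little-endian forms can be illustrated with a small sketch. The array keys (be0, be1, le0, le1) are labels made up here, not names from the answer; the point is that matching a reversed string with a little-endian RE should give the same result as matching the original string with the corresponding big-endian RE.

```tcl
# The four REs from the text, stored under illustrative (hypothetical) labels.
set re(be0) {^[01]?(?:1[01])*$}   ;# big-endian, last character is bit #0
set re(be1) {^1?(?:[01]1)*$}      ;# big-endian, last character is bit #1
set re(le0) {^(?:[01]1)*[01]?$}   ;# little-endian, first character is bit #0
set re(le1) {^(?:1[01])*1?$}      ;# little-endian, first character is bit #1

set s 1110111011111010111010
set r [string reverse $s]

# Each pair of results should be equal: reversing the string and
# reversing the RE cancel each other out.
puts "be0 on s: [regexp $re(be0) $s], le0 on reversed s: [regexp $re(le0) $r]"
puts "be1 on s: [regexp $re(be1) $s], le1 on reversed s: [regexp $re(le1) $r]"
```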
Remember, with all of these regular expressions it is easiest to write them in Tcl by enclosing them in {curly} braces.
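To see why braces are the easy option, here is a short sketch comparing the braced form with a double-quoted form. Inside double quotes Tcl performs command and variable substitution, so the brackets and the dollar sign have to be backslash-escaped; the variable names s, failed and msg are chosen here just for illustration.

```tcl
set s 1110111011111010111010

# Braces: the RE reaches regexp exactly as written.
puts [regexp {^[01]?(?:1[01])*$} $s]

# Double quotes: brackets and the dollar sign need backslashes
# so that Tcl does not try to substitute them.
puts [regexp "^\[01\]?(?:1\[01\])*\$" $s]

# Unescaped in double quotes, [01] is treated as a call to a command
# named 01, which normally does not exist, so the evaluation fails.
set failed [catch {regexp "^[01]?(?:1[01])*$" $s} msg]
puts "failed=$failed: $msg"
```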
Demonstrating:
foreach s {
110010001010101111110110
11010101010101111110110
11101101010101100011011
11111100110101010111101
1110111011111010111010
} {
    set matches [regexp {^[01]?(?:1[01])*$} $s]
    puts "$s [lindex {{doesn't match} matches} $matches]"
}
Produces this output:
110010001010101111110110 doesn't match
11010101010101111110110 doesn't match
11101101010101100011011 doesn't match
11111100110101010111101 doesn't match
1110111011111010111010 matches