tags:

views:

246

answers:

4

I have a somewhat math-oriented problem. I have a bunch of bitfields and would like to calculate what subset of them to xor together to achieve a certain other bitfield, or if there isn't a way to do it discover that no such subset exists.

I'd like to do this using a free library, rather than original code, and I'd strongly prefer something with Python bindings (using Python's built-in math libraries would be acceptable as well, but I want to port this to multiple languages eventually). Also it would be good to not take the memory hit of having to expand each bit to its own byte.

Some further clarification: I only need a single solution. My matrices are the opposite of sparse. I'm very interested in keeping the runtime to an absolute minimum, so using algorithmically fancy methods for inverting matrices is strongly preferred. Also, it's very important that the specific given bitfield be the one outputted, so a technique which just finds a subset which xor to 0 doesn't quite cut it.

And I'm generally aware of gaussian elimination. I'm trying to avoid doing this from scratch!

cross-posted to mathoverflow, because it isn't clear what the right place for this question is - http://mathoverflow.net/questions/41036/how-to-find-which-subset-of-bitfields-xor-to-another-bitfield

+2  A: 

Mathematically speaking, XOR of two bits can be treated as addition in F_2 field.

You want to solve a set of equations in a F_2 field. For four bitfiels with bits (a_0, a_1, ... a_n), (b_0, b_1, ..., b_n), (c_0, c_1, ..., c_n), (r_0, r_1, ..., r_n), you get equations:

x * a_0 + y * b_0 + z * c_0 = r_0
x * a_1 + y * b_1 + z * c_1 = r_1
...
x * a_n + y * b_n + z * c_n = r_n

(where you look for x, y, z).

You could program this as a simple integer linear problem with glpk, probably lp_solve (but I don't remember if it will fit). These might work very slowly though, as they are trying to solve much more general problem.

After googling for a while, it seems that this page might be a good start looking for code. From descriptions it seems that Dixon and LinBox could be a good fit.

Anyway, I think asking at mathoverflow might give you more precise answers. If you do, please link your question here.

Update: Sagemath uses M4RI for solving this problem. This makes it (for me) a very good recommendation.

liori
m4ri looks promising, but argh, general purpose libraries shouldn't be GPL!
Bram Cohen
+2  A: 

For small instances that easily fit in memory, this is just solving a linear system over F_2, so try mod-2 Gaussian elimination. For very large sparse instances, like those that occur in factoring (sieve) algorithms, look up the Wiedemann algorithm.

Mic Grigni
A: 

It's possible to have multiple subsets xor to the same value; do you care about finding all subsets?

A perhaps heavy-handed approach would be to filter the powerset of bitfields. In Haskell:

import Data.Bits

xorsTo :: Int -> [Int] -> [[Int]]
xorsTo target fields = filter xorsToTarget (powerset fields)
  where xorsToTarget f = (foldl xor 0 f) == target

powerset [] = [[]]
powerset (x:xs) = powerset xs ++ map (x:) (powerset xs)

Not sure if there is a way to do this without generating the powerset. (In the worst case, it is possible for the solution to actually be the entire powerset).

Rob Dickerson
That approach yields a solution in exponential time. I want a solution in polynomial time.
Bram Cohen
A: 

expanding on liori's answer above we have a linear system of equations (in modulo 2):

a0, b0, c0 ...| r0
a1, b1, c1 ...| r1
...           |
an, bn, cn ...| rn

Gaussian elimination can be used to solve the system. In modulo 2, the add row operation becomes an XOR operation. It is much simpler computationally to do this than to use a generic linear systems solver.

So, if a0 is zero we swap up a row that has a 1 in the a position. Then perform an XOR (using row 0) on any other row whos "a" bit is a 1. Then repeat using row 1 and column b, then row 2 col c, etc.

If you get a row of zeroes with a non-zero in the r column then the subset DNE.

Denley Bihari
It's very important that I get a particular bitfield out, not all zeros (modified my question to clarify this)
Bram Cohen
this does exactly that. The r0-rn is the particular bitfield (so only all zeroes if r0-rn is all zeroes), with (a,b,c...) as the set of fields (possible elements of the subset). After gaussian elimination the right-most column becomes an ordered set of inclusions in the subset (1 if included, 0 if not)
Denley Bihari
Also, I don't think that this would take you too long to code from scratch (though I don't know what python is like); it's quite an iterative algorithm. It gets the answer in O(n^3). Inverting matrices may be better but probably only if they are square.
Denley Bihari
Oh, I see what you're saying now. Yes, that works, and is probably all I need, although it looks like frustratingly enough I might have to implement from scratch.
Bram Cohen