Can someone explain to me how XOR swapping of two variables with no temp variable works?

void xorSwap (int *x, int *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}

I understand WHAT it does, but can someone walk me through the logic of how it works?

+10  A: 

Most people would swap two variables x and y using a temporary variable, like this:

tmp = x
x = y
y = tmp

Here’s a neat programming trick to swap two values without needing a temp:

x = x xor y
y = x xor y
x = x xor y

More details in Swap two variables using XOR

On line 1 we combine x and y (using XOR) to get this “hybrid” and we store it back in x. XOR is a great way to save information, because you can remove it by doing an XOR again.

On line 2, we XOR the hybrid with y, which cancels out all the y information, leaving us only with x. We save this result back into y, so y now holds x’s original value.

On the last line, x still has the hybrid value. We XOR it yet again with y (now with x’s original value) to remove all traces of x out of the hybrid. This leaves us with y, and the swap is complete!
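To make the "hybrid" idea concrete, here is a minimal sketch in C that performs the three steps and records both values after each one. The function name xor_swap_trace and the trace array are made up for illustration; they are not part of the original question.

```c
/* Hypothetical helper: performs the three XOR steps and stores the
   values of *x and *y after each step in trace[step][0] and
   trace[step][1]. Assumes x and y point to different variables. */
void xor_swap_trace(unsigned *x, unsigned *y, unsigned trace[3][2])
{
    *x = *x ^ *y;   /* x now holds the "hybrid" of both values */
    trace[0][0] = *x; trace[0][1] = *y;

    *y = *x ^ *y;   /* hybrid with y cancelled out: original x */
    trace[1][0] = *x; trace[1][1] = *y;

    *x = *x ^ *y;   /* hybrid with original x cancelled out: original y */
    trace[2][0] = *x; trace[2][1] = *y;
}
```

Starting from x = 12 and y = 10, the recorded pairs are (6, 10), (6, 12) and (10, 12): the hybrid 6 sits in x until the last step removes x's original value from it.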


The computer actually has an implicit “temp” variable that stores intermediate results before writing them back to a register. For example, if you add 3 to a register (in machine-language pseudocode):

ADD 3 A // add 3 to register A

The ALU (Arithmetic Logic Unit) is actually what executes the instruction 3+A. It takes the inputs (3,A) and creates a result (3 + A), which the CPU then stores back into A’s original register. So, we used the ALU as temporary scratch space before we had the final answer.

We take the ALU’s implicit temporary data for granted, but it’s always there. In a similar way, the ALU can return the intermediate result of the XOR in the case of x = x xor y, at which point the CPU stores it into x’s original register.

Because we aren’t used to thinking about the poor, neglected ALU, the XOR swap seems magical because it doesn’t have an explicit temporary variable. Some machines have a 1-step exchange XCHG instruction to swap two registers.

VonC
I understand that, I'm asking how it works. How does using an exclusive or on a value allow you to swap it without a temp variable?
Simucal
just added the explanation
VonC
Upvoted because this is the clearest and most detailed answer, but want to note that the swap with a temp variable is a lot more readable and by virtue of that carries more value in code
eyelidlessness
I liked the original answer (with explanation), but the bit about the ALU seems misguided. Even on the single-cycle (non-pipelined) processors you allude to, the ability to do "x = (op involving x)" in 1 instruction has more to do with the fact that the register file has input *and* output ports.
Matt J
+58  A: 

You can see how it works by doing the substitution:

x1 = x0 xor y0
y2 = x1 xor y0
x2 = x1 xor y2

Substituting,

x1 = x0 xor y0
y2 = (x0 xor y0) xor y0
x2 = (x0 xor y0) xor ((x0 xor y0) xor y0)

Because xor is fully associative and commutative:

y2 = x0 xor (y0 xor y0)
x2 = (x0 xor x0) xor (y0 xor y0) xor y0

Since x xor x == 0 for any x,

y2 = x0 xor 0
x2 = 0 xor 0 xor y0

And since x xor 0 == x for any x,

y2 = x0
x2 = y0

And the swap is done.
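If you would rather check the algebra by brute force, the sketch below tries every pair of 8-bit values. The function name count_xor_swap_failures is hypothetical, and 8 bits is only chosen to keep the loop small; the identities hold bit by bit, so the width does not matter.

```c
#include <stdint.h>

/* Returns the number of 8-bit pairs (x0, y0) for which the three XOR
   steps fail to swap the values. The derivation says this is 0. */
unsigned count_xor_swap_failures(void)
{
    unsigned failures = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            x ^= y;   /* x1 = x0 xor y0 */
            y ^= x;   /* y2 = x1 xor y0 */
            x ^= y;   /* x2 = x1 xor y2 */
            if (x != b || y != a)
                failures++;
        }
    }
    return failures;
}
```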

Greg Hewgill
+21  A: 

Here's one that should be slightly easier to grok:

int x = 10, y = 7;

y = x + y; //x = 10, y = 17
x = y - x; //x = 7, y = 17
y = y - x; //x = 7, y = 10

Now, one can understand the XOR trick a little more easily by understanding that ^ can be thought of as + or -. Just as:

x + y - ((x + y) - x) == x

, so:

x ^ y ^ ((x ^ y) ^ x) == x
Matt J
@Matt J, thanks for the subtraction example. It did help me grok it.
Simucal
Might be worth emphasising that you can't use the addition or subtraction methods because of overflows with large numbers.
MarkJ
Is that the case? In the small examples I worked out, things worked out OK regardless (assuming the result of an underflow or overflow is (result % 2^n)). I might code something up to test it out.
Matt J
I think that, assuming the most parsimonious hardware implementation of the ADD and SUB instructions, this works properly even in the presence of overflow or underflow. I've just tested it. Am I missing something?
Matt J
MarkJ
+34  A: 

Other people have explained how it works; I want to explain why it was once a good idea but no longer is.

Back in the day when we had simple single-cycle or multi-cycle CPUs, it was cheaper to use this trick to avoid costly memory dereferences or spilling registers to the stack. However, we now have CPUs with massive pipelines instead. The P4s ranged from having 20 to 31 (or so) stages in their pipelines, where any dependence between reading and writing to a register could cause the whole thing to stall. The xor swap has some very heavy dependencies between A and B that don't actually matter at all but stall the pipeline in practice. A stalled pipeline causes a slow code path, and if this swap is in your inner loop, you're going to be moving very slowly.

In general practice, your compiler can figure out what you really want to do when you do a swap with a temp variable and can compile it to a single XCHG instruction. Using the xor swap makes it much harder for the compiler to guess your intent and therefore much less likely to optimize it correctly. Not to mention code maintenance, etc.
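For comparison, this is the plain version written in C. The name swap_temp is just for illustration. What a given compiler actually emits for it depends on the target and optimization level, so check the generated assembly rather than taking it on faith.

```c
/* Plain swap through a temporary. The intent is obvious to readers
   and to the compiler, and no aliasing guard is needed: it is
   harmless when x and y point to the same variable. */
void swap_temp(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}
```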

Patrick
@Patrick, Nice explanation of usage.. thanks!
Simucal
Yep - like all memory-saving tricks, this isn't so useful in these days of cheap memory.
Bruce Alderman
By the same token, however, embedded system cpus still benefit quite a lot.
Paul Nathan
@Paul, it'd depend on your tool chain. I'd test it first to be certain that your embedded compiler isn't already performing that optimization.
Patrick
(It's also worth noting that from a size perspective, three XORs is likely larger than one XCHG, depending on the architecture. You may save more space by not using the xor trick.)
Patrick
+4  A: 

@VonC has it right; it's a neat mathematical trick. Imagine 4-bit words and see if this helps.

word1 ^= word2;
word2 ^= word1;
word1 ^= word2;


word1    word2
0101     1111
after 1st xor
1010     1111
after 2nd xor
1010     0101
after 3rd xor
1111     0101
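A small C sketch that prints the same kind of table for any pair of 4-bit words. Both helper names (to_bits4, print_xor_swap_trace) are invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the low 4 bits of v as '0'/'1' characters
   into out, which must have room for 5 chars including the terminator. */
void to_bits4(unsigned v, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (v & (8u >> i)) ? '1' : '0';   /* most significant bit first */
    out[4] = '\0';
}

/* Prints word1 and word2 in binary before the swap and after each xor. */
void print_xor_swap_trace(unsigned word1, unsigned word2)
{
    const char *labels[] = { "start", "after 1st xor",
                             "after 2nd xor", "after 3rd xor" };
    char a[5], b[5];
    for (int step = 0; step < 4; step++) {
        if (step == 1) word1 ^= word2;
        if (step == 2) word2 ^= word1;
        if (step == 3) word1 ^= word2;
        to_bits4(word1, a);
        to_bits4(word2, b);
        printf("%-14s %s %s\n", labels[step], a, b);
    }
}
```

Calling print_xor_swap_trace(0x5, 0xF) walks through the same four rows as the table above.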
kenny
A: 

Basically there are 3 steps in the XOR approach:

a’ = a XOR b (1)
b’ = a’ XOR b (2)
a” = a’ XOR b’ (3)

To understand why this works first note that:

  1. XOR will produce a 1 only if exactly one of its operands is 1, and the other is zero;
  2. XOR is commutative so a XOR b = b XOR a;
  3. XOR is associative so (a XOR b) XOR c = a XOR (b XOR c); and
  4. a XOR a = 0 (this should be obvious from the definition in [1] above)

After Step (1), the binary representation of a’ will have 1-bits only in the bit positions where a and b have opposing bits, that is, either (ak=1, bk=0) or (ak=0, bk=1). Now when we do the substitution in Step (2) we get:

b’ = (a XOR b) XOR b
= a XOR (b XOR b) because XOR is associative
= a XOR 0 because of [4] above
= a due to definition of XOR (see [1] above)

Now we can substitute into Step (3):

a” = (a XOR b) XOR a
= (b XOR a) XOR a because XOR is commutative
= b XOR (a XOR a) because XOR is associative
= b XOR 0 because of [4] above
= b due to definition of XOR (see [1] above)
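One caveat: the substitution above treats a and b as two separate storage locations. If both names refer to the same location, Step (1) computes a XOR a = 0 and overwrites the only copy of the value, which is why the function in the question checks x != y first. A minimal C sketch of the difference (the function names are invented here):

```c
/* Unguarded variant, for illustration only: misbehaves when x == y. */
void xor_swap_unguarded(int *x, int *y)
{
    *x ^= *y;   /* if x == y this stores v ^ v == 0 */
    *y ^= *x;   /* 0 ^ 0 == 0 */
    *x ^= *y;   /* still 0 */
}

/* Guarded variant, same shape as the function in the question. */
void xor_swap_guarded(int *x, int *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}
```

Passing the same address twice to the unguarded version zeroes the variable; the guarded version leaves it alone.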

More detailed information here: Necessary and Sufficient

A: 

The reason WHY it works is because XOR doesn't lose information. You could do the same thing with ordinary addition and subtraction if you could ignore overflow. For example, if the variable pair A,B originally contains the values a,b, you could swap them like this:

 // A,B = a,b
A = A+B // (a+b),b
B = A-B // (a+b),a
A = A-B // b, a

BTW there's an old trick for encoding a 2-way linked list in a single "pointer". Suppose you have a list of memory blocks at addresses A, B, and C. The first word in each block is, respectively:

 // first word of each block is sum of addresses of prior and next block
 0 + &B   // first word of block A
&A + &C   // first word of block B
&B + 0    // first word of block C

If you have access to block A, it gives you the address of B. To get to C, you take the "pointer" in B and subtract A, and so on. It works just as well backwards. To run along the list, you need to keep pointers to two consecutive blocks. Of course you would use XOR in place of addition/subtraction, so you wouldn't have to worry about overflow.

You could extend this to a "linked web" if you wanted to have some fun.
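Here is a rough sketch of the XOR version of that list in C. The struct layout and function names are made up for illustration, and it assumes a null pointer converts to 0 and that pointers round-trip through uintptr_t, which holds on mainstream platforms but is implementation-defined in the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: link holds (address of previous node) XOR
   (address of next node), with a missing neighbour treated as 0. */
struct node {
    uintptr_t link;
    int value;
};

/* Given the node we came from and the current node, return the node on
   the other side. Pass prev == NULL when cur is an end of the list. */
struct node *xor_step(struct node *prev, struct node *cur)
{
    return (struct node *)(cur->link ^ (uintptr_t)prev);
}

/* Walks the list from one end and sums the values. It keeps pointers
   to two consecutive nodes, and works from either end. */
int xor_list_sum(struct node *end)
{
    struct node *prev = NULL, *cur = end;
    int sum = 0;
    while (cur != NULL) {
        sum += cur->value;
        struct node *next = xor_step(prev, cur);
        prev = cur;
        cur = next;
    }
    return sum;
}
```

For three nodes a, b, c you would set a.link = 0 ^ &b, b.link = &a ^ &c and c.link = &b ^ 0 (with casts to uintptr_t). XORing b.link with the address of a yields c, and XORing it with the address of c yields a, which is why the same walk works in both directions.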

Mike Dunlavey
The single pointer trick is pretty awesome, didn't know about this! Thanks!
Gab Royer
@Gab: You're welcome, and your English skills are a lot better than my French!
Mike Dunlavey
+1  A: 

As a side note I reinvented this wheel independently several years ago in the form of swapping integers by doing:

a = a + b
b = a - b ( = a + b - b once expanded)
a = a - b ( = a + b - a once expanded).

(This is mentioned above, in a difficult-to-read way.)

The exact same reasoning applies to xor swaps: a ^ b ^ b = a and a ^ b ^ a = b. Since xor is commutative and associative, with x ^ x = 0 and x ^ 0 = x, this is quite easy to see, since

= a ^ b ^ b
= a ^ 0
= a

and

= a ^ b ^ a 
= a ^ a ^ b 
= 0 ^ b 
= b

Hope this helps. This explanation has already been given... but not very clearly imo.

jheriko
+2  A: 

I like to think of it graphically rather than numerically.

Let's say you start with x = 11 and y = 5. In binary (and I'm going to use a hypothetical 4-bit machine), here's x and y:

       x: |1|0|1|1|   -> 8 + 2 + 1
       y: |0|1|0|1|   -> 4 + 1

Now to me, XOR is an invert operation and doing it twice is a mirror:

     x^y: |1|1|1|0|
 (x^y)^y: |1|0|1|1|   <- ooh!  Check it out - x came back
 (x^y)^x: |0|1|0|1|   <- ooh!  y came back too!
plinth
I'm surprised this doesn't have more upvotes. Graphically is how I'd think about it too.
RichardOD
A: 

If you have access to block A, it gives you the address of B. To get to C, you take the "pointer" in B and subtract A, and so on. It works just as well backwards. To run along the list, you need to keep pointers to two consecutive blocks. Of course you would use XOR in place of addition/subtraction, so you wouldn't have to worry about overflow...

I did not follow the above steps. Can someone shed some light on them? It would be much appreciated.

Sam