The fun bit is right at the start when the author notices that the compiler spots this and optimizes it away.
We didn't get into the deeper question of benchmarking it vs. a three-register swap, because I suspect the latter would be handled entirely by register renaming and end up being faster due to not requiring allocation of an ALU unit. Difficult to benchmark that because in order for it to make a difference, you'd need to surround it with other arithmetic instructions.
A meta question is why this persists. It has the right qualities for a "party trick": slightly esoteric piece of knowledge, not actually that hard to understand when you do know about it, but unintuitive enough that most people don't spontaneously reinvent it.
The other classic use of XOR - cursor overdrawing - has also long since gone away. It used to be possible to easily draw a cursor on a monochrome display by XORing it in, then to move it you simply XOR it again, restoring the original image.
You have some packets of data a, b, c. Add one additional packet z that is computed as z = a ^ b ^ c. Now whenever one of a, b or c gets corrupted or lost, it can be reconstructed by computing the XOR of all the others.
So if b is lost: b = a ^ c ^ z. This works for any packet, but only one. If multiple are lost, this will fail.
There are way better error correction algorithms, but I like the simplicity of this one.
The XOR trick is only cool in its undefined-behavior form:
a^=b^=a^=b;
Which allegedly saves you 0.5 seconds of typing in competitive programming competitions from 20 years ago and is known to work reliably (on MinGW under Windows XP).
Bonus authenticity: use a^=a to zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).
For real now, a very useful application of XOR is its relation to the Nim game [0], which comes in very handy if you need to save your village from an ancient disgruntled Chinese emperor.
The XOR swap trick also features in the compilation/synthesis of quantum algorithms, where the XOR instruction (in the form of a CNOT gate) is fundamental in many architectures, and where native swapping need not be available.
One extension that I ran into, and which I think forms a nice problem is the following:
Just like the XOR swap trick can be used to swap to variables (and let's just say that they're bools), it can be extended to implement any permutation of the variables: suppose that the permutation is written as a composition of n transpositions (i.e., swaps of pairs), and that is the minimal number of transpositions that let's you do that. Each transposition can be implemented by 3 XORs, by the XOR swap trick for pairs, and so the full permutation can be implemented by 3n XORs. Now here's the question: Is it possible to come up with a way of doing it with less than 3n, or can we find a permutation that has a shortcut through XOR-land (not allowing any other kinds of instructions)? In other words, is XOR-swapping XOR-optimal?
I'm not going to spoil it, but only last year a paper was published in the quantum information literature that contains an answer [0]. I ended up making a little game where you get to play around with XOR-optimizing not only permutations, but general linear reversible circuits. [1]
My assumption has been that the XOR trick was closer to a thought experiment or hypothetical. That is to say, no one would ever use it in practice, but by asking an interviewee to walk through it shows they have a low-level understanding of how binary numbers work, bit-masking.
For me it falls in the obfuscated-C quadrant for code. Performance implications aside, it's just not the kind of "self-documenting" code I like lying around in my sources. (And I'll take clarity of purpose over performance every day.)
Xor swap trick has perfect profile for underhanded C contests. It generally works until a specific condition triggers its failure. The condition is "the arguments are aliases", so for example XOR_SWAP(a[i], a[j]) when i=j.
You can use this same property of xor to make a double-linked list using just one pointer per item, which is xor of the previous and next item addresses!
I'm not sure I'd call it a "trick", but since A ^ 0 = A, and B ^ B = 0, then ((A ^ B) ^ B) = A. i.e. XOR-ing any number by the same number twice gets you back the original number.
This used to be used back in the day for cheap and nasty computer graphics, since it means that if you draw to the screen by XOR-ing with the pixels already on the screen then you can undo it, restoring the background, by doing it a second time. The "nasty" part is that XORing with what's already on the screen isn't going to look great, but for something like a rotating wire-frame figure it might be OK.
On the Amiga, accessing the pulldown menus caused the display to be temporarily frozen, so that the menu panel could be rendered off screen and then swapped in using the blitter and the XOR trick. It wasn’t as fast as a straight copy, but it saved on precious memory, and it rendered very cleanly.
When you released to the menu button on the mouse, it did a similar swap to restore the screen contents. In version 2.0 I optimized it to do a simple copy of the offscreen rectangle back onto the screen because before it was wasting time in order to preserve the menu pixels that we’re going to be thrown away immediately.
I'll throw my hat in the ring of other "xor" tricks.
So we all know addition swap. One generalization that comes to mind is doing some other in-place transform on the two input variables. Lets keep it simple and suppose that its a linear transform. Thus the problem is to apply some matrix [[a,b],[c,d]] to two input variables [x,y] using entirely in-place operations.
We can do this by realizing that our basic operands can be expressed as matrices.
x += ky is the same as the matrix
[1, k]
[0, 1]
likewise y += kx is equiv to the lower triangular matrix
[1, 0]
[k, 1]
and lastly, the = operator is equiv to a matrix with an element on the diag.
x = k
[k ,0]
[0, 1]
y *= k
[1, 0]
[0, k]
From this point on it becomes a challenge of if you can construct any desired matrix into some combination of these available ones (spoiler, yes you can).
The next generalization one could contemplates is doing operations in place on more than 2 variables. Well, if one has already solved arbitrary 2x2 matrix operations, then that can be rigged to implement larger matrices one submatrix at a time.
The final generalization that comes to mind is what can we do with non-arithmetic operators? We've already seen an example of this with using xor-swap rather than addition-swap. But is there anything out there vaguely like xor-2x2-matrix-multiply?
I legit don't know. I have some thought, but I won't meander out loud if its not going to lead anywhere.
> given a list where every value appears exactly twice except one, XOR all the values together and the duplicates cancel out, leaving the unique element
For some reason this reminds me of the Fourier transform. I wonder if it can be performed with XOR tricks and no complicated arithmetic?
I would remark that modern CPUs don't use physical registers, so swapping should be just a register rename op, and this kind of bithacking only applies to old machines.
The article makes the same point as well at the end:
It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.
This brings back memories of Chunky-to-Planar conversion on the Amiga, where the Xor trick was combined with bitshifts and masking to rearrange the bits in 8 32-bit words in reasonable time.
xor'ing the elements of lists together also allows a test of "unordered equality" because 1^2^3^4^5 == (xor of any permutation of [1,2,3,4,5]).
Yes there are false positives, and the false negative of all-zero/all-equal, but the test can be useful in a "bloom filter" type case.
Have used it in dynamic firewalling rules ... one can do something pretty close to a JA3/JA4 TLS fingerprint in eBPF with that (to match the cipher lists).
Such tricks were maybe useful 40 years ago while writing assembly code manually or while using a dumb compiler with no optimizations. But nowadays such tricks are near to useless. All useful ones (like optimizing division by 2 via bitshift ) are already implemented as compiler optimizations. Others shouldn't be used in order to avoid making optimizer's job harder.
Another situation to avoid the XOR trick, even when registers are tight, is when swapping pointers in a garbage-collected language, since the intermediate bit patterns are invalid pointers: if a GC mark phase occurs at that moment, you might lose some objects, or spuriously mark others as live.
115 comments
We didn't get into the deeper question of benchmarking it vs. a three-register swap, because I suspect the latter would be handled entirely by register renaming and end up being faster due to not requiring allocation of an ALU unit. Difficult to benchmark that because in order for it to make a difference, you'd need to surround it with other arithmetic instructions.
A meta question is why this persists. It has the right qualities for a "party trick": slightly esoteric piece of knowledge, not actually that hard to understand when you do know about it, but unintuitive enough that most people don't spontaneously reinvent it.
See also: https://en.wikipedia.org/wiki/Fast_inverse_square_root , which requires a bit more maths.
The other classic use of XOR - cursor overdrawing - has also long since gone away. It used to be possible to easily draw a cursor on a monochrome display by XORing it in, then to move it you simply XOR it again, restoring the original image.
> Are there other XOR tricks?
Yes, error correction.
You have some packets of data a, b, c. Add one additional packet z that is computed as z = a ^ b ^ c. Now whenever one of a, b or c gets corrupted or lost, it can be reconstructed by computing the XOR of all the others.
So if b is lost: b = a ^ c ^ z. This works for any packet, but only one. If multiple are lost, this will fail.
There are way better error correction algorithms, but I like the simplicity of this one.
a^=b^=a^=b;
Which allegedly saves you 0.5 seconds of typing in competitive programming competitions from 20 years ago and is known to work reliably (on MinGW under Windows XP).
Bonus authenticity: use
a^=ato zero a register in a single x86 instruction (and makes a real difference for compiler toolchains 30+ years old).For real now, a very useful application of XOR is its relation to the Nim game [0], which comes in very handy if you need to save your village from an ancient disgruntled Chinese emperor.
[0] https://en.wikipedia.org/wiki/Nim
One extension that I ran into, and which I think forms a nice problem is the following:
Just like the XOR swap trick can be used to swap to variables (and let's just say that they're bools), it can be extended to implement any permutation of the variables: suppose that the permutation is written as a composition of n transpositions (i.e., swaps of pairs), and that is the minimal number of transpositions that let's you do that. Each transposition can be implemented by 3 XORs, by the XOR swap trick for pairs, and so the full permutation can be implemented by 3n XORs. Now here's the question: Is it possible to come up with a way of doing it with less than 3n, or can we find a permutation that has a shortcut through XOR-land (not allowing any other kinds of instructions)? In other words, is XOR-swapping XOR-optimal?
I'm not going to spoil it, but only last year a paper was published in the quantum information literature that contains an answer [0]. I ended up making a little game where you get to play around with XOR-optimizing not only permutations, but general linear reversible circuits. [1]
[0] https://link.springer.com/article/10.1007/s11128-025-04831-5
[1] https://swapple.fuglede.dk
For me it falls in the obfuscated-C quadrant for code. Performance implications aside, it's just not the kind of "self-documenting" code I like lying around in my sources. (And I'll take clarity of purpose over performance every day.)
> Are there other XOR tricks?
I'm not sure I'd call it a "trick", but since A ^ 0 = A, and B ^ B = 0, then ((A ^ B) ^ B) = A. i.e. XOR-ing any number by the same number twice gets you back the original number.
This used to be used back in the day for cheap and nasty computer graphics, since it means that if you draw to the screen by XOR-ing with the pixels already on the screen then you can undo it, restoring the background, by doing it a second time. The "nasty" part is that XORing with what's already on the screen isn't going to look great, but for something like a rotating wire-frame figure it might be OK.
When you released to the menu button on the mouse, it did a similar swap to restore the screen contents. In version 2.0 I optimized it to do a simple copy of the offscreen rectangle back onto the screen because before it was wasting time in order to preserve the menu pixels that we’re going to be thrown away immediately.
So we all know addition swap. One generalization that comes to mind is doing some other in-place transform on the two input variables. Lets keep it simple and suppose that its a linear transform. Thus the problem is to apply some matrix [[a,b],[c,d]] to two input variables [x,y] using entirely in-place operations.
We can do this by realizing that our basic operands can be expressed as matrices. x += ky is the same as the matrix [1, k] [0, 1]
likewise y += kx is equiv to the lower triangular matrix [1, 0] [k, 1]
and lastly, the = operator is equiv to a matrix with an element on the diag. x = k [k ,0] [0, 1]
y *= k [1, 0] [0, k]
From this point on it becomes a challenge of if you can construct any desired matrix into some combination of these available ones (spoiler, yes you can).
The next generalization one could contemplates is doing operations in place on more than 2 variables. Well, if one has already solved arbitrary 2x2 matrix operations, then that can be rigged to implement larger matrices one submatrix at a time.
The final generalization that comes to mind is what can we do with non-arithmetic operators? We've already seen an example of this with using xor-swap rather than addition-swap. But is there anything out there vaguely like xor-2x2-matrix-multiply?
I legit don't know. I have some thought, but I won't meander out loud if its not going to lead anywhere.
> given a list where every value appears exactly twice except one, XOR all the values together and the duplicates cancel out, leaving the unique element
For some reason this reminds me of the Fourier transform. I wonder if it can be performed with XOR tricks and no complicated arithmetic?
The article makes the same point as well at the end:
It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.
Yes there are false positives, and the false negative of all-zero/all-equal, but the test can be useful in a "bloom filter" type case.
Have used it in dynamic firewalling rules ... one can do something pretty close to a JA3/JA4 TLS fingerprint in eBPF with that (to match the cipher lists).
https://lemire.me/blog/2022/01/21/swar-explained-parsing-eig...