What I really want is memory order emulation. x86 has strong memory order guarantees, ARM has much weaker guarantees. Which means the multi-threaded queue I'm working on works all the time on my development x86 machine even if I forget to put in the correct memory-order semantics, but it might or might not work on ARM (which is what many of my users have). (I am in the habit of running all my stress tests 1000 times before I'm willing to send them out, but that doesn't mean the code is correct; it means it works on x86 and passed my review, which might miss something.)
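As an aside, a minimal sketch of the kind of ordering bug that x86's stronger model can hide (not the queue in question; the flag/data hand-off and names are illustrative), using C11 atomics:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* A flag/data hand-off that needs release/acquire ordering.  If the
       store below used memory_order_relaxed it would still pass almost
       every run on x86, but on ARM the consumer may observe ready == 1
       before the write to data becomes visible. */
    static int data;
    static atomic_int ready;

    static void *producer(void *arg)
    {
        (void)arg;
        data = 42;
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                      /* spin until the flag is published */
        printf("%d\n", data);      /* release/acquire guarantees 42 here */
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }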
I wrote a similar post [1] some 16 years ago. My solution back then was to install Debian for PowerPC on QEMU using qemu-system-ppc.
But Hans's post uses user-mode emulation with qemu-mips, which avoids having to set up a whole big-endian system in QEMU. It is a very interesting approach I was unaware of. I'm pretty sure qemu-mips was available back in 2010, but I'm not sure if the gcc-mips-linux-gnu cross-compiler was readily available back then. I suspect my PPC-based solution might have been the only convenient way to solve this problem at the time.
Thanks for sharing it here. It was nice to go down memory lane and also learn a new way to solve the same problem.
[1] https://susam.net/big-endian-on-little-endian.html
> When programming, it is still important to write code that runs correctly on systems with either byte order
What you should do instead is write all your code so it is little-endian only, as the only relevant big-endian architecture is s390x, and if someone wants to run your code on s390x, they can afford a support contract.
Don't ignore endianness. But making little endian the default is the right thing to do; it is so much more ubiquitous in the modern world.
The vast majority of modern network protocols use little endian byte ordering. Most Linux filesystems use little endian for their on-disk binary representations.
There is absolutely no good reason for networking protocols to be defined to use big endian. It's an antiquated arbitrary idea: just do what makes sense.
Use these functions to avoid ifdef noise: https://man7.org/linux/man-pages/man3/endian.3.html
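A minimal sketch of what using them looks like, assuming glibc's endian(3) helpers and a 32-bit field:

    #define _DEFAULT_SOURCE
    #include <endian.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Convert between host order and a fixed wire order without any
       byte-order #ifdefs; each call is a no-op when host and wire order
       already match. */
    int main(void)
    {
        uint32_t host = 0x12345678;
        uint32_t wire = htobe32(host);  /* host -> big-endian wire value */
        uint32_t back = be32toh(wire);  /* big-endian wire value -> host */
        printf("%08x %08x\n", host, back);
        return 0;
    }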
What do you mean by “networking protocols,” exactly? Most packet level Internet protocols (TCP, UDP, etc.) are big endian. Ethernet is big endian at the octet level and little endian on the wire at the bit level. Network order is big endian because it has to be something and it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom. There is no right answer to endianness. It’s like which side of the road cars should drive on. You just need to pick one and stick with it. Mostly people bitch about endianness when their processor is the opposite of whatever someone else picked. But processors are all over the map. IBM mainframes are big endian. Motorola 68k is big. HP PA-RISC is big. IBM Power started big and then went bi. MIPS is bi. RISC-V is little. ARM is bi but dominantly little (AArch64). And of course x86 is little. So, take your pick. That said, little endianness is the right answer as is driving on the right side of the road.
> RISC-V is little
These days it's bi, actually :) Although I don't see any CPU designer actually implementing that feature, except maybe MIPS (who have stopped working on their own ISA, and now want all their locked-in customers to switch to RISC-V without worrying about endianness bugs)
Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big. But IIRC the standard profiles specify little, so it's pretty much always going to be little. But yea, technically speaking data load/store could be big. Maybe that's important for some embedded environments.
> Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big
ARM works the same way. And SPARC is the opposite, instructions are always big-endian, but data can be switched to little-endian.
I read your reply as mostly agreeing with me: endianness is arbitrary, using big endian for a novel protocol just because some widely used protocols decided to decades ago is silly.
> it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom.
There are many reasons for big endian... but that is not one of them :)
> But processors are all over the map
That's not true anymore, big endian is dead. Upstream Linux is refusing to support big endian riscv at all, and is making serious noises about ripping out the existing big endian aarch64 support because the companies that ship the hardware that needs it don't work upstream.
> it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom
This argument is pretty silly: visualizations can always be changed. For some time I have been thinking that hexdumps on little-endian systems ought to be written right-to-left: in fact, when I once decided to include such a right-to-left dumper in my own software, it took me very little time to get used to, and I immediately started regretting that I don't have it available everywhere.
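A tiny sketch of that idea (the dumper name and output layout here are made up):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Dump a buffer with offsets increasing right-to-left, so multi-byte
       little-endian values read naturally as numbers. */
    static void dump_rtl(const unsigned char *buf, size_t len)
    {
        for (size_t i = len; i-- > 0; )
            printf("%02x ", buf[i]);
        printf("\n");
    }

    int main(void)
    {
        uint32_t v = 0x12345678;   /* stored as 78 56 34 12 on little-endian */
        unsigned char b[sizeof v];
        memcpy(b, &v, sizeof v);
        dump_rtl(b, sizeof b);     /* prints "12 34 56 78" on little-endian */
        return 0;
    }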
You should actually not use format-swapping operations.
You should actually use format-swapping loads/stores (i.e. deserialization/serialization).
This is because your computer cannot compute on values of non-native endianness. As such, the value is logically converted back and forth on every operation. Of course, a competent optimizer can elide these conversions, but such actions fundamentally lack machine sympathy.
The better model is viewing the endianness as a serialization format and converting at the boundaries of your compute engine. This ensures you only need to care about endianness when serializing and deserializing wire formats and that you have no accidental mixing of formats in your internals; everything has been parsed to native before any computation occurs.
Essentially, non-native endianness should only exist in memory and preferably only memory filled in by the outside world before being parsed.
There's still at least one relevant big-endian-only ARM chip out there, the TI Hercules. While in the past five or ten years we've gone from having very few options for lockstep microcontrollers (with the Hercules being a very compelling option) to being spoiled for choice, the Hercules is still a good fit for some applications, and is a pretty solid chip.
The blog post linked in the OP explains this better IMHO [0]:
If the data stream encodes values with byte order B, then the algorithm to decode the value on computer with byte order C should be about B, not about the relationship between B and C.
One cannot just ignore the big/little data interchange problem: MacOS [1], Java, TCP/IP, JPEG, etc...
The point (for me) is not that your code runs on an s390, it is that you abstract your personal local implementation details from the data interchange formats. And unfortunately almost all of the processors are little, and many of the popular and unavoidable externalizations are big...
[0] https://commandcenter.blogspot.com/2012/04/byte-order-fallac...
[1] https://github.com/apple/darwin-xnu/blob/main/EXTERNAL_HEADE...
To cope with data interchange formats, you need a set of big endian data types, e.g. for each kind of signed or unsigned integer with a size of 16 bits or bigger you must have a big endian variant, e.g. identified with a "_be" suffix.
Most CPUs (including x86-64) have variants of the load and store instructions that reverse the byte order (e.g. MOVBE in x86-64). The remaining CPUs have byte reversal instructions for registers, so a reversed byte order load or store can be simulated by a sequence of 2 instructions.
So the little-endian types and the big-endian data types must be handled identically by a compiler, except that the load and store instructions use different encodings.
The structures used in a data-exchange format must be declared with the correct types and that should take care of everything.
Any decent programming language must provide means for the user to define such data types, when they are not provided by the base language.
The traditional UNIX conversion functions are the wrong way to handle endianness differences. An optimizing compiler must be able to recognize them as special cases in order to be able to optimize them away from the machine code.
A program that is written using only data types with known endianness can be compiled for either little-endian targets or big-endian targets and it will work identically.
All the problems that have ever existed in handling endianness have been caused by programming languages where the endianness of the base data types was left undefined, for fear that recompiling a program for a target of different endianness could result in a slower program.
This fear is obsolete today.
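As a concrete example, GCC (though not Clang, and C only) offers something close to such known-endianness types via its scalar_storage_order type attribute; a minimal sketch with a made-up wire header:

    #include <stdint.h>

    /* All scalar members of this struct are stored big-endian regardless of
       the host; ordinary member accesses make the compiler emit byte swaps
       (or MOVBE-style loads) only where host and declared order differ. */
    struct __attribute__((scalar_storage_order("big-endian"))) wire_header {
        uint16_t type;
        uint16_t flags;
        uint32_t length;
    };

    static uint32_t total_size(const struct wire_header *h)
    {
        /* Plain C arithmetic; h->length is converted to native order on load. */
        return (uint32_t)sizeof(*h) + h->length;
    }

    int main(void)
    {
        struct wire_header h = { .type = 1, .flags = 0, .length = 8 };
        return total_size(&h) == 16 ? 0 : 1;
    }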
Having different types seems wrong to me because endianness issues disappear after serialization, so it would make more sense to slap an annotation on the data field so just the serializer knows how to load/store it.
Nah, that's a terrible way to handle endian-ness. Your "big endian" types infect your entire program. And you pay a cost with every computation you do with them.
Just treat the data on disk / on the wire as if it were in some encoded format. Parse on load. Encode back out to the expected format when you save it. Within your program, just use your language's native int formats.
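For instance, a minimal sketch of such parse-on-load / encode-on-save helpers (the helper names and the 32-bit width are illustrative), written purely in terms of the data's byte order:

    #include <stdint.h>
    #include <stdio.h>

    /* Parse a 32-bit big-endian value from a byte buffer; no reference to
       the host's byte order anywhere. */
    static uint32_t parse_u32_be(const unsigned char *p)
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }

    /* Encode back out when writing the wire/disk format. */
    static void write_u32_be(unsigned char *p, uint32_t v)
    {
        p[0] = (unsigned char)(v >> 24);
        p[1] = (unsigned char)(v >> 16);
        p[2] = (unsigned char)(v >> 8);
        p[3] = (unsigned char)v;
    }

    int main(void)
    {
        unsigned char wire[4] = { 0x12, 0x34, 0x56, 0x78 };  /* big-endian bytes */
        uint32_t v = parse_u32_be(wire);                     /* 0x12345678 on any host */
        write_u32_be(wire, v + 1);                           /* encode back out */
        printf("%08x\n", v);
        return 0;
    }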
... And the equivalent for little endian data. Modern optimizers will happily turn that into the right instructions - either a noop or bswap - as appropriate depending on the target architecture.
You can do the same thing in Rust, Go, or any other language. No special type definitions or macros necessary.
https://godbolt.org/z/746EaYx4r
MacOS "was" big-endian due to 68k and later PPC cpu's (the PPC Mac's could've been little but Apple picked big for convenience and porting).
Their x86 changeover moved the CPU's to little-endian and Aarch64 continues solidifies that tradition.
Same with Java, there's probably a strong influence from SPARC's and with PPC, 68k and SPARC being relevant back in the 90s it wasn't a bold choice.
But all of this is more or less legacy at this point, I have little reason to believe that the types of code I write will ever end up on a s390 or any other big-endian platform unless something truly revolutionizes the computing landscape since x86, aarch64, risc-v and so on run little now.
I'm with you on this. I lived through the big endian / little endian hell in the 80s/90s. Little endian won. Anyone making a big endian architecture at this point would be shooting themselves in the foot because of all the incompatibilities. Don't make things more complicated.
In fact, if you made a big endian arch and then ran a browser on it, I'd be surprised if a large number of websites didn't fail, because they use typed arrays and aren't endian aware.
The solution is not to ask every programmer in the universe to write endian-aware code. The solution is to standardize on little endian.
Not only the System/390.
It's also IBM i, AIX, and for many protocols the network byte order.
AFAIK the binary data in JPG (1) and Java Class [2] files are big endian.
And if you write down a hexadecimal number as 0x12345678 you are writing big-endian.
(1) For JPG: the embedded TIFF metadata can be either byte order.
[2] https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...
The endianness of file formats and handwriting is irrelevant when it comes to deciding whether your code should support running on big-endian CPUs.
The only question that matters: Do your customers / users want to run it on big-endian hardware? And for 99% of programmers, the answer is no, because their customers have never knowingly been in the same room as a big-endian CPU.
> What you should do instead is write all your code so it is little-endian only, as the only relevant big-endian architecture is s390x, and if someone wants to run your code on s390x, they can afford a support contract.
Or you can just be a nice person and make your code endian-agnostic. ;-)
> the only relevant big-endian architecture is s390x
The adjacent POWER architecture is also still relevant - but as you say, they too can afford a support contract.
> What you should do instead is write all your code so it is little-endian only
Of course that’s not what people will do. They’ll write code and not have any idea which parts have a dependency on endianness. It won’t be given a thought during their design or testing and when they need to make it work on a different architecture, it will needlessly be a giant pain in the ass.
As with many comments here: use a build-time assertion that the system is little-endian, and otherwise ignore endianness. Untested code is broken code.
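A minimal C11 sketch of such a build-time assertion, assuming GCC/Clang's predefined byte-order macros:

    #include <assert.h>

    /* Fail the build on a big-endian target instead of silently misbehaving. */
    #if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
                  "this code assumes a little-endian target");
    #else
    #error "cannot determine target byte order"
    #endif

    int main(void) { return 0; }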
I was at IBM when we gave up on big endian for Power. Too much new code assumed LE, and we switched, despite the insane engineering effort (though TBH, that effort had the side effect of retaining some absolutely first-class engineers a few more years).
There is one reason not mentioned in the article why it is worth testing code on big-endian systems – some bugs are more visible there than on little-endian systems. For example, accessing an integer variable through a pointer of the wrong type (smaller size) often passes silently on little-endian (just ignoring the higher bytes), while it reads/writes bad values on big-endian.
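A small illustration of that class of bug (undefined behaviour, but representative of code seen in the wild):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t value = 42;
        /* Bug: reading a 32-bit variable through a 16-bit pointer.  On a
           little-endian machine this happens to print 42; on big-endian it
           reads the high half and prints 0, making the bug visible. */
        uint16_t *p = (uint16_t *)&value;
        printf("%u\n", (unsigned)*p);
        return 0;
    }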
If you're worrying about the endianness of your processor, your code is somehow accessing memory from 'outside' as anything other than a char*, which is already thin ice as far as C and C++ are concerned. You should have a parse_*_le and/or parse_*_be function to convert from that byte stream into your native types, which only cares about the endianness of the data (and they can be implemented without caring about your processor's endianness as well). Then you don't need to worry about the processor you're running on at all. There are more significant and subtle processor quirks than endianness to worry about if you're trying to write portable code (namely, memory model and alignment requirements).
For most code it doesn't matter. It matters when you are writing files to be read by something else, or when sending data over a network. So make sure the places where those happen are thin shims that are easy to fix if it doesn't work. (That is: don't write data from everywhere; put a layer in place for this.)
> When programming, it is still important to write code that runs correctly on systems with either byte order
I contend it's almost never important and almost nobody writing user software should bother with this. Certainly, people who didn't already know they needed big-endian should not start caring now because they read an article online. There are countless rare machines that your code doesn't run on--what's so special about big endian? The world is little endian now. Big endian chips aren't coming back. You are spending your own time on an effort that will never pay off. If big endian is really needed, IBM will pay you to write the s390x port and they will provide the machine.
This whole endianness issue can be traced to western civilization adopting Arabic numbers. Western languages are written left to right, but Arabic is right to left. Thus, Arabic numbers appear as big-endian when viewed in western languages. Consequently, big-endian appears to be "normal" for us in the modern age. But in Arabic, numbers appear little-endian because everything is right to left. Roman numbers are big-endian, though. Maybe that's why we kept the Arabic ordering even when adopting the system? We could have flipped Arabic numbers around and written them as little-endian, but we didn't.
On Linux it's really as simple as installing QEMU binfmt support; after that you can run a cross-compiled big-endian binary directly.
Of course the endianness only matters to C programmers who take endless pleasure in casting raw data from external sources into structs.