When you have so few bits, does it really make sense to invent a meaning for the bit positions? Just use an index into a "palette" of pre-determined numbers.
As a bonus, any operation can be replaced with a lookup into an n×n table.
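As a rough illustration (this palette is made up for the example, not any standardized FP4 value set), the whole scheme fits in a few lines of Python:

```python
# A 4-bit "palette float": the code is just an index into 16 chosen values.
# These 16 values are illustrative, not a standard.
PALETTE = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0,
           -0.0, -0.25, -0.5, -0.75, -1.0, -1.5, -2.0, -3.0]

def nearest_code(x: float) -> int:
    """Quantize a real number to the code of the closest palette entry."""
    return min(range(16), key=lambda i: abs(PALETTE[i] - x))

# Any binary op becomes a precomputed 16x16 lookup table: MUL[a][b] is the
# code of PALETTE[a] * PALETTE[b], rounded back onto the palette.
MUL = [[nearest_code(PALETTE[a] * PALETTE[b]) for b in range(16)]
       for a in range(16)]

a, b = nearest_code(1.5), nearest_code(2.0)
print(PALETTE[MUL[a][b]])  # 3.0
```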
In standard FP32, the infinities are represented with all exponent bits set to 1 and all mantissa bits 0 (plus the sign bit); the NaNs have all exponent bits set to 1 and a non-zero mantissa. If you used that interpretation with FP4, you'd get the values shown below, which restricts the representable range to +/- 3, and that feels less useful to me. If you're using FP4 you're probably optimizing for space and don't want to waste a quarter of your possible combinations on things that aren't actually numbers; you'd likely focus your efforts on writing code that doesn't need to represent inf and NaN.
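For concreteness, here's a tiny decoder for that IEEE-style FP4 (assuming the 1-bit sign / 2-bit exponent / 1-bit mantissa split with an exponent bias of 1); it reproduces the ±3 finite range:

```python
import math

def decode_e2m1_ieee(code: int) -> float:
    """Decode 4 bits as 1 sign, 2 exponent, 1 mantissa bit, with IEEE-style
    rules: all-ones exponent means inf (mantissa 0) or NaN (mantissa 1).
    Exponent bias is 1."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0b11:                  # reserved top exponent: inf / NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                     # subnormal: 0.m * 2^(1 - bias)
        return sign * (man / 2.0)
    return sign * (1 + man / 2.0) * 2.0 ** (exp - 1)  # normal: 1.m * 2^(e-1)

print([decode_e2m1_ieee(c) for c in range(8)])
# [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, inf, nan] -- the largest finite value is 3
```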
There is a relevant Wikipedia page about minifloats [0]:
> The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.
[0] https://en.wikipedia.org/wiki/Minifloat
> In ancient times, floating point numbers were stored in 32 bits.
This was true only for cheap computers, typically after the mid sixties.
Most of the earliest computers with vacuum tubes used longer floating-point number formats, e.g. 48-bit, 60-bit or even weird sizes like 57-bit.
The 32-bit size has never been acceptable in scientific computing for complex computations where rounding errors accumulate. The early computers with floating-point hardware were oriented toward scientific/technical computing, so bigger number sizes were preferred; computers oriented toward business applications usually preferred fixed-point numbers.
The IBM System/360 family is what definitively established the 32-bit single-precision and 64-bit double-precision sizes. 32 bits is adequate for input and output data, and it can be sufficient for intermediate values when the input data passes through only a few computations; otherwise double precision must be used.
There's an "Update:" note about a next post on the NF4 format. As far as I can tell this is neither NVFP4 nor MXFP4, which are the formats commonly used with LLM model files. The thing with those formats is that shared information is factored out per block, so each is not a format for a single value but for a group of values. I'd like to know more about these (but not enough to go research them myself).
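The gist, as I understand it (the block size and the scale-selection rule below are illustrative guesses, not taken from either spec): each small group of values shares one coarse power-of-two scale, and each element stores only a 4-bit code relative to it.

```python
import math

# The non-negative magnitudes an E2M1 (no inf/NaN) 4-bit element can take.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one group of floats to 4-bit codes plus one shared
    power-of-two scale, in the spirit of the MX-style block formats.
    The scale rule here (fit the max onto the grid top) is my guess."""
    amax = max(abs(v) for v in block) or 1.0
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
    codes = []
    for v in block:
        mag = min(abs(v) / scale, FP4_GRID[-1])
        idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - mag))
        codes.append((v < 0, idx))   # (sign bit, 3-bit magnitude code)
    return scale, codes

def dequantize_block(scale, codes):
    return [(-1.0 if neg else 1.0) * FP4_GRID[idx] * scale
            for neg, idx in codes]

scale, codes = quantize_block([0.1, -0.7, 2.5, 0.0])
print(scale, dequantize_block(scale, codes))  # 0.5 [0.0, -0.75, 2.0, 0.0]
```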
I too want fewer bits of mantissa in my floating point!
But what I wish is that there had been an fp64 encoding with a field for the number of significant digits.
strtod() would encode this, fresh out of an instrument reading (serial). It would be passed along. It would be useful EVEN if it weren't updated by arithmetic with other such numbers.
Every day I get a query like "why does the datum have so many decimal digits? You can't possibly be saying that the instrument is that precise!"
Well, it's because of sprintf(buf, "%.16g", x) as the default to CYA.
Also sad is the complaint about "0.56000 ... 01" because someone did sprintf("%.16f").
I can't fix this in one class -- data travels between too many languages and communication buffers.
In short, I wish I had an fp64 double where the last 4 bits were ALWAYS left alone by the CPU.
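Since no CPU will leave those bits alone, here's what a software-only version of the tag could look like (the 4-low-bits convention is my own invention, and any arithmetic clobbers it, as conceded above):

```python
import struct

def tag_sigdigits(x: float, ndigits: int) -> float:
    """Stash a significant-digit count (0-15) in the 4 lowest mantissa
    bits of a double, perturbing the value only in its ~16th digit."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits = (bits & ~0xF) | (ndigits & 0xF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def read_sigdigits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0] & 0xF

x = tag_sigdigits(0.5600000000000001, 3)
print(f"%.{read_sigdigits(x)}g" % x)  # prints 0.56, not 0.5600000000000001
```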
> In ancient times, floating point numbers were stored in 32 bits.
I thought in ancient times, floating point numbers used to be 80 bit. They lived in a funky mini stack on the coprocessor (x87). Then one day, somebody came along and standardized those 32 and 64 bit floats we still have today.
It seems quite wasteful to have two zeros when you only have 4 bits in total.
> Programmers were grateful for the move from 32-bit floats to 64-bit floats. It doesn’t hurt to have more precision
Someone didn't try it on a GPU, where fp64 throughput is often a small fraction of fp32...
> The notation ExMm denotes a format with x exponent bits and y mantissa bits.
Shouldn't that be m mantissa bits (not y)? I.e. a typo here -- or am I misunderstanding something?
It seems that life is imitating art.
https://github.com/sdd/ieee754-rrp
> In ancient times, floating point numbers were stored in 32 bits. Then somewhere along the way 64 bits became standard.
I think Cray doubles were 128 bits, and their singles were 64… which makes it seem like smaller floats are just a continuation of the eternal trend.