When you have so few bits, does it really make sense to invent a meaning for the bit positions? Just use an index into a "palette" of pre-determined numbers.
As a bonus, any operation can be replaced with a lookup into an n×n table.
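As a rough illustration (this palette is made up for the example, not any standardized FP4 value set), the whole scheme fits in a few lines of Python:

```python
# A 4-bit "palette float": the code is just an index into 16 chosen values.
# These 16 values are illustrative, not a standard.
PALETTE = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0,
           -0.0, -0.25, -0.5, -0.75, -1.0, -1.5, -2.0, -3.0]

def nearest_code(x: float) -> int:
    """Quantize a real number to the code of the closest palette entry."""
    return min(range(16), key=lambda i: abs(PALETTE[i] - x))

# Any binary op becomes a precomputed 16x16 lookup table: MUL[a][b] is the
# code of PALETTE[a] * PALETTE[b], rounded back onto the palette.
MUL = [[nearest_code(PALETTE[a] * PALETTE[b]) for b in range(16)]
       for a in range(16)]

a, b = nearest_code(1.5), nearest_code(2.0)
print(PALETTE[MUL[a][b]])  # 3.0
```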
In standard FP32, the infinities are represented with all exponent bits set to 1 and all mantissa bits 0 (plus the sign bit); the NaNs have all exponent bits set to 1 and a non-zero mantissa. If you used that interpretation with FP4, you'd get the values shown below, which restricts the representable range to +/- 3, and that feels less useful to me. If you're using FP4 you're probably optimizing for space and don't want to waste a quarter of your possible combinations on things that aren't actually numbers; you'd likely focus your efforts on writing code that doesn't need to represent inf and NaN.
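For concreteness, here's a tiny decoder for that IEEE-style FP4 (assuming the 1-bit sign / 2-bit exponent / 1-bit mantissa split with an exponent bias of 1); it reproduces the ±3 finite range:

```python
import math

def decode_e2m1_ieee(code: int) -> float:
    """Decode 4 bits as 1 sign, 2 exponent, 1 mantissa bit, with IEEE-style
    rules: all-ones exponent means inf (mantissa 0) or NaN (mantissa 1).
    Exponent bias is 1."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0b11:                  # reserved top exponent: inf / NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:                     # subnormal: 0.m * 2^(1 - bias)
        return sign * (man / 2.0)
    return sign * (1 + man / 2.0) * 2.0 ** (exp - 1)  # normal: 1.m * 2^(e-1)

print([decode_e2m1_ieee(c) for c in range(8)])
# [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, inf, nan] -- the largest finite value is 3
```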
There is a relevant Wikipedia page about minifloats [0]:
> The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.
[0] https://en.wikipedia.org/wiki/Minifloat
> In ancient times, floating point numbers were stored in 32 bits.
This was true only for cheap computers, typically after the mid sixties.
Most of the earliest computers with vacuum tubes used longer floating-point number formats, e.g. 48-bit, 60-bit or even weird sizes like 57-bit.
The 32-bit size has never been acceptable in scientific computing for complex computations where rounding errors accumulate. The early computers with floating-point hardware were oriented toward scientific/technical computing, so bigger number sizes were preferred; computers oriented toward business applications usually preferred fixed-point numbers.
The IBM System/360 family is what definitively established the 32-bit single-precision and 64-bit double-precision sizes. 32 bits is adequate for input and output data, and it can be sufficient for intermediate values when the input data passes through only a few computations; otherwise double precision must be used.
There's an "Update:" note about a next post on the NF4 format. As far as I can tell this is neither NVFP4 nor MXFP4, which are the formats commonly used with LLM model files. The thing with those formats is that shared information is factored out per block, so each is not a format for a single value but for a group of values. I'd like to know more about these (but not enough to go research them myself).
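The gist, as I understand it (the block size and the scale-selection rule below are illustrative guesses, not taken from either spec): each small group of values shares one coarse power-of-two scale, and each element stores only a 4-bit code relative to it.

```python
import math

# The non-negative magnitudes an E2M1 (no inf/NaN) 4-bit element can take.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one group of floats to 4-bit codes plus one shared
    power-of-two scale, in the spirit of the MX-style block formats.
    The scale rule here (fit the max onto the grid top) is my guess."""
    amax = max(abs(v) for v in block) or 1.0
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1]))
    codes = []
    for v in block:
        mag = min(abs(v) / scale, FP4_GRID[-1])
        idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - mag))
        codes.append((v < 0, idx))   # (sign bit, 3-bit magnitude code)
    return scale, codes

def dequantize_block(scale, codes):
    return [(-1.0 if neg else 1.0) * FP4_GRID[idx] * scale
            for neg, idx in codes]

scale, codes = quantize_block([0.1, -0.7, 2.5, 0.0])
print(scale, dequantize_block(scale, codes))  # 0.5 [0.0, -0.75, 2.0, 0.0]
```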
I too want fewer bits of mantissa in my floating point!
But what I wish is that there had been an fp64 encoding with a field for the number of significant digits.
strtod() would encode this, fresh out of an instrument reading (serial). It would be passed along. It would be useful EVEN if it weren't updated by arithmetic with other such numbers.
Every day I get a query like "why does the datum have so many decimal digits? You can't possibly be saying that the instrument is that precise!"
Well, it's because of sprintf(buf, "%.16g", x) as the default to CYA.
Also sad is the complaint about "0.56000 ... 01" because someone did sprintf("%.16f").
I can't fix this in one class -- data travels between too many languages and communication buffers.
In short, I wish I had an fp64 double where the last 4 bits were ALWAYS left alone by the CPU.
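Since no CPU will leave those bits alone, here's what a software-only version of the tag could look like (the 4-low-bits convention is my own invention, and any arithmetic clobbers it, as conceded above):

```python
import struct

def tag_sigdigits(x: float, ndigits: int) -> float:
    """Stash a significant-digit count (0-15) in the 4 lowest mantissa
    bits of a double, perturbing the value only in its ~16th digit."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits = (bits & ~0xF) | (ndigits & 0xF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def read_sigdigits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0] & 0xF

x = tag_sigdigits(0.5600000000000001, 3)
print(f"%.{read_sigdigits(x)}g" % x)  # prints 0.56, not 0.5600000000000001
```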
> In ancient times, floating point numbers were stored in 32 bits.
I thought in ancient times, floating point numbers used to be 80 bit. They lived in a funky mini stack on the coprocessor (x87). Then one day, somebody came along and standardized those 32 and 64 bit floats we still have today.
It seems quite wasteful to have two zeros when you only have 4 bits in total.
> Programmers were grateful for the move from 32-bit floats to 64-bit floats. It doesn’t hurt to have more precision
Someone didn't try it on a GPU, where fp64 throughput is often a small fraction of fp32...
> The notation ExMm denotes a format with x exponent bits and y mantissa bits.
Shouldn't that be m mantissa bits (not y)? I.e. a typo here -- or am I misunderstanding something?
It seems that life is imitating art.
https://github.com/sdd/ieee754-rrp
> In ancient times, floating point numbers were stored in 32 bits. Then somewhere along the way 64 bits became standard.
I think Cray doubles were 128 bits, and their singles were 64… which makes it seem like smaller floats are just a continuation of the eternal trend.