Herbie: Automatically improve imprecise floating point formulas (herbie.uwplse.org)

198 points | by summarity | 45 comments

[−] summarity 41d ago
I posted this and it picked up steam overnight, so I thought I'd add how I'm using it:

I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

The agent then reasons through the implementation and writes core expressions into an FPCore file next to the existing tests, running several passes and refining the preconditions based on realistic caller input. This logs Herbie's proposed improvements as output FPCore transformations. The agent then reasons through solutions (this step is required: Herbie doesn't know the algorithm's design intent; see this for a good case study: https://pavpanchekha.com/blog/herbie-rust.html), and once convinced of a gap, it creates additional unit tests and property tests (FsCheck/QuickCheck) to prove the impact. Then every once in a while I review a batch to see what's next.
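
For illustration, here's a minimal sketch of what one of those files might contain (the expression, ranges, and file name are made up for this comment, not from my codebase); the :pre clause is where the realistic caller input ranges go:

  # Minimal sketch: emit an FPCore expression whose :pre clause encodes
  # realistic caller input ranges, so Herbie samples where the code actually runs.
  # (Expression, ranges, and file name are illustrative only.)
  fpcore = """(FPCore (x y)
    :name "2D distance term"
    :pre (and (<= -1e6 x 1e6) (<= -1e6 y 1e6))
    (sqrt (+ (* x x) (* y y))))
  """
  with open("distance.fpcore", "w") as f:
      f.write(fpcore)
  # Herbie is then pointed at this file and its proposed rewrites are logged.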

Generally there are multiple types of issues that can be flagged:

a) Expression-level imprecision over realistic input ranges: this is Herbie's core strength. It usually catches "just copied the textbook formula" instances of naive math: cancellation, Inf/NaN propagation, etc. The fixes consistently involve using fma for accumulation, scaling by the largest factor to prevent overflow to Inf, using hypot, and so on (see the sketch after this list).

b) Ill-conditioned algorithms. Sometimes the textbooks lie to you, and the algorithms themselves are unfit for purpose, especially in boundary regions. If multiple expressions sit below 60% accuracy and only improve by 1-2% across seeds, it's a good sign the algorithm itself is bad - there's no form that performs adequately on the target inputs.

c) Round-off and accumulation errors. This is more a consequence of the agent's reasoning, and often surfaces after an apparent "100% -> 100%" pass. Via failing tests, the agent can identify parts of an algorithm that benefit from upgrading the context to, e.g., double-word arithmetic for additional precision.
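
As a rough illustration of the category (a) fixes (a generic sketch, not code from my codebase): a textbook 2D norm overflows long before its inputs do, while scaling by the largest component, or using hypot, does not:

  import math

  def naive_norm(x, y):
      # textbook formula: x*x overflows to inf once |x| exceeds ~1.3e154
      return math.sqrt(x * x + y * y)

  def scaled_norm(x, y):
      # scale by the largest magnitude so the squares stay in range
      m = max(abs(x), abs(y))
      if m == 0.0:
          return 0.0
      return m * math.sqrt((x / m) ** 2 + (y / m) ** 2)

  x, y = 1e200, 1e200
  print(naive_norm(x, y))    # inf
  print(scaled_norm(x, y))   # ~1.4142e200
  print(math.hypot(x, y))    # ~1.4142e200, the library routine does the same job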

[−] Archit3ch 41d ago

> I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

I don't know if there is an equivalent in Roslyn, but in Julia you can have the agent inspect the LLVM output to surface problems in hot loops.

[−] yread 42d ago
Really cool. Numerical stability can be tricky: errors accumulate with each operation, and suddenly 53 bits of precision isn't enough.

Also nice to see an article that's not about AI or politics.

[−] urschrei 41d ago
I resurrected the Rust Herbie lint (now using dylint) a while ago: https://github.com/urschrei/herbie-lint
[−] notpushkin 42d ago
Some trivial cases produce... interesting results.

For x in [−1.79e308, 1.79e308]:

Initial Program: 100.0% accurate, 1.0× speedup

  def code(x):
      return math.sqrt((x + 1.0))
Alternative 1: 67.5% accurate, 5.6× speedup

  def code(x):
      return 1.0
[−] lifthrasiir 42d ago
That does make sense, because half of all available fp numbers are less than 1 in magnitude. In particular, there should be plenty of numbers x such that |x| << 1, so x + 1 ~= 1; in fact, the proportion should be just shy of 50%.
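
A quick way to check this (rough sketch; the sample count is arbitrary): draw random 64-bit patterns, reinterpret them as doubles, and count:

  import math, os, struct

  samples = 1_000_000
  finite = small = absorbed = 0
  for _ in range(samples):
      (x,) = struct.unpack("<d", os.urandom(8))  # random bit pattern as a double
      if not math.isfinite(x):
          continue
      finite += 1
      if abs(x) < 1.0:
          small += 1
      if x + 1.0 == 1.0:          # x is too small to move the sum off 1.0
          absorbed += 1
  print(small / finite)      # ~0.5: half of all finite doubles have |x| < 1
  print(absorbed / finite)   # just shy of 0.5: most of those vanish into 1.0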
[−] andrepd 41d ago
That is indeed one of the problems with IEEE floats. There are only about 10^80 atoms in the universe, and a Planck length is roughly 10^-62 of the radius of the observable universe. But 64-bit floats have an absurd range of over 10^±300! Worse, notice that there are about as many bit patterns in the never-used range between 10^300 and 10^301 as there are in the super-important range between 1 and 10! Super wasteful. Not to mention the quadrillions of bit patterns reserved to represent "NaN"...
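
You can count this directly (quick sketch): for positive doubles the IEEE bit pattern read as an integer is monotone, so the number of representable values in a range is just a difference of bit patterns:

  import struct

  def ordinal(x):
      # bit pattern of a positive double as an integer; consecutive doubles differ by 1
      return struct.unpack("<q", struct.pack("<d", x))[0]

  print(ordinal(10.0) - ordinal(1.0))      # ~1.5e16 doubles between 1 and 10
  print(ordinal(1e301) - ordinal(1e300))   # about the same between 1e300 and 1e301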

This is one of the problems that alternative formats such as posits aim to solve. It's quite interesting: I've got an implementation in Rust here if you want to play with it: https://github.com/andrepd/posit-rust

[−] AlotOfReading 41d ago
Note that the logarithmic distribution of float density is also key to certain kinds of efficient hardware float implementations, because it means you can use the fixed mantissa bits alone as table indices. Unums have proven difficult to build efficient HW implementations for.
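
As a toy illustration (purely a sketch, nothing like a real hardware unit): because the exponent can be handled separately, one small table indexed by the leading mantissa bits gives the same relative accuracy in every binade, e.g. for a reciprocal approximation:

  import math

  TABLE_BITS = 6
  # entry i approximates 1/m for the mantissa slice m in [1 + i/64, 1 + (i+1)/64)
  TABLE = [1.0 / (1.0 + (i + 0.5) / 2**TABLE_BITS) for i in range(2**TABLE_BITS)]

  def approx_recip(x):
      m, e = math.frexp(x)                    # x = m * 2**e with m in [0.5, 1)
      m, e = m * 2.0, e - 1                   # renormalise so m is in [1, 2)
      idx = int((m - 1.0) * 2**TABLE_BITS)    # same as reading the top mantissa bits
      return math.ldexp(TABLE[idx], -e)       # exponent handled by negation

  for x in (3.0, 3.0e-200, 3.0e200):
      print(approx_recip(x), 1.0 / x)         # ~6-7 correct bits in every binade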

IEEE floats have a few warts like any other 1980s standard, but they're a fantastic design.

[−] andrepd 41d ago

> Unums have proven difficult to build efficient HW implementations for

Valid point, but not quite true anymore. It basically comes down to the latency of count_leading_ones/zeros for decoding the regime, on which everything else depends. But work has been done in the past ~2 years, and we can now have posit units with lower latency than FP units of the same width! https://arxiv.org/abs/2603.01615

> IEEE floats have a few warts like any other 1980s standard, but they're a fantastic design.

Hmm, I don't know if I would call it a fantastic design x) The "standard" is less a standard than a rough formalisation of a specific FPU design from back in the 1980s, and that design was in turn not really the product of a forward-thinking visionary but something made to fit the technical and business constraints of that specific piece of hardware.

It has more than a few warts, and we can probably do much better nowadays. That's not really a diss on IEEE floats or their designers, it's just a matter of fact (which honestly applies to very many things that are 40 years old, let alone one designed under the constraints IEEE 754 was).

[−] AlotOfReading 41d ago
I'm sure you're much more knowledgeable about this than I am, but that's kind of my point. A month-old preprint is the first thing that compares to implementations of a mildly evolved, warty old standard from 40 years ago. I consider that fantastic.

Thanks for the paper though. Looking forward to reading it more closely when I have time.

[−] andrepd 41d ago
Absolutely! IEEE floats are ubiquitous in software and hardware. I bet good money they will still be around 40 more years into the future :) and any alternative has to fight their ubiquity.

Better alternatives have been proposed for a long time though. Posits are very nice, and even they are almost 10 years old now :p

[−] cocoto 42d ago
But I guess the density distribution of floating-point numbers is rarely the one that matters for a problem; your actual input distribution will almost surely be very different. IMO, the tool presented here should provide a way to supply a custom density function (with some common presets like uniform and normal distributions).
[−] pavpanchekha 41d ago
Author here! Yes, the float distribution isn't what you want in practice, but a distribution selector isn't really the right thing either, because a low-probability bad result can still be pretty bad! Hence the range selector; the float distribution is good at picking out extreme values that trigger FP error.

We usually recommend looking for 90%+ accuracy, or carefully examining the accuracy plot.

[−] FabHK 41d ago
Not really. 1+x/2, however, would be a good approximation to sqrt(1+x) for small (in absolute value) x.
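
Quick check (just a sketch): for small |x| the quadratic term x^2/8 is already below one ulp of 1.0, so the two agree to within rounding:

  import math

  # sqrt(1 + x) = 1 + x/2 - x*x/8 + ..., and for these x the x*x/8 term is
  # below one ulp of 1.0, so the linear approximation matches to rounding error.
  for x in (1e-8, -1e-8, 1e-12, -1e-12):
      approx = 1.0 + x / 2.0
      exact = math.sqrt(1.0 + x)
      print(x, approx, exact, abs(approx - exact))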
[−] irchans 41d ago
How about:

(* Mathematica Notation, Assume x>0 ) If[ x < 10^(-10), 1+ x/2, ( order x^2 error ) If[ x> 10^10, Sqrt[x], ( order 1/Sqrt[x] error ) (else) Sqrt[x +1] ] ] ( I guess the If statements take too much time. *)

[−] notpushkin 38d ago
HN formatter ate up the asterisks in your code comments :) To paste code blocks here, prepend each line with 2 or more spaces:

  (* Mathematica Notation, Assume x>0 *)
  If[ x < 10^(-10),  1 + x/2,   (* order x^2 error *)
  If[ x > 10^10,     Sqrt[x],   (* order 1/Sqrt[x] error *)
  (* else *)         Sqrt[x+1] ] ]
  
  (* I guess the If statements take too much time. *)
[−] fluorinerocket 42d ago
How useful is this when you are using numbers in a reasonable range, like 10^-12 to 10^12? Generally I try to scale my numbers to be in this range, whether by picking the right units or by scaling constraints and objectives when doing nonlinear programming/optimization.

Like looking at this example,

https://herbie.uwplse.org/demo/b070b371a661191752fe37ce0321c...

It is claimed that for the function f(x) = sqrt(x+1) - 1,

accuracy is increased from 8.5% to 98% for alternative 5, which has f(x) = 0.5x.

OK, so for x = 99 the right answer is sqrt(100) - 1 = 9,

but 0.5 * 99 = 49.5, which doesn't seem too accurate to me.

[−] bee_rider 42d ago
I wonder, is there a way to only request reformulations that don’t involve branches? The tool already seems quite nice, but that might be a good feature.

Also, I’m not sure I understand the speedup. Is it latency or throughput?

[−] reikonomusha 42d ago
This is an awesome piece of software, one of my favorite little pieces of magic. Finding more precise or more stable floating point formulas is often arduous and requires a lot of familiarity with the behavior of floats. This finds good formulas completely automatically. Super useful for numerical computation.
[−] shiandow 41d ago
I wonder who decided to use a step function for the speed-accuracy plot. They must have thought the convex hull would be wrong, because you can't really make linear combinations of algorithms (you could, but you'd have to use time, not speed, to make it linear). So I get why you would use step functions, but the step is the wrong way around: the current plot suggests accuracy doesn't drop when you demand higher speeds.
[−] fouronnes3 41d ago
Working with tensor datatypes in numerical computing, I've been wondering if it would be possible to somehow add an extra dimension to tensors that would serve as the "floating-point precision" dimension, instead of a data type. After all, why couldn't the bit depth be one of the tensor dims? Maybe it would be possible to implement arbitrary floating-point precision that way?
[−] fouronnes3 41d ago
This is very interesting! I made an interval arithmetic calculator a few months ago [0]. I wonder if you could combine the two techs to make something even more powerful.

[0] https://victorpoughon.github.io/interval-calculator/

[−] andrepd 41d ago
I don't quite understand how they define "accuracy".
[−] Archit3ch 41d ago
Thanks for reminding me to try this for audio approximations. :)
[−] lutusp 42d ago
[flagged]