Herbie: Automatically improve imprecise floating point formulas (herbie.uwplse.org)

by summarity 45 comments 198 points
Read article View on HN

45 comments

[−] summarity 41d ago
I posted this and it picked up steam over night, so I thought I'd add how I'm using it:

I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

The agent then reasons through the implementation and writes core expressions into an FPCore file next to the existing tests, running several passes, refining the pres based on realistic caller input. This logs Herbie's proposed improvements as output FPCore transformations. The agent then reasons through solutions (which is required, Herbie doesn't know algorithm design intent, see e.g. this for a good case study: https://pavpanchekha.com/blog/herbie-rust.html), and once convinced of a gap, creates additional unit tests and property tests (FsCheck/QuickCheck) to prove impact. Then every once in a while I review a batch to see what's next.

Generally there are multiple types of issues that can be flagged:

a) Expression-level imprecision over realistic input ranges: this is Herbie's core strength. Usually this catches "just copied the textbook formula" instance of naive math. Cancellation, Inf/NaN propagation, etc. The fixes are consistently using fma for accumulation, biggest-factor scaling to prevent Inf, hypot use, etc.

b) Ill-conditioned algorithms. Sometimes the text books lie to you, and the algorithms themselves are unfit for purpose, especially in boundary regions. If there are multiple expressions that have a <60% precision and only a 1 to 2% improvement across seeds, it's a good sign the algo is bad - there's no form that adequately performs on target inputs.

c) Round-off, accumulation errors. This is more a consequence of agent reasoning, but often happens after an apparent "100% -> 100%" pass. The agent is able to, via failing tests, identify parts of an algorithm that can benefit from upgrading the context to e.g. double-word arithmetic for additional precision.

[−] Archit3ch 41d ago

> I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

I don't know if there is an equivalent in Roslyn, but in Julia you can have the agent inspect the LLVM output to surface problems in hot loops.

[−] yread 41d ago
Really cool, numerical stability can be tricky as errors can accumulate with each operation and suddenly 53 bits of precision is not enough.

Also nice to see an article thats not about AI or politics

[−] urschrei 41d ago
I resurrected the Rust Herbie lint (now using dylint) a while ago: https://github.com/urschrei/herbie-lint
[−] notpushkin 41d ago
Some trivial cases produce... interesting results.

For x in [−1.79e308, 1.79e308]:

Initial Program: 100.0% accurate, 1.0× speedup

  def code(x):
      return math.sqrt((x + 1.0))
Alternative 1: 67.5% accurate, 5.6× speedup

  def code(x):
      return 1.0
[−] lifthrasiir 41d ago
That does make sense, because a half of all available fp numbers are less than 1 in their magnitude. In particular there should be a plenty of numbers x such that |x| << 1 so x + 1 ~= 1; in fact, the proportion should be just shy of 50%.