The GNU libc atanh is correctly rounded

[−] jcranmer 27d ago

One of the major projects that's ongoing in the current decade is making the standard math library functions fully correctly rounded, as opposed to the traditional accuracy target of ~1 ULP (the last bit may be off).

For single-precision unary functions, it's easy enough to just exhaustively test every single input (there are only about 4 billion of them). But double precision has prohibitively many inputs to test (2^64 bit patterns), so you have to resort to actual proof techniques to prove correct rounding for double-precision functions.
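To make the "just test every input" idea concrete, here is a minimal sketch in Python. It walks float32 bit patterns and compares a function under test against a reference. Everything in it is an illustrative assumption rather than how any libm project actually tests: `naive_atanh` is a hypothetical candidate, the reference is the platform's double-precision `math.atanh` rounded to float32 (which is not itself guaranteed to be correctly rounded; a real harness would likely use an arbitrary-precision reference), and the loop samples with a stride so it finishes quickly. A stride of 1 would cover every non-negative float32 below 1.0; a full run would also cover negative inputs and special values.

```python
import math
import struct

def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def round_to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def naive_atanh(x: float) -> float:
    """Hypothetical function under test: the textbook formula, evaluated in double."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

def count_mismatches(candidate, reference, stride=1_000_003):
    """Compare two functions on every `stride`-th non-negative float32 below 1.0."""
    tested = mismatches = 0
    for bits in range(0, 0x3F800000, stride):  # 0x3F800000 is the bit pattern of 1.0f
        x = bits_to_f32(bits)
        if round_to_f32(candidate(x)) != round_to_f32(reference(x)):
            mismatches += 1
        tested += 1
    return tested, mismatches

tested, mismatches = count_mismatches(naive_atanh, math.atanh)
print(f"{mismatches} mismatches in {tested} sampled inputs")
```

The mismatch count depends on the platform's libm, so no expected output is given. The same loop over 2^64 double-precision bit patterns would not finish in any reasonable time, which is why double precision needs proofs instead.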

[−] WalterGR 27d ago

> traditional accuracy target of ~1 ULP

I had to google this one…

ULP: “Unit in the Last Place” or “Unit of Least Precision”: https://en.wikipedia.org/wiki/Unit_in_the_last_place
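For anyone else meeting the term for the first time, a small Python sketch may help. It assumes Python 3.9 or later for `math.ulp` and `math.nextafter`. The helper `ulp_error` is a hypothetical name, and it measures the error in ulps of the computed value, which is only one of several conventions in use.

```python
import math
from fractions import Fraction

def ulp_error(computed: float, exact: Fraction) -> float:
    """Distance from `computed` to the exact value, in ulps of `computed`."""
    err = abs(Fraction(computed) - exact)          # exact rational arithmetic
    return float(err / Fraction(math.ulp(computed)))

# Spacing between 1.0 and the next larger double.
print(math.ulp(1.0) == 2.0 ** -52)

# 1/3 is not representable; IEEE 754 division is correctly rounded,
# so the computed quotient is within 0.5 ulp of the exact value.
third = 1.0 / 3.0
print(ulp_error(third, Fraction(1, 3)))

# The neighbouring double is still within 1 ulp, but it is not the
# correctly rounded answer.
neighbour = math.nextafter(third, 1.0)
print(ulp_error(neighbour, Fraction(1, 3)))
```

A "correctly rounded" function always returns the first kind of result (error at most 0.5 ulp), while a "~1 ULP" function is also allowed to return the neighbour.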

[−] mananaysiempre 27d ago

For what it’s worth, this is basically the first word you learn when discussing numerical precision; and I mean word—nobody thinks of it as an abbreviation, to the point that it’s very often written in lower case. So welcome to the club.

[−] incognito124 26d ago

If only we switched to balanced ternary: there, rounding is simply truncating.

[−] adgjlsfhk1 27d ago

To me this feels like wasted effort due to solving the wrong problem. The extra half ulp error makes no difference to the accuracy of calculations. The problem is that languages traditionally rely on an OS provided libm leading to cross architecture differences. If, instead, languages used a specific libm, all of these problems would vanish.

[−] lifthrasiir 27d ago

Standardizing a particular libm essentially locks any further optimizations because that libm's implementation quirks have to be exactly followed. In comparison the "most correct" (0.5 ulp) answer is easy to standardize and agree upon.

[−] SideQuark 27d ago

> The extra half ulp error makes no difference to the accuracy of calculations

It absolutely does matter. The first, and most important reason, is one needs to know the guarantees of every operation in order to design numerical algorithms that meet some guarantee. Without knowing what the components provide, it's impossible to design algorithms on top with some guarantee. And this is needed in a massive number of applications, from CAD, simulation, medical and financial items, control items, aerospace, and on and on.

And once one has a guarantee, making the lower components tighter allows higher components to do less work. This is a very low level component, so putting the guarantees there reduces work for tons of downstream code.

All this is precisely what drove IEEE 754 to become a thing and to become the standard in modern hardware.

> the problem is that languages traditionally rely on an OS provided libm leading to cross architecture differences

No, they don't, not for things like sqrt and atanh and related functions. They've relied on compiler-provided libs since, well, as long as there have been languages. And the higher level libs, like BLAS, are built on specific compilers that provide guarantees through, again, the libs the compiler uses. I've not seen OS-level calls describing the accuracy of the floating point items, but a lot of languages do, including C/C++, which underlies a lot of this code.

[−] adgjlsfhk1 26d ago

> The first, and most important reason, is one needs to know the guarantees of every operation in order to design numerical algorithms that meet some guarantee

Sure, but a 1 ulp guarantee works just as well here while being substantially easier to provide.

> And the higher level libs, like BLAS, are built on specific compilers that provide guarantees

Sure, but BLAS doesn't provide any accuracy guarantees, so it being built on components that sort of do has pretty minimal value for it. For basically any real application, the error you experience is error from the composition of intrinsics, not the composed error of those intrinsics themselves, and that remains true even if those intrinsics have 10 ULP error or 0.5 ULP error.
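The "error from composition" point can be illustrated with a tiny Python sketch. This is a standard cancellation example, not something taken from the thread: both operations below are individually correctly rounded, as IEEE 754 requires for addition and subtraction, yet the composed result loses everything. The variable names are illustrative.

```python
from fractions import Fraction

a, b = 1.0, 1e-16

# Each step is correctly rounded on its own: 1.0 + 1e-16 rounds to 1.0
# because 1e-16 is less than half the spacing of doubles near 1.0.
computed = (a + b) - a

# The same expression in exact rational arithmetic gives back b.
exact = (Fraction(a) + Fraction(b)) - Fraction(a)

relative_error = abs(Fraction(computed) - exact) / exact
print(computed)               # 0.0
print(float(relative_error))  # 1.0, i.e. 100% relative error
```

Tightening the individual operations cannot fix this kind of loss; only restructuring the computation can.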

[−] fweimer 27d ago

Many of the conversions so far have been clearly faster. I don't think anything has been merged which shows a clear performance regression, at least not on CPUs with FMA support.

[−] gajjanag 27d ago

The bigger challenge is GPU/NPU. Branches between the fast and accurate paths get costlier there, among other things. On CPU this is less of a cost.

Most published libms on the GPU/NPU side accept a few ULP of error as a perf vs accuracy tradeoff. E.g., this is documented explicitly in the CUDA programming guide: https://docs.nvidia.com/cuda/cuda-programming-guide/05-appen... .

Prof. Zimmermann and collaborators have a great table at https://members.loria.fr/PZimmermann/papers/accuracy.pdf (Feb 2026) comparing various libm wrt accuracy.

[−] adgjlsfhk1 27d ago

Using FMA makes it possible to write faster libm functions, but going back to a 1 ulp world with the same FMA optimizations would give you another 20% speedup at least. The other issue is that these functions tend to have much larger code size, which tends not to be a significant problem in micro benchmarks, but means that in real applications you increase cache pressure, slowing things down in aggregate.
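For readers who haven't met FMA: a fused multiply-add computes a*b + c with a single rounding instead of two. The Python sketch below emulates that with exact rational arithmetic so it does not depend on hardware support. `emulated_fma` is a hypothetical helper for illustration, not a library function, and the inputs are chosen so the effect is visible.

```python
from fractions import Fraction

def emulated_fma(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30   # exact product is 1 - 2**-60, which is not a double

# Two roundings: the product rounds to 1.0 first, so the difference is lost.
unfused = a * b - 1.0

# One rounding: the low-order part of the product survives.
fused = emulated_fma(a, b, -1.0)

print(unfused)                 # 0.0
print(fused == -(2.0 ** -60))  # True
```

Being able to recover the rounding error of a product in one operation is what makes the extended-precision tricks inside accurate libm paths much cheaper on hardware with FMA.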

[−] ghighi7878 27d ago

Mixed precision computations need correctly rounded functions.

[−] adgjlsfhk1 27d ago

no they don't... why would they?

[−] RyJones 27d ago

Interesting: https://youtu.be/cb5r3r38O9c

Guy's world records get deleted due to changes in atanh over time

[−] kergonath 27d ago

I don’t think I ever used atanh, but I always love some floating-point nerdery. These other documents by the same team are fantastic resources: https://inria.hal.science/hal-04714173v2/document for complex values and https://members.loria.fr/PZimmermann/papers/accuracy.pdf for real values.

Lots of good stuff here: https://members.loria.fr/PZimmermann/papers/ .

[−] RandomTeaParty 27d ago

Why not arxiv?

[−] jonathrg 27d ago

Good to know!

[−] brcmthrowaway 27d ago

Who wrote it? Someone at Red Hat likely.

The GNU libc atanh is correctly rounded (inria.hal.science)

29 comments