Clearly Indian history has a lot to say about mathematics, but I don't think it gets enough attention. Or am I just ignorant and living in my own bubble? Indian philosophy is also very intriguing.
The coefficients given are indeed a near-optimal cubic minimax approximation for (π/2 - arcsin(x))/sqrt(1-x) on [0,1]. But those coefficients aren't actually optimal for approximating arcsin(x) itself.
For reference, the coefficients given are [1.5707288, -0.2121144, 0.0742610, -0.0187293]: if we optimize P(x) = (π/2 - arcsin(x))/sqrt(1-x) ourselves, we can extend them to double precision as [1.5707288189560218, -0.21211524058527342, 0.0742623449400704, -0.018729868776598532]. Increasing the precision reduces the max error, at x = 0, by 0.028%.
Adjusting our error function to optimize the absolute error of arcsin(x) = π/2 - P(x)*sqrt(1-x) on [0,1], we get the coefficients [1.5707583404833712, -0.2128751841625164, 0.07689738736091772, -0.02089203710669022]. The max error is reduced by 44%, from 6.75e-5 to 3.80e-5. If we plot the error function [0], we see that the new max error is achieved at five points, x = 0, 0.105, 0.386, 0.730, 0.967.
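As a quick sanity check (a Python sketch of my own, not from the original post), scanning a dense grid with those absolute-error coefficients reproduces the quoted maximum:

```python
import math

# Coefficients from the comment above: cubic P(x) re-optimized for the
# absolute error of asin(x) = pi/2 - P(x)*sqrt(1-x) on [0, 1].
C = [1.5707583404833712, -0.2128751841625164,
     0.07689738736091772, -0.02089203710669022]

def asin_approx(x):
    """Approximate asin(x) for x in [0, 1]."""
    p = C[0] + x * (C[1] + x * (C[2] + x * C[3]))  # Horner's method
    return math.pi / 2 - p * math.sqrt(1.0 - x)

# Scan a fine grid for the worst absolute error.
max_err = max(abs(asin_approx(i / 100000) - math.asin(i / 100000))
              for i in range(100001))
print(max_err)  # roughly 3.80e-5, achieved at x = 0 among other points
```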
(Alternatively, adjusting our error function to optimize the relative error of arcsin(x), we get the coefficients [1.5707963267948966, -0.21441792645252514, 0.08365774237116316, -0.02732304481232744]. The max absolute error is 2.24e-4, but the max relative error is now 0.0181%, even in the vicinity of the root at x = 0. Though we'd almost certainly want to use a different formula to avoid catastrophic cancellation.)
So it goes to show, we can nearly double our accuracy, without modifying the code, just by optimizing for the right error metric.
It makes zero sense to measure performance without measuring correctness, especially when you use an LLM. Here is an even faster asin(): return 0;
Precision should be measured in average and worst-case ULP, not in "charts". A good approximation should also give exact results at the critical points (-1, 0, 1 in this case).
The "faster" version gives this:
asin(0) = 6.75268e-05 (double precision)
Which gives around 5e+15 ULPs, while common libc math implementations target about 19 ULPs (but will be exactly 0 for asin(0)).
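To make the ULP figure concrete, here is a sketch (my own helper, with the caveat that ulp(0) is the smallest subnormal, so measuring against exact zero needs a choice of scale):

```python
import math

def ulp_error(approx, exact):
    """Distance in units of the ULP at the larger magnitude.

    When exact == 0, math.ulp(0.0) is the smallest subnormal, which would
    make any error look astronomically large, so we scale by the ULP at
    the approximate value instead.
    """
    scale = math.ulp(max(abs(approx), abs(exact)))
    return abs(approx - exact) / scale

# The fast approximation returns ~6.75268e-05 at x = 0 instead of 0.
err = ulp_error(6.75268e-05, 0.0)
print(f"{err:.3g}")  # on the order of 5e+15 ULPs
```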
Cool, although more ILP (instruction-level parallelism) might not necessarily be better on a modern GPU, which doesn't have much ILP, if any (instead it uses those resources to execute several threads in parallel).
That might explain why the original Cg (a GPU programming language) code did not use Estrin's, since at least the code in the post does add 1 extra op (squaring abs_x).
(AMD GPUs used to use VLIW (very long instruction word) which is "static" ILP).
Did you try polynomial preprocessing methods, like Knuth's and Estrin's methods?
https://en.wikipedia.org/wiki/Polynomial_evaluation#Evaluati...
They let you compute polynomials with roughly half the multiplications of Horner's method, and I used them in the past to improve the speed of the exponential function in Boost.
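For the cubic quoted upthread, Estrin's scheme looks like this (a Python sketch; note that for a polynomial this small Estrin doesn't reduce the multiply count, it shortens the dependency chain at the cost of one extra multiply):

```python
C = [1.5707288, -0.2121144, 0.0742610, -0.0187293]  # cubic from the article

def horner(x):
    # Three multiplies, three adds, all on one serial dependency chain.
    return C[0] + x * (C[1] + x * (C[2] + x * C[3]))

def estrin(x):
    # Pair up terms: (c0 + c1*x) + x^2 * (c2 + c3*x).
    # The two pairs and the squaring are independent, so a superscalar
    # core can evaluate them in parallel.
    x2 = x * x
    return (C[0] + C[1] * x) + x2 * (C[2] + C[3] * x)

# Both evaluate the same polynomial, up to rounding:
print(abs(horner(0.5) - estrin(0.5)))  # ~0, within a few ULPs
```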
This tradition came into its own as the Madhava school of mathematics in Kerala. https://en.wikipedia.org/wiki/Kerala_school_of_astronomy_and...
Note the approximation is for 0 < x < 1. For the range [-1, 0] Bhaskara used symmetry.
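That symmetry extension can be sketched like so (hypothetical helper names; the coefficients are the original ones from the article, and asin is odd, so asin(-x) = -asin(x)):

```python
import math

def asin_positive(x):
    """Fast asin for x in [0, 1], using the article's cubic."""
    p = 1.5707288 + x * (-0.2121144 + x * (0.0742610 + x * -0.0187293))
    return math.pi / 2 - p * math.sqrt(1.0 - x)

def asin_full(x):
    """Extend to [-1, 1] via the odd symmetry asin(-x) = -asin(x)."""
    return math.copysign(asin_positive(abs(x)), x)

print(asin_full(-0.5) + asin_full(0.5))  # 0.0 by construction
```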
If I remember correctly, Aryabhatta had derived a rational approximation about a hundred years before this.
EDIT https://doi.org/10.4169/math.mag.84.2.098
[0] https://www.desmos.com/calculator/nj3b8rpvbs
Let α represent a roll rotation, and β a pitch rotation.

Let R(α) be:

    | 1    0       0      |
    | 0    cos α  -sin α  |
    | 0    sin α   cos α  |

Let R(β) be:

    |  cos β   0   sin β  |
    |  0       1   0      |
    | -sin β   0   cos β  |

Combine them: R(α)R(β). But! For small α and β, just approximate sin θ ≈ θ and cos θ ≈ 1. So now, to first order:

    |  1    0    β  |
    |  0    1   -α  |
    | -β    α    1  |

[1] https://news.ycombinator.com/item?id=47348192
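A numeric sanity check of the small-angle composition (a Python sketch with arbitrary small angles; the residual is second-order in the angles):

```python
import math

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def roll(a):   # rotation about the x-axis
    return [[1, 0, 0],
            [0, math.cos(a), -math.sin(a)],
            [0, math.sin(a),  math.cos(a)]]

def pitch(b):  # rotation about the y-axis
    return [[ math.cos(b), 0, math.sin(b)],
            [ 0,           1, 0          ],
            [-math.sin(b), 0, math.cos(b)]]

a, b = 0.01, 0.02  # small angles, in radians
exact = matmul(roll(a), pitch(b))
approx = [[ 1, 0,  b],   # first-order form: sin t ~ t, cos t ~ 1
          [ 0, 1, -a],
          [-b, a,  1]]

worst = max(abs(exact[i][j] - approx[i][j])
            for i in range(3) for j in range(3))
print(worst)  # around 2e-4, i.e. second-order in the angles
```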
> It also gets in the way of elegance and truth.
That’s quite subjective. I happen to find trigonometry to be elegant and true.
I also agree that trigonometric functions lack efficiency in software, especially the atan function. Sin is almost a lookup query.
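A minimal sketch of what "almost a lookup query" can mean in practice: a precomputed table plus linear interpolation (the table size and names here are my own choices):

```python
import math

# 1024 samples of sin over [0, 2*pi) plus linear interpolation.
# With this step size the interpolation error is on the order of
# step^2 / 8, i.e. a few millionths.
N = 1024
STEP = 2 * math.pi / N
TABLE = [math.sin(i * STEP) for i in range(N + 1)]  # +1 avoids wraparound

def table_sin(x):
    x = x % (2 * math.pi)          # reduce to one period
    i, frac = divmod(x / STEP, 1)  # table index and interpolation weight
    i = int(i)
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

worst = max(abs(table_sin(i * 0.001) - math.sin(i * 0.001))
            for i in range(7000))
print(worst)  # below 1e-5
```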