These processors will have very decent performance for many applications, very similar to that of AWS Graviton5, but with more cores per socket.
However, the claim made by Arm: "the Arm AGI CPU, for agentic AI infrastructure, delivering more than 2x performance per rack compared with x86 platforms" is obviously false.
The new Intel Clearwater Forest Xeon processors use Darkmont cores, which have approximately the same performance per core, the same die area per core and the same power consumption per core as the Neoverse V3, but Intel offers 288 cores per socket and 576 cores per board, in comparison with only 136 cores per socket for Arm.
Therefore there is no chance that these new Arm processors can provide more performance per rack than Intel Clearwater Forest.
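Here is a minimal sketch of the per-socket arithmetic behind this argument. It assumes, as the comment does, that Darkmont and Neoverse V3 have equal per-core performance. The constant names and the `socket_throughput` helper are hypothetical. Sockets per rack, rack power limits and memory bandwidth are not modeled.

```python
# Napkin model of the argument: socket throughput = cores per socket
# times per-core performance. Equal per-core performance for Darkmont and
# Neoverse V3 is the comment's premise, not a measurement.

def socket_throughput(cores: int, perf_per_core: float = 1.0) -> float:
    """Relative throughput of one socket, in arbitrary units."""
    return cores * perf_per_core

CLEARWATER_FOREST_CORES = 288  # Darkmont cores per socket (from the comment)
ARM_AGI_CORES = 136            # Neoverse V3 cores per socket (from the comment)

ratio = socket_throughput(CLEARWATER_FOREST_CORES) / socket_throughput(ARM_AGI_CORES)
print(f"Clearwater Forest / Arm per socket: {ratio:.2f}x")
```

Under that premise the Intel part has roughly 2.1x the per-socket throughput. The conclusion depends entirely on the equal-per-core assumption, which other replies in this thread dispute.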
For applications that benefit from array operations, the AMD Zen 5 compact cores have much more performance per core than Neoverse V3, and AMD has offered 192 cores per socket for a long time. For those applications there is no chance for the new processors to exceed the performance per rack of Zen 5. For applications that do not benefit from array operations, these new Arm CPUs should have better performance per watt than Zen 5. But by the end of the year AMD should have Zen 6 Epyc CPUs, with more cores per socket, higher performance per core and better performance per watt, so there will be even fewer cases where these Arm CPUs come out ahead.
The only way Arm's claim can be true is if they compared their new CPUs with antiquated CPUs like the Intel Granite Rapids Xeons, instead of with the state-of-the-art Intel Clearwater Forest and AMD Zen 5.
>The only way how the claim of Arm can be true is if they have compared their new CPUs with antiquated CPUs like the Intel Granite Rapids Xeon CPUs, instead of comparing with state-of-the-art Intel Clearwater Forest and AMD Zen 5.
Intel had a paper announcement of Clearwater Forest this month. They have not revealed SKUs: no clocks, no model numbers, nothing exists. Nobody—including Arm—will be benchmarking against a CPU that doesn't exist on the market yet.
Aren't Intel's Xeon Rapids and Xeon Forest lines just aimed at different markets? Rapids generally has fewer but faster cores and more special-purpose accelerators (e.g. AMX, QAT), while Forest is focused on maximum compute density (pack in as many fast-enough cores as you can).
IIRC Granite Rapids is also not _that_ old, and either current or a single generation behind. (Has its successor landed yet? IIRC GNR is the same generation as Sierra Forest).
Validating a core to server standards takes significantly longer.
Neoverse V4 cores, based on the X925, should be out this year, and a C1 Ultra-based V5 will probably arrive in 2027-2028.
I suspect that the X4 is already fast enough to beat EPYC in per-core performance when the whole chip is loaded. Arm caught up with or passed x86 in IPC back around the A77/A78 in 2019-2020. Its cores are now much faster per clock and hit about the same all-core clock speeds as standard EPYC (let alone Zen 5c EPYC).
The big issue is that Graviton5 is already starting to hit the market and uses the same V3 cores. A lot of the market share for this chip will probably be taken from Ampere's customers.
Cortex-X4 a.k.a. Neoverse V3 has significantly lower performance per core than Zen 5.
However, Neoverse V3 has a lower die area, so you could implement more cores per socket than with Zen 5, but this has not been done yet, as these new CPUs have only 136 cores per socket versus 192 cores per socket for Zen 5.
For programs that do not use array operations, i.e. which do not use AVX/AVX-512 instructions, Neoverse V3 has better performance per watt than Zen 5. But that changes for programs that benefit from AVX/AVX-512, where Zen 5 has better performance per watt.
Moreover, Zen 5 is already old. By the end of the year there will be Zen 6, which will be the real competitor for these new Arm CPUs, and Zen 6 will have better performance per watt, even more cores per socket and even more performance per core.
I think the only ARM licensee going for the hyperscaler CPU market is Ampere. Amazon and Microsoft make CPUs for themselves and Nvidia’s are aimed exclusively at AI workloads driving their GPUs.
They use ARM cores designed by the Arm company, but the complete chips are designed by AWS/Microsoft.
Ampere had previously used cores designed by Arm, but their latest CPUs (which do not impress much) use a custom core, like the Apple and Qualcomm CPUs.
I know, but I can't buy a Cobalt or Graviton workstation. Ampere has been the only way I could get my hands on a nice workstation-grade Arm chip (unless you count Apple, but they don't sell chips either).
Not sure how to feel about this. Does this mean ARM is slowly moving from just licensing IP to actually competing with companies building on top of it?
This seems bad, doesn’t it? I know there has been friction between Arm and its customers over higher licensing fees since the IPO; I'm just trying to put this in context.
> The new Intel Clearwater Forest Xeon processors use Darkmont cores, which have approximately the same performance per core, the same die area per core and the same power consumption per core as the Neoverse V3
In no world will Darkmont perform like Neoverse V3 / Cortex-X4. Darkmont is much slower.
SPECint2017
Darkmont @ 3.5 GHz boost = 7.13 points
Cortex-X4 @ 3.2 GHz = 8.20 points (+15% faster)
Source: David Huang, https://blog.hjc.im/spec-cpu-2017
Arm's 136-core Neoverse V3 CPU has a 3.2 GHz base clock, exactly the same as this Cortex-X4. Your real problem is power: a 288C Clearwater Forest CPU at the highest 500W TDP has at most 1.7W per core (a generous figure, as it excludes uncore, fabric, cache, etc.). It's probably closer to 1.5W, but let's be generous and add 200mW.
Darkmont will be *nowhere* near 3.5 GHz at a mere 1.7W / core power budget. It'll be much closer to 2 GHz. Sierra Forest (6780E) is 144 cores @ 350W (2.2W / core) → a pitiful 2.2 GHz base clock. Let's go crazy and assume Darkmont magically achieves +13% higher clocks (2.2 → 2.5 GHz) at 22% less power (2.2W per core → 1.7W per core) and much higher IPC.
Darkmont @ hypothetical 2.5 GHz = ~5.09 points (7.13 × 2.5 / 3.5, scaling the score linearly with clock)
Neoverse V3 @ 3.2 GHz (8.20 points) would be 61% faster.
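The napkin math in this comment can be reproduced with a short sketch. Like the comment, it assumes that SPECint2017 scores scale linearly with clock speed and that the whole TDP is available to the cores. The helper names are hypothetical.

```python
# Napkin math from the comment. Assumptions (from the comment, not measured):
# SPECint2017 scales linearly with clock, and the whole TDP goes to the cores
# (uncore, fabric and cache power are ignored, so the budget is generous).

DARKMONT_SCORE = 7.13    # SPECint2017 at 3.5 GHz boost (David Huang)
DARKMONT_GHZ = 3.5
CORTEX_X4_SCORE = 8.20   # SPECint2017 at 3.2 GHz (David Huang)

def watts_per_core(tdp_w: float, cores: int) -> float:
    """Upper bound on per-core power if the entire TDP went to the cores."""
    return tdp_w / cores

def scale_score(score: float, measured_ghz: float, target_ghz: float) -> float:
    """Linear clock scaling: same IPC, different frequency."""
    return score * target_ghz / measured_ghz

budget = watts_per_core(500, 288)  # 288-core Clearwater Forest at a 500 W TDP
x4_lead_at_boost = CORTEX_X4_SCORE / DARKMONT_SCORE - 1
darkmont_2_5 = scale_score(DARKMONT_SCORE, DARKMONT_GHZ, 2.5)  # hypothetical 2.5 GHz
x4_lead_at_2_5 = CORTEX_X4_SCORE / darkmont_2_5 - 1

print(f"Per-core budget: {budget:.2f} W")
print(f"Cortex-X4 lead vs Darkmont @ 3.5 GHz: {x4_lead_at_boost:.0%}")
print(f"Darkmont @ 2.5 GHz: {darkmont_2_5:.2f} points")
print(f"Neoverse V3 @ 3.2 GHz lead vs Darkmont @ 2.5 GHz: {x4_lead_at_2_5:.0%}")
```

The output gives a budget of about 1.74 W per core, which the comment rounds to 1.7 W. It also gives a scaled Darkmont score of about 5.09 points and the 15% and 61% Cortex-X4 leads.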
>Cortex-X4 a.k.a. Neoverse V3 has significantly lower performance per core than Zen 5.
I don't quite believe that, especially per core. In SPECint2017 from David Huang [1], Zen5 (HX 370) @ 5.1 GHz boost = 9.9 points, so Zen5 is approximately 1.94 points per GHz. But Neoverse V3 (Cortex-X4) @ 3.2 GHz = 8.2 points, so V3 is approximately 2.56 points per GHz.
Arm's 64C Neoverse V3 boosts to 3.7 GHz, and AMD's 64C Zen5 (9575F) boosts to 5 GHz. So this rough napkin math shows that at maximum boost, Neoverse V3 is right around Zen5.
64C Neoverse V3: 2.56 pts / GHz * 3.7 GHz = 9.47 points
64C Zen5 9575F: 1.94 pts / GHz * 5.0 GHz = 9.70 points (+2.4%)
Zen5 fares much worse at base clocks. AMD drops to 3.3 GHz while Arm stays at 3.5 GHz, so with its large IPC advantage Arm's 64C CPU offers about 40% more SPECint performance per core than Zen5.
64C Neoverse V3: 2.56 pts / GHz * 3.5 GHz = 8.96 points (+40%)
64C Zen5 9575F: 1.94 pts / GHz * 3.3 GHz = 6.40 points
As a reminder, these are both 64-core parts. I doubt AMD will achieve 40% higher perf / core / W with Zen6, but it'll be exciting if they can.
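The points-per-GHz model used in this comment can be sketched as follows. It assumes that SPECint2017 scales linearly with clock. It also assumes that the laptop Zen 5 (HX 370) and Cortex-X4 results carry over to the 64-core server parts, which is a rough simplification. The rounded constants match the comment, and the helper name is hypothetical.

```python
# Points-per-GHz model: estimated score = (points per GHz) * clock.
# Constants are rounded the same way as in the comment.
ZEN5_PTS_PER_GHZ = 1.94  # 9.9 points / 5.1 GHz (Zen 5 HX 370 boost)
V3_PTS_PER_GHZ = 2.56    # 8.2 points / 3.2 GHz (Cortex-X4)

def est_score(pts_per_ghz: float, ghz: float) -> float:
    """Estimated SPECint2017 score, assuming linear scaling with clock."""
    return pts_per_ghz * ghz

# label -> (64C Neoverse V3 clock, 64C EPYC 9575F clock) in GHz, from the comment
scenarios = {"boost": (3.7, 5.0), "base": (3.5, 3.3)}

results = {}
for label, (v3_ghz, zen5_ghz) in scenarios.items():
    v3 = est_score(V3_PTS_PER_GHZ, v3_ghz)
    zen5 = est_score(ZEN5_PTS_PER_GHZ, zen5_ghz)
    results[label] = (v3, zen5)
    print(f"{label}: V3 {v3:.2f}, Zen5 {zen5:.2f}, V3/Zen5 = {v3 / zen5:.2f}")
```

Run as-is, it reproduces the comment's figures: 9.47 vs 9.70 points at boost and 8.96 vs 6.40 points at base. That gives a V3/Zen5 ratio of about 0.98 and 1.40 respectively.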
[1] https://blog.hjc.im/spec-cpu-2017