Reflections on 30 years of HPC programming (chapel-lang.org)

by matt_d 123 comments 146 points

[−] jandrewrogers 28d ago
I can easily explain this, having worked in this space. The new languages don’t actually solve any urgent problems.

How people imagine scalable parallelism works and how it actually works doesn’t have a lot of overlap. The code is often boringly single-threaded because that is optimal for performance.

The single biggest resource limit in most HPC code is memory bandwidth. If you are not addressing this then you are not addressing a real problem for most applications. For better or worse, C++ is really good at optimizing for memory bandwidth. Most of the suggested alternative languages are not.

It is that simple. The new languages address irrelevant problems. It is really difficult to design a language that is more friendly to memory bandwidth than C++. And that is the resource you desperately need to optimize for in most cases.

[−] Joel_Mckay 28d ago

> C++ is really good at optimizing for memory bandwidth

In general, most modern CPU thread-safe code is still a bodge in most languages. If folks are unfortunate enough to encounter inseparable overlapping-state sub-problems, then there is no magic pixie dust to escape the computational cost. On average, attempting to parallelize this type of code can end up >30% slower on identical hardware, and a GPU memory-copy exchange can make it even worse.

Sometimes even compared to a large multi-core CPU, a pinned-core higher clock-speed chip will win out for those types of problems.

Thus it is no mystery why most people revert to batching k copies of a single-core-bound, non-parallel version of a program: it reduces latency, stalls, cache thrashing, i/o saturation, and interprocess communication costs.

Exchange costs only balloon further across networks: however fast the cluster partition claims to be, physics still imposes space-time constraints, and modern data centers will spend >15% of their energy cost just moving stuff around networks for lower-efficiency code.

I like languages like Julia, as its broadcast operator implicitly abstracts which areas may be cleanly unrolled. However, much like Erlang/Elixir, the multi-host parallelization is not cleanly implemented... yet...

The core problem with HPC software has always been that academics are best modeled as hermit crabs with facilities. Once a lucky individual inherits a nice new shell, the pincers come out at any smaller entities who approach with competing interests.

Best of luck, =3

"Crabs Trade Shells in the Strangest Way | BBC Earth"

https://www.youtube.com/watch?v=f1dnocPQXDQ

[−] bruce343434 28d ago
What does it mean to be friendly to memory bandwidth, and why does C++ excel at it, over, say, Fortran or C or Rust?
[−] iamcreasy 27d ago
The Julia language is also used for HPC, according to its webpage, which cites performance parity with C++. Would it be correct to infer that Julia also provides the same level of memory bandwidth control?
[−] convolvatron 27d ago
I worked in parallel computing in the late 80s and early 90s, when parallel languages were really a thing. In HPC applications memory bandwidth is certainly a concern, although usually the global communication bandwidth (assuming they are different) is the roofline. By saying C++ you're implying that MPI is really sufficient, and while it's certainly possible to prop up parallel codes with MPI, it is really quite tiresome, and it makes it hard to play with the really interesting problem, which is the mapping of the domain state across the entire machine.

Other hugely important problems that C++ doesn't address are latency hiding, which avoids stalling out your entire core waiting for a distributed message, and the related solution of interleaving computation and communication.

Another related problem is that a lot of the very interesting hardware that might exist to do things like RDMA, in-network collective operations, or even memory-controller-based rich atomics isn't part of the compiler's view, and thus usually ends up as library implementations or really hacky inlines.

Is there a good turnkey parallel language? No. Is there sufficient commonality in architecture, or even much investment in the interesting ideas that were abandoned because of cost? No. But there remains a huge potential to exploit parallel hardware with implicit abstractions, and I think saying 'just use C++' is really missing almost all of the picture here.

Addendum: even if you are working on a single-die multicore machine, if you don't account for locality, it doesn't matter how good your code generator is: you will saturate the memory network. So locality is important, and languages like Chapel are explicitly trying to provide useful abstractions for you to manage it.

[−] j4k0bfr 28d ago
I'm pretty interested in realtime computing and didn't realise C++ was considered bandwidth efficient! Coming from C, I find myself avoiding most 'new' C++ features because I can't easily figure out how they allocate without grabbing a memory profiler.
[−] suuuuuuuu 28d ago
If you think C++ is the best here, then I don't think you've actually worked in this space or appreciated the actual problems these languages try to solve, in particular because you can't program accelerators with C++.

Memory bandwidth is often the problem, yes. Language abstractions for performance aim to, e.g., automatically manage caches (that must be handled manually in performant GPU code, for instance) with optimized memory tiling and other strategies. Kernel fusion is another nontrivial example that improves effective bandwidth.

Adding on the diversity of hardware that one needs to target (both within and among vendors), i.e., portability not just of function but of performance, makes the need for better tooling abundantly obvious. C++ isn't even an entrant in this space.

[−] Xcelerate 28d ago
I wonder how much of the programming language problem is due to churn of the user base. Looking over many comments in this thread, I see “Oh, back when I did HPC...” I used Titan for my own work back in 2012. But after my PhD, I never touched HPC again. So the people writing the code use what’s there but don’t stay long enough to help incentivize new or better languages. Now on the hardware side (e.g., design of interconnects), that more commonly seems to be a full career.

The other issue is that to really get the value out of these machines, you sort of have to tailor your code to the machine itself to some degree. The DOE likes to fund projects that really show off the unique capabilities of supercomputers, and if your project could in principle be done on the cloud or a university cluster, it’s likely to be rejected at the proposal stage. So it’s sort of “all or nothing” in the sense that many codebases for HPC are one-off or even have machine-specific adaptations (e.g., see LAMMPS). No new general purpose language would really make this easier.

[−] jpecar 28d ago
All these fancy HPC languages are all nice and dandy, but the hard reality I see on our cluster is that most of the work is done in Python, R and even Perl and awk. MPI barely reached us and people still prefer huge single machines to proper distributed computing. Yeah, bioinformatics is from another planet.
[−] hpcdude 27d ago
HPCdude here. This is a mostly correct article, but here is what it misses:

1) It mentions in passing that the hardware abstraction is not as universal as it seemed. This is more and more true: once we started doing FPGAs, then ASICs, and as ARM and other platforms started making headway, it fractured things a bit.

GPUs too: I'm still a bit upset about CUDA winning over OpenCL, but Vulkan compute gives me hope. I haven't messed with SYCL but it might be a future possibility too.

2) The real crux is the practical, production interfaces that HPC end users get. Normally I'm not exposing an entire cluster (or sub-cluster) to a researcher. I give them predefined tools which handle the computation across nodes for them (SLURM is old but still huge in the HPC space for a reason! When I searched the article for "slurm" I got 0 hits!). When it comes to science, reproducibility is the name of the game, and having more idempotent and repeatable code structures is what helps us gain real insights that others can verify, or take and run with into new problems and solutions. Ad-hoc HPC programming doesn't do that well; existing languages plus Slurm and other orchestration layers do.
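For readers who haven't seen it, the predefined interface I mean looks roughly like this: a hypothetical sbatch script (the resource numbers, module, and program name are all made up):

```shell
#!/bin/bash
# The researcher declares resources; the scheduler handles placement
# across nodes. Everything below is a repeatable, auditable artifact.
#SBATCH --job-name=example
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=32
#SBATCH --time=01:00:00

module load openmpi            # hypothetical environment module
srun ./my_solver input.dat     # srun launches one task per allocated slot
```

That script, checked into a repo next to the input files, is what makes a run reproducible by someone else on the same cluster.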

Sidenote: one of the biggest advances recently is in RDMA (remote direct memory access) improvements, because the RAM needs of these datasets are growing to crazy numbers, and often you have underutilized nodes that are happy to help. I've only done RoCE myself, though, and not much with Infiniband (sorry Yale, that's why I flubbed the interview), but honestly, I still really like RoCE for the cluster side and LACP for front-facing ingress/egress.

The point is existing tooling can be massaged and often we don't need new languages. I did some work with Mellanox/Weka prior to them being bought by Nvidia on optimizing the kernel shims for NFSv4 for example. Old tech made fast again.

[−] saltcured 27d ago
I was a student intern in a parallel computing research group around that first reference point of 1995. My career went other ways, working more on distributed systems instead of programming language theory or implementation.

But, when I encountered OpenCL and CUDA about ten years ago, I was struck by just how much these were delivering the SPMD parallel programming model in finished products. Around 1995, these were often C dialects with some wonky compiler that each research group just barely kept together. By 2015, they were just bundled up inside a graphics driver or similarly commoditized runtime environment.

Also, the GPU of 2015 was delivering the throughput we dreamed of in supercomputers back then. A teraFLOP went from a strategic theme to something you could deploy to your desktop.

[−] RhysU 28d ago

> we have failed to broadly adopt any new compiled programming languages for HPC

The article neglects that all of C, C++, and Fortran have evolved over the last 30 years.

Also, you'll find significant advances in the HPC library ecosystem over the intervening years. Consider, for example, Trilinos (https://trilinos.github.io/index.html) or Dakota (https://dakota.sandia.gov/about-dakota/), both of which push a ton of domain-agnostic capabilities into a C++ library instead of bolting them into a bespoke language. Communities of users tend to coalesce around shared libraries, not new languages.

[−] riffraff 28d ago
Perhaps one issue the article doesn't discuss is how easy it is to find devs.

I've never worked in HPC but it seems it should be relatively simple to find a C/C++ dev that can pick up OpenMP, or one that already knows it, compared to hiring people who know Chapel.

The "scaling down" factor (how easy or interesting it is to use tool X for small use) seems a disadvantage of HPC-only languages, which creates a barrier to entry and a reduction in available workforce.

[−] shevy-java 28d ago

> Could the reason be that language design is dead, as was asserted by an anonymous reviewer on one of our team’s papers ~30 years ago?

It may not be dead, but it seems much harder for languages to gain adoption.

I think there are several reasons; I also suspect AI contributes a bit to this.

People usually specialize in one or two languages, so the more languages exist, the less variety we may see in terms of people ACTUALLY using them. If I were, say, 15 years old, I might pick up Python and just stick with it rather than experiment and try out many languages. Or perhaps not write software at all, if AI auto-writes most of it anyway.

[−] guywithahat 27d ago
I like the idea of Chapel, but I'm not sure I agree with a lot of their design choices. Some of the parallelization features seem like they just copied OpenMP without meaningfully improving on it. They also kept exceptions, which are generally on their way out, especially in compiled languages (Go, Rust, and Zig avoid them, and while they exist in modern C++, it keeps introducing more ways to not use them). I think a new HPC language is possible, but I'm not sure this is the one.
[−] swiftcoder 28d ago
It's interesting that none of the actor-based languages ever made it into this space. Feels like something with the design philosophy of Erlang would be pretty suitable to exploit millions of cores and a variety of interconnects...
[−] pklausler 28d ago
Honestly, if a language can't succeed in HPC alongside (or against) Fortran with its glacial rate of buggy evolution and poor track record of portability, and C++ with its never-ending attempts at parallelism, then it's not what HPC needs.

(What HPC does need, IMNSHO, is to disband or disregard WG5/J3, get people who know what they're doing to fix the features they've botched or neglected for thirty years, and then have new procurements include RFCs that demand the fixed portable Fortran from system integrators rather than the ISO "standard".)

[−] rramadass 27d ago
Relevant:

The Art of High Performance Computing (a comprehensive series of textbooks) - https://theartofhpc.com/

Previous discussion - https://news.ycombinator.com/item?id=38815334

[−] chatmasta 28d ago
HPC is heavily skewed toward academia, and it doesn’t have a lot of overlap with compiler nerds. I think this explains it.
[−] nnevatie 28d ago
Usually a new language faces the ecosystem-mass issue: the previously used language, e.g., C++, already has the critical mass of available libraries and frameworks. Getting to the same level of ecosystem maturity with a new language takes a long time, as seen with Rust.
[−] DamonHD 28d ago
I used to edit an HPC trade rag in the early 90s, so this was an interesting read!
[−] ivell 28d ago
I think Mojo has a good chance to become suitable for HPC.
[−] crabbone 28d ago
As someone who worked for a while, and still works, in HPC, my impression of this field, compared to e.g. programming in the finance sector or for the storage sector, is that... HPC is so backwards and far behind, it's really amazing how it's portrayed as some sort of champion of the field.

That's not to say that new things don't happen there, it's just that I find a lot of old stuff that was shown to be bad decades ago still being in vogue in HPC. Probably because it's a relatively small field with a lot of people there being academics and not a lot of migration to/from other fields.

You've probably never heard of environment modules (either the Tcl modules or Lmod). This is a staple of the HPC world. What this thing does is source (or try to remove) shell variables and functions in the shell used either interactively or by a batch job. This is a beyond-atrocious way to handle your working environment: the information leaks, becomes stale, and you often end up loading the wrong thing into your environment. It's simply amazing how bad this thing is. And yet, it's just everywhere in HPC.

Another example: running anything in HPC basically means running Slurm batch jobs. There are alternatives, but those are even worse (e.g. OpenPBS). When you dig into the configuration of these tools, you realize they've been written for pre-systemd Linux and are held together by a shoestring of shell scripting. They seldom if ever do the right thing when it comes to logging or general integration with the environment they run in. They can be simultaneously on the bleeding edge (e.g. cgroup integration or accelerator-driver integration) and completely backwards when it comes to having a sensible service definition for systemd (e.g. they try to manage their service dependencies on their own instead of relying on systemd to do that for them).

In other words, imagine a steam-punk world, but now it's in software. That's sort of how HPC feels after a decade or so in more popular programming fields.

Also, a lot of code written for HPC is written the way it is not because the writer chose the language or the environment. The typical setup is: university IT created a cluster with whatever tools they managed to put there eons ago, and you, the code writer, have to deal with... using CentOS6 by authenticating to university's AD... in your browser... through JupyterLab interface. And there's nothing you can do about it because the IT isn't there, is incompetent to the bone and as long as you can get your work done somehow, you'd prefer that over fighting to perfect your toolchain.

Bottom line, unless a language somehow becomes indispensable in this world, no matter its advantages, it's not going to be used because of the huge inertia and general unwillingness to do beyond the minimum.

[−] hpcgroup 28d ago
[dead]
[−] kevinten10 28d ago
[dead]
[−] chinabot 28d ago
There has been a very big adoption of ENGLISH as a programming language in the last year or so, and, painful as it sounds, AI is already generating machine code without compilers, so let's see where we are in 2030.