Looking at Unity made me understand the point of C++ coroutines (mropert.github.io)

by ingve 180 comments 179 points

[−] Joker_vD 52d ago
Simon Tatham, author of PuTTY, has quite a detailed blog post [0] on using C++20's coroutine system. And yep, it's a lot to do on your own; C++26 really ought to give us some pre-built templates/patterns/scaffolds.

[0] https://web.archive.org/web/20260105235513/https://www.chiar...

[−] zozbot234 52d ago
People love to complain about Rust async-await being too complicated, but somehow C++ manages to be even worse. C++ never disappoints!
[−] jandrewrogers 51d ago
I find C++ coroutines to be well-designed. Most of the complexity is intrinsic because it tries to be un-opinionated. It allows precise control and customization of almost every conceivable coroutine behavior while still adhering to the principle of zero-cost abstractions.

Most people would prefer opinionated libraries that allow them to not think about the design tradeoffs. The core implementation is targeted at efficient creation of opinionated abstractions rather than providing one. This is the right choice. Every opinionated abstraction is going to be poor for some applications.

[−] fooker 51d ago
C++ standards follow a tick-tock schedule for complex features.

For the tick, the core language gets an un-opinionated iteration of the feature that is meant for compiler developers and library writers to play with. (This is why we sometimes see production compilers lagging behind in features).

For the tock, we try to get the standard library improved with these features to a realistic extent, and also fix wrinkles in the primary idea.

This avoids the standard library having to rely on any compiler magic (languages like Swift are notorious for this), so in practice all libraries can leverage the language to the same extent.

This pattern has been broken in a few instances (std::initializer_list), and those have been widely considered to have been missteps.

[−] throwaway17_17 51d ago
Regarding your mention of compiler magic and Swift, I don’t know much about the language, but I have read a handful of discussions/blogs about the compiler and the techniques used for its implementation. One of the purported benefits/points of pride for Swift that stood out to me and I still remember was something to the effect of Swift being fundamentally against features/abstractions/‘things’ being built in. In particular they claimed the example of Swift not having any literal types (ints, sized ints, bools, etc) “built in” to the compiler but were defined in the language.

I don’t doubt your point (I know enough about Swift’s generic resolution crapshow during semantic analysis to be justified in assuming the worst), but can you think of any areas worth looking into for more on the compiler magic issues?

I have a near reflexive revulsion for the kinds of non-composability and destruction of principled, theoretically sound language design that tends to come from compiler magic and shortcuts, so always looking for more reading to enrage myself.

[−] fooker 51d ago

> literal types (ints, sized ints, bools, etc) “built in” to the compiler but were defined in the language.

This is actually a good example by itself.

Int is defined in Swift in terms of Builtin.Int64, IIRC. That is not part of the Swift language.

[−] throwaway17_17 51d ago
I don’t know if the language is yours, but I think the wording and its intended meaning (the sentence starting with ‘The core implementation…’) may be one of the most concise statements of my personal programming language design ethos. I’m jealous that I didn’t come up with it. I will certainly credit you when I steal it for my WIP language.

I will be adding the following to my “Primary Design Criteria” list: The core design and implementation of any language feature is explicitly targeted at the efficient creation of opinionated, composable abstractions rather than providing those abstractions at the language level.

[−] 01HNNWZ0MV43FF 51d ago
async is simply a difficult problem, and I think we'll find irreducible complexity there. Sometimes you are just doing 2 or 3 things at once and you need a hand-written state machine with good unit tests around it. Sometimes you can't just glue 3 happy paths together into CSP and call it a day.
[−] quietbritishjim 51d ago
Using structured concurrency [1] as introduced in Python Trio [2] genuinely does help write much simpler concurrent code.

Also, as noted in that Simon Tatham article, Python makes choices at the language level that you have to fuss over yourself in C++. Given how different Trio is from asyncio (the async library in Python's standard library), it seems to me that making some of those basic choices wasn't actually that restrictive, so I'd guess that a lot of C++'s async complexity isn't that necessary for the problem.

[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

[2] https://trio.readthedocs.io/en/stable/

[−] throwaway17_17 51d ago
After I wrote the comment below, I realized that it really is just an ‘um, actually…’ about discussing using concurrency vs implementing it. It’s probably not needed, but I do like my wording so I’m posting it for personal posterity.

In the context of an article about C++’s coroutines for building concurrency I think structured concurrency is out of scope. Structured concurrency is an effective and, reasonably, efficient idiom for handling a substantial percentage of concurrent workloads (which in light of your parent’s comment is probably why you brought up structured concurrency as a solution); however, C++ coroutines are pitched several levels of abstraction below where structured concurrency is implemented.

Additionally, there are the implementation requirements for Trio-style structured concurrency to function. I’m almost certain a garbage collector is not required, so that probably isn’t an issue, but the nurseries and the associated memory management are independent implementations that C++ will almost certainly never impose as a base requirement for concurrency. There are also some pretty effective cancellation strategies presumed in Trio, which would also have to be positioned as requirements.

Not really a critique on the idiom, but I think it’s worth mentioning that a higher level solution is not always applicable given a lower level language feature’s expected usage. Particularly where implementing concurrency, as in the C++ coroutines, versus using concurrency, as in Trio.

[−] maleldil 51d ago
Python's stdlib now supports structured concurrency via task groups[1], inspired by Trio's nurseries[2].

[1] https://docs.python.org/3/library/asyncio-task.html#id6

[2] https://github.com/python/cpython/issues/90908

[−] quietbritishjim 51d ago
Good point. I did carefully say that Trio "introduced" structured concurrency, partly due to this (and also other languages that now use it e.g. Swift, Kotlin).

I will say that it's still not as nice as using Trio. Partly that's because it has edge-triggered cancellation (calling task.cancel() injects a single cancellation exception) rather than Trio's level-triggered cancellation (once a scope is cancelled, including the scope implicit in a nursery, it stays cancelled so future async calls all throw Cancelled unless shielded). The interaction between asyncio TaskGroup and its older task API is also really awkward (how do I update the task's cancelled count if an unrelated task I'm waiting on throws Cancelled?). But it's a huge improvement if you're forced to use asyncio.

[−] jujube3 51d ago
It's quite simple in Golang.
[−] menaerus 51d ago
Golang has a GC and that makes a lot of things easier.
[−] rafram 51d ago
Languages like Swift do manage to make it much simpler. The culture guiding Rust design pretty clearly treats complexity as a goal.
[−] CyberDildonics 51d ago
C++ is great, coroutines are not. Neither of these are good ways to handle concurrency. You really need a more generalized graph and to minimize threads and context switching. You can't do more than the number of logical cores on a CPU anyway.
[−] pjmlp 51d ago
Not really: C++'s unsafe-first approach means that workarounds like Pin aren't required.

Additionally, for those with a .NET background, C++ coroutines are pretty much inspired by how they work in .NET/C#, naturally with the added hurdle that there isn't a GC, so there is some memory management to take into account.

Also, even if it takes some time to get through the ISO working processes, there is still a goal to have some capabilities in the standard library, whereas in Rust's case the answer is "use tokio" instead.

[−] matt_d 51d ago
See also C++ coroutines resources (posts, research, software, talks): https://gist.github.com/MattPD/9b55db49537a90545a90447392ad3...
[−] ZoomZoomZoom 51d ago
For a layperson it's clear that it's either "Writings" and "Talks", or "Readings" and "Listenings", but C++ proficiency is in an inverse relation with being apt at taxonomy, it looks like.

Thanks for the list.

[−] nananana9 52d ago
You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly. It's a matter of saving a few registers and switching the stack pointer, minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:

1. C++20 coros are stackless, in the general case every async "function call" heap allocates.

2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.

3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, they require the STL, they're very heavy in debug builds, and you'll end up with template hell to do something as simple as Promise.all

[1] https://github.com/edubart/minicoro

[−] pjc50 52d ago

> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly

I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.

(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)

[−] giancarlostoro 52d ago

> Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon,

They are actively working on it for their VS2026 C++ compiler. I think since 2017 or so they've kept up with C++ standards reasonably? I'm not a heavy C++ guy, so maybe I'm wrong, but my understanding is they match the standards.

[−] manwe150 52d ago
Boost has stackful coroutines. They also used to be in posix (makecontext).
[−] audidude 52d ago

> I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C.

These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h and it will turn out roughly the same performance on important architectures as what everyone used to do with custom assembly. And you already have fiber functions as part of the Windows API to trampoline.
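As a rough illustration of the makecontext()/swapcontext() approach mentioned above, here is a minimal sketch of one stackful coroutine that yields once and then finishes. The names `coro_body`, `run_demo`, and `trace` are made up for this example, the 64 KiB stack size is an arbitrary assumption, and error checking is omitted.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

namespace {
ucontext_t main_ctx;     // where the caller is parked while the coroutine runs
ucontext_t coro_ctx;     // the coroutine's saved registers and stack pointer
std::vector<int> trace;  // records the order in which the two sides run

// Hypothetical coroutine body: runs on its own stack, yields once.
void coro_body() {
    trace.push_back(1);
    swapcontext(&coro_ctx, &main_ctx);  // yield back to the caller
    trace.push_back(3);
    // Returning from here resumes the context named in uc_link (main_ctx).
}
}  // namespace

std::vector<int> run_demo() {
    static char stack[64 * 1024];  // assumed stack size for the coroutine
    trace.clear();

    getcontext(&coro_ctx);                      // initialise the context
    coro_ctx.uc_stack.ss_sp = stack;            // give it a private stack
    coro_ctx.uc_stack.ss_size = sizeof(stack);
    coro_ctx.uc_link = &main_ctx;               // where to go when the body returns
    makecontext(&coro_ctx, coro_body, 0);

    swapcontext(&main_ctx, &coro_ctx);  // run the coroutine until its first yield
    trace.push_back(2);
    swapcontext(&main_ctx, &coro_ctx);  // resume it until it returns
    trace.push_back(4);
    return trace;
}
```

Calling `run_demo()` returns the interleaving 1, 2, 3, 4: control alternates between the coroutine and its caller. Any function called from `coro_body` could perform the same `swapcontext` yield, which is what makes this style stackful.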

I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).

* https://packages.debian.org/sid/libdex-1-1

* https://gitlab.gnome.org/GNOME/libdex

[−] gpderetta 51d ago
Unfortunately swapcontext requires saving and restoring the signal mask, which, at least on Linux, requires a syscall, so it is going to be at least a hundred times slower than a hand-rolled implementation.

Also, although not likely to be removed anytime soon from existing systems, POSIX has declared the context API obsolescent a while ago (it might actually no longer be part of the standard).

[−] ndiddy 52d ago
Looking at the repo, it falls back to Windows fibers on Windows/ARM. If you'd like a coroutine library with more backends, I'm a fan of libco: https://github.com/higan-emu/libco/ which has assembly backends for x86, amd64, ppc, ppc-64, arm, and arm64 (and falls back to setjmp on POSIX platforms and fibers on Windows). Obviously the real solution would be for the C or C++ committees to add stackful coroutines to the standard, but unless that happens I would rather give up support for hppa or alpha or 8-bit AVR or whatever than not be able to use stackful coroutines.
[−] blacklion 52d ago
There is no "Linux/ARM[64]". But there are "Raspberry Pi" and "RISC-V". I don't know such OSes, to be honest :-)

This support table is a complete mess. And saying "most platforms are supported" is too optimistic or even cocky.

[−] Joker_vD 52d ago
Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.

That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.

[−] lelanthran 52d ago

> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.

Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackless coroutines.

(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
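The switch-plus-`__LINE__` trick described above can be sketched roughly as follows. The macro names `CO_BEGIN`, `CO_YIELD`, and `CO_END` and the `Counter` struct are invented for this illustration and are not the actual protothreads API. Note that anything that must survive a yield has to live outside the function, since ordinary locals are lost on return.

```cpp
#include <cassert>

// Hypothetical macros illustrating the idea; not the real protothreads API.
// CO_YIELD stores the current line number as the resume point, returns, and
// plants a case label on that same line so the next call jumps back here.
#define CO_BEGIN(state) switch (state) { case 0:
#define CO_YIELD(state, value) \
    do { (state) = __LINE__; return (value); case __LINE__:; } while (0)
#define CO_END(state) } (state) = -1

struct Counter {
    int state = 0;  // resume point: 0 = not started, -1 = finished
    int i = 0;      // loop variable kept here so it survives each yield
};

// Yields 1, 2, 3 on successive calls, then -1 on every call after that.
int next_value(Counter& c) {
    CO_BEGIN(c.state);
    for (c.i = 1; c.i <= 3; ++c.i) {
        CO_YIELD(c.state, c.i);
    }
    CO_END(c.state);
    return -1;
}
```

Each call re-enters the `switch` and jumps straight into the middle of the loop at the last yield point. Once the state is set to -1 no case label matches, so the body is skipped entirely.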

[−] Sharlin 52d ago
C++ destructors and exception safety will likely wreak havoc with any "simple" assembly/longjmp-based solution, unless severely constraining what types you can use within the coroutines.
[−] TuxSH 51d ago

> every async "function call" heap allocates.

> require the STL

That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.

One can define:

void *operator new(size_t sz, Foo &foo)

in the coro's promise type, and this:

- removes the implicitly-defined operator new

- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined

Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.

Yes, green threads ("stackful coroutines") are more straightforward to use, however:

- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)

- they are very ABI dependent. Among the "few registers" one has to save are the FPU registers, which complicates matters on older Arm architectures and with codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame.

Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)

[−] cherryteastain 52d ago
Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.
[−] abcde666777 52d ago
More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.

It can easily and often does lead to messy rube goldberg machines.

There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.

[−] twoodfin 52d ago
As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.

That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.

(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)

[−] nottorp 51d ago

> turns it into some sort of ugly state machine

Why are people afraid of state machines? There's been sooo much effort spent on hiding them from the programmer...

[−] BSTRhino 51d ago
This is one reason why I built coroutines into my game programming language Easel (https://easel.games). I think they let you keep the flow of the code matching the flow of your logic (top-to-bottom), rather than jumping around, and so I think they are a great tool for high-level programming. The main thing is stopping the coroutines when the entity dies, and in Easel that is done by implying ownership from the context they are created in. It is quite a cool way of coding I think: it avoids the state machines the OP described and keeps everything straightforward and step-by-step, so all the code feels more natural in my opinion. In Easel they are called behaviors if anyone is interested in more detail: https://easel.games/docs/learn/language/behaviors
[−] nice_byte 51d ago
I don't know, I'm not convinced with this argument.

The "ugly" version with the switch seems much preferable to me. It's simple, works, has way less moving parts and does not require complex machinery to be built into the language. I'm open to being convinced otherwise but as it stands I'm not seeing any horrible problems with it.

[−] wiseowise 52d ago
Looking at C++ made me understand the point of Rust.
[−] pjc50 52d ago
Always jarring to see how Unity is stuck on an ancient version of C#. The use of IEnumerable as a "generator" mechanic is quite a good hack though.
[−] bullen 52d ago
Coroutines generally imply some sort of magic to me.

I would just go straight to tbb and concurrent_unordered_map!

The challenge of parallelism does not come from how to make things parallel, but how you share memory:

How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.

My challenge right now is how do I make the JVM fast on native memory:

1) Rewrite my own JVM. 2) Use the buffer and offset structure Oracle still has but has deprecated and is encouraging people to not use.

We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?

[−] appstorelottery 51d ago
I've been doing a lot of work with ECS/Dots recently and once I wrapped my head around it - amazing.

I recall working on a few VR projects, where it's imperative that you keep the framerate solid or risk making the user physically sick. This is where I really began using coroutines for instantiating large volumes of objects and so on (and avoiding framerate stutter).

ECS/Dots & the burst compiler makes all of this unnecessary and the performance is nothing short of incredible.

[−] pjmlp 52d ago
As I mentioned on the Reddit thread,

This is quite understandable when you know the history behind how C++ coroutines came to be.

They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.

Thus if one knows how the .NET compiler and runtime magic works, including custom awaitable types, there will be some common bridges to how C++ co-routines ended up looking like.

[−] mgaunard 52d ago
Coroutines are just a way to write continuations in an imperative style and with more overhead.

I never understood the value. Just use lambdas/callbacks.

[−] bradrn 52d ago
In Haskell this technique has been called ‘reinversion of control’: http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-o...
[−] tliltocatl 51d ago
Stackless coroutines in C when? As an embedded dev, I miss them deeply. Certainly not enough RAM to give a separate stack for everything and rewriting every async call as a callback sequence sucks.
[−] sagebird 52d ago

>> To misquote Kennedy, “we chose to focus coroutines on generator in C++23, not because it is hard, but because it is easy”.

Appreciate this humor -- absurd, tasteful.

[−] Animats 51d ago
Most game engines seem to have some coroutine kludge.
[−] djmips 51d ago
The 'primitive' SCUMM language used for writing Adventure Games like Maniac Mansion had coroutines - an ill fated attempt to convert to using Python was hampered by Python (at the time) having no support for yield.
[−] momocowcow 52d ago
No serious dev even uses Unity coroutines. Terrible control flow and perf. Fine for small projects on PC.
[−] FpUser 52d ago
I do not find so called "green threads" useful at all. In my opinion except some very esoteric cases they serve no purpose in "native" languages that have full access to all OS threading and IO facilities. Useful only in "deficient" environments like inherently single threaded request handlers like NodeJS.