The acyclic e-graph: Cranelift's mid-end optimizer (cfallin.org)

by tekknolagi 22 comments 74 points

[−] mbid 30d ago
I believe these ideas are much more mature and better explored for code gen, but similar techniques are also useful in the frontend of compilers, in the type checker. There's a blog post [1] by Niko Matsakis where he writes about adding equalities to Datalog so that Rust's trait solver can be encoded in Datalog. Instead of desugaring equality into a special binary predicate in normal Datalog as Niko suggests, it can also be implemented by keeping track of equality with union-find and then propagating equality through relations, eliminating now-duplicate rows recursively. The resulting system generalizes both Datalog and e-graphs, since the functionality axiom ("if f(x) = y and f(x) = z, then y = z") is a Datalog rule with equality if you phrase it in terms of the graph of functions.
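To make the union-find part concrete, here is a minimal sketch of the idea, not how egglog or eqlog actually implement it. The names `UnionFind` and `close` are made up for illustration, and I'm assuming every function is unary and stored as rows `(f, x, y)` meaning f(x) = y. The loop canonicalizes rows, applies the functionality axiom, and repeats until nothing changes:

```python
class UnionFind:
    """Tracks which symbols are currently known to be equal."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def close(rows, uf):
    """Apply the functionality axiom to a set of (f, x, y) rows until fixpoint.

    Whenever two rows agree on f and on the equivalence class of x,
    their results are merged and the duplicate row disappears.
    """
    while True:
        table = {}
        changed = False
        for f, x, y in rows:
            key = (f, uf.find(x))
            if key in table:
                # f(x) = y and f(x) = z, hence y = z
                if uf.union(table[key], y):
                    changed = True
            else:
                table[key] = y
        # Rebuild the relation with canonical results; duplicates collapse.
        rows = {(f, x, uf.find(y)) for (f, x), y in table.items()}
        if not changed:
            return rows


uf = UnionFind()
rows = {("f", "a", "b"), ("f", "c", "d"), ("g", "b", "e"), ("g", "d", "h")}
uf.union("a", "c")      # assert a = c
rows = close(rows, uf)  # forces b = d, which in turn forces e = h
```

Rebuilding the whole table on every round is the naive way to do it; it is only meant to show the propagation, not to be efficient.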

Systems implementing this are egglog [2] (related to egg mentioned in the article) and (self-plug, I'm the author) eqlog [3]. I've written about implementing Hindley-Milner type systems here: [4]. But I suspect that Datalog-based static analysis tools like CodeQL would also benefit from equalities/e-graphs.

[1] https://smallcultfollowing.com/babysteps/blog/2017/01/26/low...

[2] https://github.com/egraphs-good/egglog

[3] https://github.com/eqlog/eqlog

[4] https://www.mbid.me/posts/type-checking-with-eqlog-polymorph...

[−] pjmlp 31d ago

> While that kind of flexibility is tempting, it comes with a significant complexity tax as well: it means that reasoning through and implementing classical compiler analyses and transforms is more difficult, at least for existing compiler engineers with their experience, because the IR is so different from the classical data structure (CFG of basic blocks). The V8 team wrote about this difficulty recently as support for their decision to migrate away from a pure Sea-of-Nodes representation.

Note that the Sea of Nodes author, Cliff Click, is of the opinion that they weren't really using it the way they should, and naturally doesn't see the point of their migration decision.

There is a Coffee Compiler Club discussion on the subject.

[−] titzer 31d ago
Well it's hard to summarize what I said in the Coffee Compiler Club chat in a HN comment, but there were a number of things that went wrong there. I half agree with Cliff and half agree with the V8 blogpost. TurboFan evolved into a very complicated compiler that made a number of things harder on itself than they should have been.

The sea of nodes is just extending SSA renaming on values to both control and effects. Effect dependencies are equivalent to SSA renaming of the state of the world, allowing relaxed ordering of effectful operations and more general transforms. That means that GVN and load elimination are the same thing when effect dependencies are explicitly part of the graph.
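A toy illustration of that last point, as a sketch under my own assumptions rather than anything from TurboFan: the `Graph` class and the op names are hypothetical, a store's node id doubles as the new effect token, and loads consume an effect token without producing one. With that, plain hash-consing (the core of GVN) eliminates redundant loads for free:

```python
class Graph:
    """Hash-consed node table: structurally identical nodes share one id."""

    def __init__(self):
        self.table = {}  # (op, inputs) -> node id
        self.nodes = []  # node id -> (op, inputs)

    def node(self, op, *inputs):
        key = (op, inputs)
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]


g = Graph()
e0 = g.node("start")   # initial effect token ("state of the world")
p = g.node("param_p")  # an address
v = g.node("param_v")  # a value to store

# Pure values are numbered by (op, inputs).
add1 = g.node("add", p, v)
add2 = g.node("add", p, v)

# A load names the effect token it observes as an ordinary input.
load1 = g.node("load", e0, p)
load2 = g.node("load", e0, p)  # same state, same address: same node

# A store consumes an effect token and its node id acts as the next one.
e1 = g.node("store", e0, p, v)
load3 = g.node("load", e1, p)  # different state: a different node
```

Here `load1` and `load2` collapse exactly the way `add1` and `add2` do, while `load3` stays separate because it depends on the state after the store. No dedicated load elimination pass is involved.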

Making control and effect dependencies explicit is great!

What makes the sea of nodes complicated is relaxing linear control and effects to allow more reorderings. Many optimizations require a more general algorithm (which is sometimes inefficient, but mostly not) and other optimizations can be almost impossible. E.g. reasoning about what happens between two instructions is impossible: there is no such thing, except after scheduling. For most optimizations, the chain of dependencies is enough. Not all. Loop transforms become more complicated, making regions of code that are uninterruptible (e.g. fully initializing an object before it can be seen by the GC) is tough, and a few other things.

Overall I would say that TurboFan's main problem was that it did not relax effect edges, and it tried to introduce speculation too late and tried to do that in the sea of nodes representation. It would have been a better design to do some optimizations on a CFG representation prior to the heavy lifting in optimizations that work on the sea of nodes.

One of TurboFan's good architectural decisions was to separate operators from the node representation, so that reasoning could be somewhat independent of how nodes represent dataflow and effects, but it looks like that got junked in favor of the class-based organization (https://github.com/v8/v8/blob/main/src/maglev/maglev-ir.h) which is pure 90s tech lifted straight from C1 and Crankshaft. When I see an IR that's 11K lines in a header, I find it astonishing. Pity, that 11K knot isn't just self-contained, it will replicate itself over and over and over in the compiler and make a big mess in the end.

I think the main part of the V8 blogpost I agree with is that the sea of nodes is difficult to debug, especially for big graphs. I don't see any way around that except a whole crapton of testing, better tools, graph verifiers, etc. There's a learning curve to any compiler, and complex compilers have complex failure modes. Still, I think some people on the V8 team just always hated the sea of nodes and blamed all of their problems on it. It didn't help that all of the senior people who developed expertise with the IR moved on.

[−] pizlonator 31d ago
Compiler writer here.

This post makes it seem like the pass ordering problem is bigger than it really is and then overestimates the extent to which egraphs solve it.

The pass ordering problem isn’t a big deal except maybe in the interaction of GVN and load elimination, but practically, that ends up being a non-issue because it’s natural to make those be the same pass.

Aside from that, pass ordering isn’t a source of sweat for me or my colleagues. Picking the right pass order is fun and easy compared to the real work (designing the IR and writing the passes).

When pass ordering does come to bite you, it’s in a way that egraphs won’t address:

- You need to run some pass over a higher level IR, some other pass over a lower level IR (ie later), and then you discover that the higher level pass loses information needed by the lower level one. That sucks, but egraphs won’t help.

- You might have some super fancy escape analysis and some super fancy type inference that ought to be able to help each other but can only do so to a limited extent because they’re expensive to run repeatedly to fixpoint and even then they can’t achieve optimality. Both of them are abstract interpreters from hell. Too bad so sad - your only full solution is creating an even more hellish abstract interpreter. Egraphs won’t help you.

- Your tail duplication reveals loops so you want to run it before loop optimization. But your tail duplication also destroys loops so you want to run it after loop optimization. No way around this if you want to do fully aggressive taildup. I end up just running it late. I don’t think egraphs will help you here either.

Worth noting that the first problem is sort of theoretical to me, in the sense that I’ve always found some ordering that just works. The second problem happens, but when it does happen, it’s reasonable to have high level and low level versions of the optimizations and just do both (like how the FTL does CSE in three IRs). The last problem is something I still think about.

[−] j2kun 31d ago
I work in an esoteric compiler domain (compilers for fancy cryptography) and we've been eyeing e-graphs for a bit. This article is super helpful seeing how it materialized in a real-world scenario.

An interesting move in this direction is the Tamagoyaki project: https://github.com/jumerckx/Tamagoyaki that supports equality saturation directly in MLIR.

[−] IainIreland 30d ago
This is really cool. Thanks for the write-up, Chris!

I kept waiting for "sea of nodes with CFG" to be shortened to SeaFG, and it never happened. I guess maybe it's ambiguous out loud.

[−] 0xnadr 31d ago
The e-graph approach to optimization is really elegant. Curious how much compile-time overhead it adds vs the optimization wins it gets.

[−] infogulch 30d ago
I remember coming across egg ( https://egraphs-good.github.io/ ) via some virtual conference a few years ago. I'm happy to see this idea land in a real compiler!

[−] PoignardAzur 33d ago

> Finally, the most interesting question in my view: [...] does skipping equality saturation take the egraph goodness out of an egraph(-alike)? The most surprising conclusion in all of the data was, for me, that aegraphs (per se) -- multi-value representations -- don't seem to matter.

I'm not super surprised.

As the article points out, a lot of e-graph projects include rules for culling e-nodes or stopping generation after a certain cutoff. That this is considered a perfectly normal thing to do hints that equality saturation isn't really the magic sauce of e-graphs.

[−] quapster 31d ago
[dead]