Python 3.15's JIT is now back on track (fidget-spinner.github.io)

by guidoiaquinti 310 comments 483 points


[−] mattclarkdotnet 60d ago
Python really needs to take the TypeScript approach of "all valid Python 3 is valid Python 4". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.

Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).

[−] mattclarkdotnet 59d ago
To clarify, it is nuts that, in an object method, caching a member value is a performance enhancement.

  class SomeClass:
      def __init__(self):
          self.x = 0

      def some_method(self):
          q = self.x
          # do stuff with q, because otherwise you're dereferencing self.x all the damn time
[−] 1718627440 59d ago
This is not just a performance concern; it describes completely different behaviour. You forgot that self.x goes through type(self).__getattribute__(self, 'x') (with __getattr__ as the fallback), and you can implement those however you like. There is no object identity across the values returned by attribute access.
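A minimal illustration of that last point, using a made-up class:

```python
class Dynamic:
    def __getattr__(self, name):
        # Only called when normal lookup fails, but it may return anything,
        # including a brand-new object on every single access.
        return object()

d = Dynamic()
assert d.x is not d.x  # no identity across two reads of the "same" attribute
```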
[−] dundarious 59d ago
This level of dynamism is commonly forgotten/omitted because it is most often not needed at all. "There is no object identity across the values [retrieved by self.x]" strikes many as a very curious design choice.
[−] 1718627440 59d ago
It's very Pythonic to expose e.g. state via the existence of attributes. This also makes it possible to dynamically expose foreign language interfaces. You can really craft the interface you like, because exposing the interface is itself normal code that returns strings and objects.

You are right that it isn't needed often, but there is often a part somewhere in the library stack that does exactly this, to expose a nice interface.
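As a sketch of that pattern (RemoteAPI is a hypothetical name), a __getattr__ that fabricates the interface on demand:

```python
class RemoteAPI:
    """Hypothetical sketch: any attribute access becomes an RPC-style call."""

    def __getattr__(self, name):
        def call(*args):
            # A real implementation would dispatch over the wire; here we
            # just echo the crafted method name and its arguments.
            return (name, args)
        return call

api = RemoteAPI()
assert api.get_user(42) == ("get_user", (42,))
```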

[−] xenadu02 59d ago
This is just an analogy, but in Swift, String is on such a commonly used hot path that the type is designed to accommodate different backing representations in a performant way. The type has bits in its layout that indicate the backing storage, e.g. a constant string is just a pointer to the bytes in the binary and, unless the String escapes or mutates, incurs no heap allocation at all; it is just a stack allocation and a pointer.

JavaScript implementations do their own magic, since most objects aren't constantly mutating their prototypes or doing other fun things. They effectively fast-path property accesses and fall back if that assumption proves incorrect.

Couldn't Python tag objects that don't need such dynamism (the vast majority) so it can take the fast path on them?

[−] adrian17 55d ago
It already has a fast path, since (I think) 3.11. If you run object.x repeatedly on the same type of object enough times, the interpreter will swap out the LOAD_ATTR opcode for LOAD_ATTR_INSTANCE_VALUE or LOAD_ATTR_SLOT, which only checks that the type is the same as before and loads the value from a fixed offset, without doing a full lookup.
[−] dekelpilli 59d ago
Java also has a performance cost for accessing class fields, as exemplified by this (now-replaced) code in the JDK itself - https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/...
[−] anematode 59d ago
Any decent JIT compiler (and HotSpot's is world class) will optimize this out. Likely this was done very early in development, or just to reduce bytecode size and help the inlining heuristics that use it.
[−] vanderZwan 59d ago
String is also a pretty damn fundamental object, and I'm sure trim() calls are extremely common too. I wouldn't be surprised if making sure that seemingly small optimizations like this are applied in the interpreter before the JIT kicks in is not a premature optimization in that context.

There might be common scenarios where this has a real, significant performance impact, e.g. use cases where it's such a bottleneck in the interpreter that it measurably affects warm-up time. Also, string manipulation seems like the kind of thing you see in small scripts that end before a JIT even kicks in but that are also called very often (although I don't know how many people would reach for Java in that case).

EDIT: also, if you're a commercial entity trying to get people to use your programming language, it's probably a good idea to make the language perform less badly on the most common terrible code. And accidentally-quadratic-or-worse string manipulation involving excessive calls to trim() seems like a very likely scenario in that context.

[−] LtWorf 59d ago
But what if whatever you call is also accessing and changing the attribute?
[−] anematode 59d ago
If what you call gets inlined, then the compiler can see that it either does or doesn't modify the attribute and optimize it accordingly. Even virtual calls can often be inlined via, e.g., class hierarchy analysis and inline caches.

If these analyses don't apply and the callee could do anything, then of course the compiler can't keep the value hoisted. But a function call has to occur anyway, so the hoisted value will be pushed/popped from the stack and you might as well reload it from the object's field anyway, rather than waste a stack slot.

[−] LtWorf 59d ago
Another thread can access it and do that; how could the compiler possibly know about it?
[−] bjourne 58d ago
Untrue. Objects and attributes may be shared by threads, so attribute accesses can't in general be localized.
[−] Kamii0909 59d ago
That was a niche optimization primarily targeting interpreted code. Even the most basic optimizing compiler in HotSpot's tiered compilation chain at the time (the client compiler, C1) would be able to keep that value in a register. Since String is such an important class, even small things like this get done.
[−] mathisfun123 59d ago

> it is nuts that in an object method, there is a performance enhancement through caching a member value

I don't understand what you think is nuts about this. It's an interpreted language and the word self is not special in any way (it's just convention; you can call the first param to a method anything you want), so there's no way for the interpreter/compiler/runtime to know you're accessing a field of the class itself (let alone that that field isn't a computed property or something like that).

Lots of hot takes that people have (like this one) are rooted in a fundamental misunderstanding of the language, and of programming languages in general.
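For instance, a property makes self.x run arbitrary code on every read (Counter is a made-up class for illustration):

```python
class Counter:
    def __init__(self):
        self._reads = 0

    @property
    def x(self):
        # Looks like a plain field from the outside, but executes code on
        # every access, so the runtime can't treat self.x as a cheap load.
        self._reads += 1
        return self._reads

c = Counter()
assert c.x == 1
assert c.x == 2  # same attribute, different value on each read
```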

[−] duskdozer 59d ago
You mean even if x is not a property?
[−] stabbles 59d ago
That was how the Mojo language started. And then, soon after the hype, they said that being a superset of Python was no longer the goal. Probably because being a superset of Python doesn't guarantee performance either.
[−] bloppe 59d ago
But that's just not what python is for. Move your performance-critical logic into a native module.
[−] fermigier 59d ago
I have made some experiments with P2W, my experimental Python (subset) to WASM compiler. Initial figures are encouraging (5x speedup, on specific programs).

https://github.com/abilian/p2w

NB: some preliminary results:

  p2w is 4.03x SLOWER than gcc (geometric mean)

  p2w is 5.50x FASTER than cpython (geometric mean)

  p2w is 1.24x FASTER than pypy (geometric mean)
[−] BerislavLopac 59d ago

> python code could be so much faster if it didn't have to assume everything could change at any time

Definitely, but then it wouldn't be Python. One of the core principles of Python's design is to be extremely dynamic, and that anything can change at any time.

There are many other pretty good, strictly dynamically typed languages which work just as well as, if not better than, Python for many purposes.

[−] wolvesechoes 59d ago

> Python really needs to take the TypeScript approach of "all valid Python 3 is valid Python 4"

They're called type hints, and they're already there. TS typing doesn't bring any perf benefit over plain JS either.
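Indeed, annotations are metadata only; nothing at runtime enforces them or speeds anything up:

```python
def add(a: int, b: int) -> int:
    return a + b

# The annotations are stored but never checked or used for optimization:
assert add("py", "thon") == "python"
assert add.__annotations__ == {"a": int, "b": int, "return": int}
```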

[−] phkahler 59d ago

>> Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time.

Then it wouldn't be Python any more.

[−] panzi 59d ago
Isn't RPython doing that: allowing changes at startup, after which it's basically statically typed? Does it still exist? Was it ever production-ready? I only read a paper about it once, decades ago.
[−] rich_sasha 59d ago
I think sadly a lot of Python in the wild relies heavily, somewhere, on the crazy unoptimisable stuff. For example pytest monkey patches everything everywhere all the time.

You could make this clean break and call it Python 4 but frankly I fear it won't be Python anymore.

[−] dobremeno 59d ago
SPy [1] is a new attempt at something like this.

TL;DR: SPy is a variant of Python specifically designed to be statically compilable while retaining a lot of the "useful" dynamic parts of Python.

The effort is led by Antonio Cuni, Principal Software Engineer at Anaconda. Still very early days but it seems promising to me.

[1] https://github.com/spylang/spy

[−] musicale 59d ago

> Python really needs to take the TypeScript approach of "all valid Python 3 is valid Python 4"

Great idea, but I'm not convinced that they learned anything from the Python 2 to 3 transition, so I wouldn't hold my breath.

If you want a language without contempt for backward compatibility, you're probably better off with Java/C++/JavaScript/etc. (though using JS libraries is like building on quicksand). Bit of a shame, since I want to like Python/Rust/Swift/other modern-ish languages, but it turns out that formal language specifications were actually a pretty good idea. API stability is another.

[−] BiteCode_dev 59d ago
There will be no Python 4, and the 3.x policy requires backward compat, so we are already there.
[−] mattclarkdotnet 59d ago
Oh, and while we're at it, fix the "default empty list is evaluated once at function definition time, so every call to a function with a default empty list argument shares the same object" bullshit.
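For anyone who hasn't hit it, the gotcha looks like this:

```python
def append(item, bucket=[]):
    # The default list is created once, when the def statement runs, not
    # on each call; every call without `bucket` shares that one list.
    bucket.append(item)
    return bucket

assert append(1) == [1]
assert append(2) == [1, 2]  # the same list object, still growing
```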
[−] giancarlostoro 59d ago
I went sort of this route in an experiment with Claude. I really want Python for .NET, but I said, damn the expense, prioritize .NET compatibility and remove anything that isn't feasibly supported. It means 0 Python libs, but all of NuGet is supported. The rules are: all signatures need types, and if you declare a type, it is that type, no exceptions, just like in C# (if you squint at var in a funny way). I wound up with reasonable results, just a huge trade of the entire Python ecosystem for .NET with an insanely Python-esque syntax.

Still churning on it, will probably publish it and do a proper blog post once I've built something interesting with the language itself.

[−] adrian17 60d ago
I've been occasionally glancing at the PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high-level discussions were happening; the issues and PRs always jumped right into the gritty details. Is there a high-level introduction/example anywhere of how trace projection vs. trace recording work and differ? Googling for the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(

Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?

[−] owaislone 60d ago
Oh man, Python 2 → 3 was such a massive shift. It took almost half a decade, if not more, and yet it mainly changed superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done, and probably come up with a new, tighter API for integrating with other lower-level languages, so that going forward Python internals could be changed more freely without breaking everything.
[−] rslashuser 60d ago
I'm curious if the JIT developers could mention any Python features that stand in the way of promising JIT optimizations. An earlier Ken Jin blog post [1] mentions how __del__ complicates reference-counting optimization.

There is a story that Python is harder to optimize than, say, TypeScript, with Python's flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features were out there, programmers could know to avoid those features, with the promise of activating the JIT when it can prove a feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just the gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.
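As a tiny illustration of the __del__ point (CPython refcounting semantics; Tracked is a class made up for the example):

```python
log = []

class Tracked:
    def __del__(self):
        # Arbitrary Python runs the instant the last reference dies, so an
        # optimizer can't delete incref/decref pairs without proving that
        # the moment of death stays unchanged.
        log.append("finalized")

t = Tracked()
del t  # on CPython, the refcount hits zero here and __del__ fires
assert log == ["finalized"]
```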

[1] https://fidget-spinner.github.io/posts/faster-jit-plan.html

[−] vanderZwan 60d ago

> However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.

> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.

I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.

[−] oystersareyum 60d ago

> We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.

I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...

The guy said he hopes the free-threaded build will be the only one in "3.16 or 3.17"; I wonder if that should apply to the JIT too, and how the JIT and the free-threaded interpreter interact.

[−] ekjhgkejhgk 60d ago
Doesn't PyPy already have a jit compiler? Why aren't we using that?
[−] pjmlp 59d ago
Great to see this going. Python also deserves a JIT, and given that only a few bother with PyPy or GraalPy, shipping it in CPython is the only way to have fewer "rewrite it in XYZ" efforts.

Kudos to those involved in making it happen.

[−] ghm2199 60d ago
Thanks for all the amazing work! I have a noob question: wouldn't this get the funding back? Or would that not be the preferable way to continue (as opposed to staying volunteer-driven)?

Like, it's a big deal to get a project to a state where volunteers are spun up, actively breaking down tasks and getting work done, no? A Python JIT is something I know next to nothing about (as do most application developers), which tells you how difficult this must have been.

[−] ecshafer 60d ago
What is wrong with the Python code base that makes this so much harder to implement than in seemingly all other code bases? Ruby, PHP, JS: they all seemed to add JITs in significantly less time. A Python JIT has been asked for for like two decades at this point.
[−] fluidcruft 60d ago
(what are blueberry, ripley, jones and prometheus?)
[−] thunky 60d ago
I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
[−] a3w 59d ago
Over 100% speedup sounds like "the code compiled before you asked the compiler to start working".

from __future__ import time_travel

[−] killingtime74 60d ago
Sorry, but the graphs are completely unreadable. There are four code names for each of the lines. Which is the JIT and which is CPython?