Wow, quite surprising results. I have been working on a personal project with the astral stack (uv, ruff, ty) that's using extremely strict lint/type checking settings; you could call it an experiment in setting up a Python codebase to work well with AI. I was not aware that ty's gaps were significant. I just tried with zuban + pyright. Both catch a half dozen issues that ty is ignoring. Zuban has one FP and one FN, pyright is 100% correct.
Looks like I will be converting to pyright. No disrespect to the astral team, I think they have been pretty careful to note that ty is still in early days. I'm sure I will return to it at some point - uv and ruff are excellent.
This is the way. For now, it's also 100% pyright for me. I can recommend turning on reportMatchNotExhaustive if you're into Python's match statements but would love the exhaustiveness check you get in Rust. Eric Traut has done a marvellous job working on pyright, what a legend!
But don't get me wrong, I made an entry in my calendar to remind me of checking out ty in half a year. I'm quite optimistic they will get there.
Microsoft started as a programming language company (MS-BASIC) and they never stopped delivering serious quality software there. VB (classic), for all its flaws, was an amazing RAD dev product. .NET, especially since the move to open-source, is a great platform to work with. C# and TS are very well-designed languages.
Though they still haven't managed to produce a UI toolkit that is reliable, fast, and easy to use all at once.
is correct according to PEP 484 (when an argument is annotated as having type float, an argument of type int is acceptable) but this will lead to a runtime error.
mypy sees no type error here, but ty does.
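The snippet this comment refers to did not survive in the thread, so the following is only a hypothetical illustration of the same class of problem, not the original code. It assumes a function `to_hex` (my name) annotated with `float` that calls a method which exists on `float` but not on `int`.

```python
def to_hex(x: float) -> str:
    # float has a .hex() method, int does not.
    return x.hex()


print(to_hex(1.5))

try:
    # PEP 484 allows an int where a float is annotated,
    # so a checker following that rule literally accepts this call.
    to_hex(1)
except AttributeError as exc:
    print(f"runtime error: {exc}")
```

Whether a given checker flags the `x.hex()` line depends on how it models the int/float promotion, so it is worth running your own checker on it rather than taking this sketch as a statement about any tool.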
I've been using ty on some previously untyped codebases at work. It does a good job of being fast and easy to use while catching many issues without being overly draconian.
My teammates who were writing untyped Python previously don't seem to mind it. It's a good addition to the ecosystem!
My understanding is that Astral's focus for ty has been on making a good experience for common issues; they plan for very high compliance, but difficult or rare edge cases aren't prioritized.
Compliance suite numbers are biased towards edge cases and not the common path because that's where a lot of the tests need to be added.
My advice is to see how each type checker runs against your own codebase and whether the output/performance is something you are happy with.
> My understanding is that Astral's focus for ty has been on making a good experience for common issues; they plan for very high compliance, but difficult or rare edge cases aren't prioritized.
I would say that's true in terms of prioritization (there's a lot to do!), but not in terms of the final user experience that we are aiming for. We're not planning on punting on anything in the conformance suite, for instance.
I've used mypy forever and never even tried these others. Looking at them though it looks like it's worth trying out Zuban or Pyright? Is there a noticeable benefit when switching between different checkers?
Are there any good static (i.e. not runtime) type checkers for arrays and tensors? E.g. "16x64x256 fp16" in numpy, pytorch, jax, cupy, or whatever framework. Would be pretty useful for ML work.
Just an FYI, for people looking at the low pass rates for mypy and ty and concluding they must not be very useful. These test suites are checking many odd corners of the typing spec.
For "normal" Python code, I find mypy does pretty well. Certainly I find it helpful, especially on a large code base and when working with other developers of various experience levels.
The reason I prefer pyrefly over mypy is mostly because of speed. Better accuracy is nice but speed is the killer feature. Given the quality of uv and ruff and the experience of the team working on ty, I'm quite confident it's going to be great in that respect as well.
Using VSCodium I was having issues with Python type checkers for quite a while. I did the basedpyright thing for a while but that was painful. It's a bit too based for me, and I'm not sure I'd call it based. Right now I have uv, ruff, and ty and I'm happy with it. It's super easy to update and super fast. I didn't realize the coverage wasn't as good as some others but I still like it. I may have to try pyrefly. Never heard of it until this post, so thank you.
This is great and I'll try out pyright ASAP on my current codebase. The people who wrote it evidently didn't have any type checking running (despite I think 3+ linters??) so it's a nightmare of
> "well the checker accurately reports it will be type X in an error case not Y"
> "but we never get type X"
> "Then we don't have good enough coverage"
It's so easy in VS Code, but it isn't on by default like the C/C++ one, I guess because too much legacy code would cause infinite errors. And there's the age-old problem of .pyi files lying about types.
Interesting. This is the first I've heard of Zuban.
The fact that Mypy fails so badly matches my experience. It would be interesting to see exactly where Pyright "fails". It's been so reliable to me I wouldn't be 100% surprised if these are deliberate deviations from the spec, where it is dumb.
I still can't get over the utter idiocy in Python's type hints being decorative. In what world does x: int = "thing" not give someone in the standardisation process pause?
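To make the complaint concrete, here is a small sketch (the function name `make_count` is made up) showing that the interpreter stores annotations but does not enforce them. Only a static checker would object to the assignment.

```python
def make_count() -> int:
    # A static type checker should reject this assignment,
    # but the interpreter runs it without complaint.
    count: int = "thing"
    return count


value = make_count()
print(type(value).__name__)

# The annotation is still recorded as metadata on the function.
print(make_count.__annotations__)
```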
(glad they include ty now)