A good technical project, but honestly useless in like 90% of scenarios.
You want to use an NVidia GPU for LLMs? Just buy a basic second-hand PC (the GPU is the primary cost anyway). You want a Mac for a good amount of VRAM? Buy a Mac.
With this proposed solution you get a half-baked system: on one hand the GPU is limited by the Thunderbolt port and you don't have access to all of NVidia's tools and libraries, and on the other hand you have a system that lacks the integration of a native solution like MLX and risks breakage in a future macOS update.
Nvidia GPUs were usable on Intel Macs, but compatibility got worse over time, and Apple stopped making a Mac Pro with regular PCIe slots in 2013. People then got hopeful about eGPUs, but they have their own caveats on top of macOS only fully working with AMD cards. So I've gotten numb to any news about Mac + GPU. The answer was always to just get a non-Apple PC with PCIe slots instead of giving yourself hoops to jump through.
Until there is official support for Mac coming from nvidia, I don't think anything will happen.
> the hardware wasn't usable on macOS
This eGPU thing is from a third-party if I understand correctly. I don't see why nvidia would get excited about that. If they cared about the platform, they would have released something already.
The point is that if nvidia cared about the Mac platform, they would have done something to make eGPUs usable on Macs a long time ago.
Even on Intel Macs using eGPU with nvidia cards was near impossible. nvidia just doesn't care about it after the breakdown of the two companies' relationship.
Whether a third party has created a signed driver or not doesn't matter much until there is more interest from the GPU maker. This barely moves the needle.
If a model can run on a 512GB M3 Ultra via MLX or CUDA while simultaneously benefiting from the memory bandwidth of something like an RTX 6000 Pro, that would save my company hundreds of thousands of dollars. That's $20,000 for roughly 600GB of VRAM, and enough token generation speed to fulfill the needs of any enterprise that's not a hyperscaler or neocloud.
I'll let someone else do the math for you on what it costs to put together a 10U server to get that kind of performance without the $10K M3 Ultra Studio.
What we're paying for five old 80GB A100s is criminal, but it's nothing compared to what these GB200 Blackwell setups are going to cost in 2030. Market economics aside, the fact that they require sophisticated liquid cooling infrastructure and draw 3x the power of the A100s will make these cards unattainable for small to medium organizations.
So yeah, if there's some outside chance that we can pair NVIDIA's speed with an ARM-powered machine that offers 512GB of Unified Memory while drawing 50W -- you better believe it's a big deal. We'll see. Sounds too good to be true.
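Doing that math back-of-envelope (every price below is an assumed ballpark street figure, not a quote):

    # Rough $/GB of model-capable memory. All prices are assumptions.
    setups = {
        "M3 Ultra Studio 512GB + RTX 6000 Pro 96GB": (10_000 + 10_000, 512 + 96),
        "5x used A100 80GB (cards alone)": (5 * 18_000, 5 * 80),
    }
    for name, (usd, gb) in setups.items():
        print(f"{name}: ${usd:,} / {gb}GB = ${usd / gb:,.0f} per GB")

Even with generous error bars on those prices, the combined box comes out several times cheaper per GB.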
"Nvidia." Not NVidia or nVidia, or the other ways. I feel that I can frequently figure out if someone is going to express a negative view about this company based only on whether they picked a weird way to write their name.
There's more to peripheral limits than the protocol used. Thunderbolt connections have higher latency and tighter bandwidth caps. Both, either, or neither of those may be an actual problem (depending on the use case), but they are examples of limits relative to native PCIe.
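For rough scale (nominal link rates; real-world throughput is lower for every entry, and Thunderbolt adds controller latency on top):

    # Nominal link bandwidth, converted from Gb/s to GB/s.
    links_gbps = {
        "Thunderbolt 4 / USB4 (PCIe tunnel, ~32 Gb/s cap)": 32,
        "Thunderbolt 5 (symmetric mode)": 80,
        "PCIe 4.0 x16": 256,
        "PCIe 5.0 x16": 512,
    }
    for name, gbps in links_gbps.items():
        print(f"{name}: ~{gbps / 8:.0f} GB/s")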
Such a shame both companies are too big on vanity to make great things happen. Imagine if you could run Mac hardware with nvidia on Linux. It's all there, and closed walls are what's not allowing it to happen. That's what we as customers lose when we forgo control of what we purchase to those who sold us the goods.
Woah, this is exciting. I'm traveling but I have a 5090 lying around at home. I'm eager to give it a go. Docs are here: https://docs.tinygrad.org/tinygpu/
I hope it'll work on an M4 Mac Mini. Does anyone know what hardware to get? You'll need a full ATX PSU to supply power, right? And then tinygrad can do LLM inference on it?
I followed the instructions link and read the scripts...although the TinyGPU app is not in source form on GitHub, this looks to me like the GPU is passed into the Linux VM underneath to use the real driver and then somehow passed back out to the Mac (which might be what the TinyGrad team actually got approved).
Or I could have totally misunderstood the role of Docker in this.
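If that reading is right, the shape of it would be something like this sketch -- purely hypothetical, not TinyGPU's actual code: a Linux guest owns the device through the real driver and exposes a small service back to the host.

    # Hypothetical sketch of the suspected pattern (NOT TinyGPU's actual
    # implementation). This runs inside the Linux VM, which holds the
    # passed-through PCIe device and the real NVIDIA driver:
    import socket, subprocess

    srv = socket.create_server(("0.0.0.0", 7777))
    while True:
        conn, _ = srv.accept()
        if conn.recv(64).strip() == b"SMI":
            # Run the real driver tooling where it actually works...
            out = subprocess.run(["nvidia-smi"], capture_output=True).stdout
            conn.sendall(out)  # ...and hand the result back to macOS.
        conn.close()

    # On the macOS host (with the VM's port forwarded):
    #   c = socket.create_connection(("localhost", 7777))
    #   c.sendall(b"SMI\n"); print(c.recv(65536).decode())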
I think that Metal doesn't support double precision, so that limits some serious physics simming; but if you're doing that, I guess you just rent a GPU somewhere.
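You can see that gap from PyTorch's MPS backend, which refuses float64 outright. A quick check (expected to raise on current builds; the exact error text may vary):

    # Demonstrates the missing fp64 on Apple's GPU stack via PyTorch MPS.
    import torch

    if torch.backends.mps.is_available():
        try:
            torch.zeros(3, dtype=torch.float64, device="mps")
        except Exception as e:
            print("no float64 on MPS:", e)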
I would definitely be into this if adding an eGPU were supported as a first-class feature.
I'm writing scientific software that has components (molecular dynamics) that are much faster on GPU. I'm using CUDA only, as it's the easiest to code for. I'd assumed this meant no-go on ARM Macs. Does this news make that false?
My main thought is: would this allow me to speed up prompt processing for large MoE models? That is the real bottleneck on the M3 Ultra. The tokens per second is pretty good.
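That's plausible on paper: prefill is compute-bound while decode is bandwidth-bound, and raw compute is exactly where the M3 Ultra trails a big NVIDIA card. Illustrative numbers only -- the TFLOPS figures and the active-parameter count below are all assumptions:

    # Ideal-case prefill time ~ prompt_tokens * 2 * active_params / FLOPS.
    active_params = 37e9      # assumed active params/token for a large MoE
    prompt_tokens = 32_000
    flops_needed = prompt_tokens * 2 * active_params

    for name, flops in [("M3 Ultra GPU", 28e12), ("big NVIDIA card", 250e12)]:
        print(f"{name}: ~{flops_needed / flops:.0f}s of prefill")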
The opportunity cost of Apple refusing to sign Nvidia's OEM AArch64 drivers is probably reaching the trillion-dollar mark, now that Nvidia and ARM have their own server hardware.
The software stack has been ready for Apple Silicon for more than a half decade.
Yes, for many scenarios this is "not even an academic exercise".
For a very select few applications this is Gold. Finally serious linear algebra crunch for the taking. (Without custom GPU tapeout.)
> the GPU is limited by the Thunderbolt port
Not everything is limited by the transfer speed to/from the GPU. LLM inference, for example.
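Rough numbers to see why (all assumed): once the weights are resident in VRAM, the only steady-state traffic over the link is token IDs going in and sampled tokens coming back.

    # Steady-state decode traffic over the eGPU link (assumed figures).
    per_token_bytes = 50_000        # ids in, sampled token / logits out
    tokens_per_s = 50
    usable_link_bytes_per_s = 3e9   # ~3 GB/s usable over a TB4 PCIe tunnel

    util = tokens_per_s * per_token_bytes / usable_link_bytes_per_s
    print(f"link utilisation during decode: {util:.4%}")  # well under 1%

Loading the weights in the first place does saturate the link, but that's a one-time cost per model.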
> GPU is limited by the Thunderbolt port
I thought Thunderbolt was like pluggable PCI? The whole point was not to limit peripherals.
> same PyTorch/CUDA calls, just intercepted by a stub library that forwards them over the local network.
At that point you're making more work for yourself than debugging over SSH.
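For a sense of why: here's the skeleton of such a stub (a hypothetical illustration, not the actual library). Every method call becomes a pickled round trip, and stack traces end on the wrong side of the wire.

    # Hypothetical skeleton of a call-forwarding stub. Each method call is
    # pickled, shipped to a worker holding the real GPU, and the result is
    # pickled back. (Short reads etc. are ignored for brevity.)
    import pickle, socket

    class RemoteHandle:
        def __init__(self, sock, handle_id):
            self._sock, self._id = sock, handle_id

        def __getattr__(self, name):
            def call(*args, **kwargs):
                payload = pickle.dumps((self._id, name, args, kwargs))
                self._sock.sendall(len(payload).to_bytes(4, "big") + payload)
                size = int.from_bytes(self._sock.recv(4), "big")
                return pickle.loads(self._sock.recv(size))
            return call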
> If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for!
I got an eGPU back in 2018 and could never get it to work, to the point that it soured me on doing it again.
These days for heavy duty work I just offload to the cloud. This all feels like NVidia trying to be relevant versus ARM.