There's no way the red v2 is doing anything with a 120b parameter model. I just finished building a dual a100 ai homelab (80gb vram combined with nvlink). Similar stats otherwise. 120b only fits with very heavy quantization, enough to make the model schizophrenic in my experience. And there's no room for kv, so you'll OOM around 4k of context.
I'm running a 70b model now that's okay, but it's still fairly tight. And I've got 16gb more vram than the red v2.
I'm also confused why this is 12U. My whole rig is 4u.
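Rough numbers behind the 120B point above, sketched in Python; the layer count, head counts, and quantization levels are illustrative assumptions, not anything the red v2 actually ships:

```python
# Back-of-the-envelope VRAM math for a dense 120B-parameter model on 80GB.
# Architecture numbers below are illustrative assumptions, not real specs.

def weights_gb(params_b, bits_per_weight):
    """Memory needed for the weights alone, in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(layers, kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * tokens."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 1e9

total_vram = 80  # GB, e.g. dual 40GB A100s

for bits in (16, 8, 4, 3):
    w = weights_gb(120, bits)
    print(f"{bits:>2}-bit weights: {w:6.1f} GB  (headroom: {total_vram - w:6.1f} GB)")

# Even at 4 bits the weights alone are ~60 GB, so KV cache, activations,
# and runtime overhead have to squeeze into what's left.
print(f"KV @ 4k ctx, 96 layers, full 96-head attention: "
      f"{kv_cache_gb(96, 96, 128, 4096):.1f} GB")
print(f"KV @ 4k ctx with grouped-query attention (8 KV heads): "
      f"{kv_cache_gb(96, 8, 128, 4096):.1f} GB")
```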
The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256gb of RAM. It's not like a threadripper 7000 is going to break the bank.
I'm glad this exists but it's... honestly pretty perplexing
There's some irony in the fact that this website reads as extremely NOT AI-generated, very human in the way it's designed and the tone of its writing.
Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.
One thought: The ability to conveniently get this onto a 240v circuit would be nice. Having to find two different 120v circuits to plug this into will be a pain for many folks.
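For a sense of the math behind that (the box's draw is an assumption; the 80% continuous-load derating is standard US practice):

```python
import math

# Why a dual-PSU box is awkward on US residential wiring.
box_watts = 2400                  # assumed sustained draw of the box under load
per_120v_15a = 120 * 15 * 0.8     # ~1440 W usable on a 15A/120V circuit (80% rule)
per_240v_20a = 240 * 20 * 0.8     # ~3840 W usable on a 20A/240V circuit

print(f"120V/15A circuits needed: {math.ceil(box_watts / per_120v_15a)}")  # 2
print(f"240V/20A circuits needed: {math.ceil(box_watts / per_240v_20a)}")  # 1
```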
The exabox is interesting. I wonder who the customer is; after watching the Vera Rubin launch, I cannot imagine deciding I wanted to compete with NVIDIA for hyperscale business right now. Maybe it’s aiming at a value-conscious buyer? Maybe it’s a sensible buy for a (relatively) cash-strapped ML startup; actually I just checked prices, and it looks like Vera Rubin costs half for a similar amount of GPU RAM. I’m certain that the interconnect will not be as good as NV’s.
I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.
The problem with all these "AI box" startups is that the product is too expensive for hobbyists, and companies that need to run workloads at scale can always build their own servers and racks and save on the markup (which is substantial). Unless someone can figure out how to get cheaper GPUs & RAM there is really no margin left to squeeze out.
$12,000 for the base model is insane. I have an Apple M3 Max with 128GB RAM that can run 120B parameter models using like 80 watts of electricity at about 15-20 tokens/sec. It's not amazing for 120B parameter models but it's also not 12 grand.
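For context, decode throughput on hardware like this is mostly memory-bandwidth bound: every generated token re-reads the active weights once. A rough sketch, where the bandwidth figure, quantization, and active-parameter counts are all assumptions:

```python
# Decode tok/s ~= usable memory bandwidth / bytes of weights read per token.
# All numbers below are assumptions for illustration.

def decode_tok_per_s(active_params_b, bits_per_weight, bandwidth_gbs, efficiency=0.7):
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * 1e9 * efficiency / bytes_per_token

bw = 400  # GB/s, assumed unified-memory bandwidth for an M3 Max

# A dense 120B model touches all of its weights for every token...
print(f"dense 120B @ 4-bit : {decode_tok_per_s(120, 4, bw):5.1f} tok/s")
# ...while a sparse/MoE 120B with only a few billion active params is far faster.
print(f"MoE, ~5B active    : {decode_tok_per_s(5, 4, bw):5.1f} tok/s")
```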
Tinybox is cool but I think the market is maybe looking more for a turn-key explicit promise of some level of intelligence @ a certain Tok/s like "Kimi 2.5 at 50Tok/s".
> In order to keep prices low and quality high, we don't offer any customization to the box or ordering process. If you aren't capable of ordering through the website, I'm sorry but we won't be able to help.
Has this guy never worked on a B2B product before? Nobody is going to order a $10 million piece of infrastructure through your website's order form. And they are definitely going to want to negotiate something, even if it's just a warranty. And you'll do it because they're waving a $10 million check in your face.
The tone of this website is arrogant to the point of being almost hostile. The guy behind this seems to think that his name carries enough weight to dictate terms like this, among other things like requiring candidates to have already contributed to his product to even be considered for a job. I would be extremely surprised if anyone except him thinks he's that important.
Perhaps this company should think about acting as a landlord for their hardware. You buy (or lease) but they also offer colocation hosting. They could partner with crypto miners who are transitioning to AI factories to find the space and power to do this. I wonder if the machines require added cooling, though, in what would otherwise be a crypto mining center. CoreWeave made the transition and also do colocation. The switchover is real.
I think Tinygrad should think about recycling. Are they planning ahead in this regard? Is anyone?
My thought is that if there were a central database of who owns what and where, then at least when the recycling tech becomes available, people will know where to source their specific trash (and even pay for it). Having a database like that in the first place could even fuel the industry.
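A minimal sketch of what a record in such a registry might hold; the fields are assumptions, just to make the idea concrete:

```python
from dataclasses import dataclass

# Minimal sketch of a hardware-ownership registry; field names are assumptions.
@dataclass
class HardwareRecord:
    owner_contact: str   # who owns it
    model: str           # e.g. "tinybox red v2"
    components: str      # what a recycler would actually want out of it
    location: str        # where to source it once recycling tech matures
    purchase_year: int

registry: list[HardwareRecord] = [
    HardwareRecord("lab@example.org", "tinybox red v2",
                   "4x consumer GPUs, 128GB DDR5", "Portland, OR", 2025),
]

# A recycler could then query by region or component type:
nearby = [r for r in registry if "Portland" in r.location]
print(len(nearby))
```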
I would love to see real-life tokens/sec values advertised for one or various specific open source models.
I'm currently shopping for offline hardware and it is very hard to estimate the performance I will get before dropping $12K, and would love to have a baseline that I can at least always get e.g. 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.
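Until vendors publish those numbers, the closest thing to a baseline is measuring it yourself: Ollama's local API reports token counts and eval time with each non-streaming response. The model tag below is an assumption; substitute whatever you actually pull.

```python
import requests

# Measure real decode throughput on your own box via Ollama's local API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gpt-oss:120b",
          "prompt": "Summarize the history of the transistor.",
          "stream": False},
).json()

# Ollama reports eval_count (tokens generated) and eval_duration (nanoseconds).
tok_per_s = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"decode throughput: {tok_per_s:.1f} tok/s")
```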
Not sure why they stopped using 6 GPUs in their builds - with 4 GPUs, both the 9070 and the RTX 6000 come in 2-slot designs, so it's easy to build it yourself using a somewhat more expensive, but still fairly regular, motherboard.
With 6 GPUs you have to deal with risers, PCIe retimers, dual PSUs, and a custom case, so the value proposition there was much better IMO.
Cool that you have a dual power supply model. It says rack mountable or free standing. Does that mean two form factors? $65K is more than we can afford right now but we are definitely eventually in the market for something we can run in our own colo.
It's funny though... we're using deepseek now for features in our service and based on our customer-type we thought that they would be completely against sending their data to a third-party. We thought we'd have to do everything locally. But they seem ok with deepseek which is practically free. And the few customers that still worry about privacy may not justify such a high price point.
Regarding 2x faster than pytorch being a condition for tinygrad to come out of alpha:
Can they (or someone else) give more details on which workloads PyTorch runs at less than half of what the hardware can provide? Most of the papers use standard components, and I assume PyTorch is already pretty performant at implementing them, reaching 50+% of the extractable performance on typical GPUs.
If they mean more esoteric stuff that requires writing custom kernels to get good performance out of the chips, then that's a different issue.
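One way to ground this: measure what fraction of peak a plain fp16 matmul already hits under PyTorch. A sketch, where the peak-TFLOPS figure is an assumed A100 spec and should be swapped for whatever card you're on:

```python
import time
import torch

# How close does PyTorch get to peak on a plain fp16 matmul?
assert torch.cuda.is_available()
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

for _ in range(3):          # warm-up
    a @ b
torch.cuda.synchronize()

iters = 20
t0 = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

achieved_tflops = iters * 2 * n**3 / elapsed / 1e12
peak_tflops = 312           # assumed fp16 tensor-core peak for an A100
print(f"achieved: {achieved_tflops:.0f} TFLOPS "
      f"({100 * achieved_tflops / peak_tflops:.0f}% of assumed peak)")
```

On a dense GEMM like this, PyTorch is essentially just dispatching to cuBLAS, so any 2x headroom presumably lives in fusion, scheduling, and memory-bound ops rather than in the matmuls themselves.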
$12,000 gets you 1Gb/s networking and vanilla Ubuntu 24.04. Napkin math on the hardware suggests margins are around 50%, which feels like a school fundraiser where everyone pays what is obviously way more than normal retail price for X because "it's for the children."
I'm not sure what tinygrad is but I assume the markup is because the customer is making a conscious choice to support the tinygrad project. But what's unusual is there is apparently no reason whatsoever to buy this hardware, even if you plan on using tinygrad exclusively for your project. At least with System76 hardware I get (in theory) first class support for Pop!_OS.
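Roughly the napkin math being referenced; every line item below is an assumed retail figure, not a real bill of materials:

```python
# Napkin-math margin estimate; all prices are assumed retail figures.
assumed_bom = {
    "4x consumer GPUs": 4 * 1000,
    "CPU + motherboard": 1200,
    "128GB RAM": 400,
    "NVMe storage": 300,
    "PSUs, case, cooling, cabling": 800,
}
cost = sum(assumed_bom.values())
price = 12_000

print(f"assumed hardware cost: ${cost:,}")
print(f"gross margin at ${price:,}: {(price - cost) / price:.0%}")
```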
Curious to know who will spend this much money without external funding? Would you spend any VC money on this nameless brand? Are there any guardrails or clauses to protect that kind of expense?
I'm almost sure it's possible to custom-build a machine as powerful as their red v2 within a $9k budget. And have a lot of fun along the way.
Edit: found a third party referencing the claim, but I don't think it belongs in the title here:
Meet the World’s Smallest ‘Supercomputer’ from Tiiny AI; A Machine Bold Enough to Run 120B AI Models Right in the Palm of Your Hand
https://wccftech.com/meet-the-worlds-smallest-supercomputer-...
$12,000, $65,000, $10,000,000.
Not revolutionary in any way, but nice. Unless I'm missing something here?
"likely" doesn't inspire much confidence. Surely, they have those numbers, and if it was, they'd publicize the comparisons.