Darkbloom – Private inference on idle Macs (darkbloom.dev)

by twapi 252 comments 501 points


[−] kennywinker 29d ago
I have a hard time believing their numbers. If you can pay off a mac mini in 2-4 months, and make $1-2k profit every month after that, why wouldn’t their business model just be buying mac minis?
[−] eigengajesh 29d ago
The numbers are optimistically legit -- they're calculated purely on the assumption that we have demand for all machines at all times. We don't have that right now, but we're fairly optimistic that people will do it.

That's why we don't recommend purchasing a new machine. Existing machine is no cost for you to run this.

Electricity is one cost, but it will get paid off from every request it receives. Electricity is only deducted when you run an inference. If you have any questions, DM me @gajesh on Twitter.

[−] mbesto 29d ago

> That's why we don't recommend purchasing a new machine. Existing machine is no cost for you to run this.

You misunderstood. If the ROI is there, there is enough capital in existence for you to accelerate your profit. So why even deal with the complexity of renting people's hardware when you can do it yourself?

[−] splintercell 28d ago
No, what he's saying is that he expects this to be the ROI in the future because his product is so good.
[−] stavros 29d ago
You're not taking into account the thermal strain on the machine, though. A machine that's 100% utilized (even worse if it's in bursts) will last less than an idle machine.
[−] washadjeffmad 29d ago
Not appreciably, and not before a 5-yr AppleCare+ warranty expires.

Out of our >3000 currently active Apple Silicon Macs, failures due to non-physical damage are in the single digits per year. Of those, none have been from production systems with 24/7 uptime and continuous high load, which reflects your parenthetical.

Perhaps we haven't met the other end of the bathtub curve yet, but we also won't be retaining any of these very far beyond their warranty period, much less the end of their support life.

[−] embedding-shape 29d ago

> A machine that's 100% utilized (even worse if it's in bursts) will last less than an idle machine.

How much though? Say I have three Mac Minis next to each other: one that is completely idle but on, one that bursts to 100% CPU every 10 minutes, and one that uses 100% CPU all the time. What's the difference in how long the machines survive? Months, years or decades?

[−] dmitrygr 28d ago

> Existing machine is no cost for you to run this.

That is not at all how modern chips work. Idle chips are mostly powered down, non-idle ones are working and that causes real measurable wear and tear on the silicon. CPU, RAM, NAND all wear and tear measurably with use on current manufacturing processes.

https://en.wikipedia.org/wiki/Electromigration

[−] BuildTheRobots 28d ago
I don't worry about bandwidth or constant CPU use, but the one thing that will kill my mac is burning out the SSD.

The calculator gives numbers for nearly everything, but I can't obviously see how much space it needs for model storage or how many writes of temp files I should expect if I'm running flat out.

[−] LPisGood 28d ago
My question is why did you have to design this to use an MDM instead of a simple program running in the terminal or something?
[−] avidphantasm 29d ago
If you start buying minis, then you need to house, power, and cool them. So you are building a mini data center. If you are building a small data center, economies of scale will drive you to want to build larger and larger. However, this gets expensive and neighbors tend to not like data centers (for good reason). To me this seems like asymmetric warfare against hyper-scalers.
[−] psychoslave 29d ago
Because they don’t have that much initial money in their pocket, while the idle computer is already there, and the biggest friction point is convincing people to install some software. Producing rhetoric and software is several orders of magnitude cheaper than directly owning and maintaining a large fleet of hardware, with a strong guarantee of stable electrical input and a safe place to store it.

Assuming that getting a large chunk of initial investment is just a formality is out of touch with the reality of 99% of people out there, when it’s actually the biggest friction point in any socio-economic endeavour.

[−] dgacmu 29d ago
No provider maintains 100% utilization of GPUs at full rate. Demand is bursty - even if this project is successful, you might expect, e.g., things to be busy during the stock market times when Claude is throwing API errors and then severely underutilized during the same times that Anthropic was offering two-for-one off peak use.

And then there's a hit for overprovisioning in general. If the network is not overprovisioned somewhat, customers won't be able to get requests handled when they want, and they'll flee. But the more overprovisioned it is, the worse it is for compute seller earnings.

I suspect an optimistic view of earnings from a platform like this would be something like 1/8 utilization on a model like Gemma 4. Their calculator estimates my m4 pro mini could earn about $24/month at 3 hours/day on that model. That seems plausible.
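The utilization arithmetic in this comment can be sketched as follows. This is a minimal illustration, not Darkbloom's calculator: the function name is hypothetical, and the implied 24/7 figure is derived from the quoted $24/month at 3 hours/day rather than taken from the site.

```python
def monthly_earnings(full_utilization_usd: float, active_hours_per_day: float) -> float:
    """Scale a hypothetical 24/7 monthly earnings figure by the fraction of the day spent serving."""
    utilization = active_hours_per_day / 24.0
    return full_utilization_usd * utilization


# 3 active hours per day corresponds to 1/8 utilization.
utilization = 3 / 24

# Working backwards from ~$24/month at 3 hours/day gives the implied
# always-busy figure for the same machine and model (an inference, not a quoted number).
implied_full_utilization_usd = 24 / utilization

print(f"utilization: {utilization:.3f}")
print(f"implied 24/7 earnings: ${implied_full_utilization_usd:.0f}/month")
```

Under these assumptions, earnings scale linearly with busy hours, so the estimate is only as good as the guess about how often requests actually arrive.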

[−] chaoz_ 29d ago
Solid q. I think part of it is that it’s really easy to attract some “mass” (capital) of users, as there are definitely quite a few idle Macs in the world.

Non-VC play (not required until you can raise on your own terms!) and clear differentiation.

If you want to go full-business-evaluation, I would be more worried about someone else implementing the same thing with more commission (imo 95% and first to market is good enough).

[−] dnnddidiej 29d ago
It is too good to be true. When you see it is making more than a claude code subscription for fuck all work per day.

Prolly gonna make $50 a year tops.

[−] znnajdla 29d ago
The numbers are obviously high, because if this takes off then the price for inference will also drop. But I still think it’s a solid economic model that benefits low income countries the most. In Ukraine, for example, I know people who live on $200/month. A couple Mac Minis could feed a family in many places.

As a business owner, I can think of multiple reasons why a decentralized network is better for me as a business than relying on a hyperscaler inference provider. 1. No dependency on a BigTech provider who can cut me off or change prices at any time. I’m willing to pay a premium for that. 2. I get a residential IP proxy network built-in. AI scrapers pay big money for that. 3. No censorship. 4. Lower latency if inference nodes are located close to me.

[−] thih9 29d ago

> These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.

Others are reporting low demand, eg.: https://news.ycombinator.com/item?id=47789171

[−] liuliu 29d ago
Of course these numbers are ridiculous. Mac Mini (let's assume Apple releases M5 Pro) tops Int8 (let's assume it is the same as FP8, which it is not) at ~50 TFLOPs, with Draw Things, we recently developed hybrid NAX + ANE inference, which can get you ~70 TFLOPs.

An H200 gives you ~4 PFLOPs, which is ~60x at only ~40x price (assuming you can get a Mac Mini at $1000). (Not to mention, BTW, RTX PRO 6000 is ~7x price for ~40x more FLOPs).

Your M4 Mac Mini only has ~20 TFLOPs.
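The price/performance comparison above reduces to a ratio of ratios. A minimal sketch, using only the rough multipliers quoted in the comment (the function name is hypothetical, and the inputs are the commenter's estimates, not measured figures):

```python
def relative_flops_per_dollar(flops_ratio: float, price_ratio: float) -> float:
    """FLOPs-per-dollar of a device relative to the baseline Mac Mini (baseline = 1.0)."""
    return flops_ratio / price_ratio


# Raw throughput ratio from the quoted figures: ~4 PFLOPs (H200) vs ~70 TFLOPs (Mac Mini).
h200_vs_mini_flops = 4000 / 70  # roughly the "~60x" in the comment

# Ratios as stated in the comment.
h200 = relative_flops_per_dollar(flops_ratio=60, price_ratio=40)
rtx_pro_6000 = relative_flops_per_dollar(flops_ratio=40, price_ratio=7)

print(f"H200 raw FLOPs advantage: {h200_vs_mini_flops:.0f}x")
print(f"H200 FLOPs per dollar vs Mac Mini: {h200:.2f}x")
print(f"RTX PRO 6000 FLOPs per dollar vs Mac Mini: {rtx_pro_6000:.2f}x")
```

Any value above 1.0 means the datacenter or workstation GPU delivers more compute per dollar than the Mac Mini under these estimates, which is the commenter's point.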

[−] gleenn 29d ago
Power and racking are difficult and expensive?
[−] p1necone 28d ago
Because the "ship software to people, rent their hardware" model has zero up front investment required, presumably. And they don't have to deal with power, cooling, real estate.
[−] agnosticmantis 29d ago
"You could see a single robotaxi being worth, or providing, about $30,000 of gross profit per year. ... A Tesla is an appreciating asset..."

- Elon Musk during Tesla's Autonomy Day in April 2019.

[−] foota 29d ago
Capital and availability?
[−] znpy 29d ago
Being the middleman is often way more profitable
[−] Filligree 29d ago
Because their numbers don’t work out. When you do the math on token cost versus inference speed, you get something that barely breaks even even with cheap power.

Also they’ve already launched a crypto token, which is a terrible sign.

[−] tgma 29d ago
I installed this so you don't have to. It did feel a bit quirky and not super polished. Fails to download the image model. The audio/tts model fails to load.

In 15 minutes of serving Gemma, I got precisely zero actual inference requests, and a bunch of health checks and two attestations.

At the moment they don't have enough sustained demand to justify the earning estimates.

[−] gleenn 29d ago
You have to install their MDM device management software on your computer. Basically that computer is theirs now. So don't plan on just handing over your laptop temporarily unless you don't mind some company completely owning your box. It still might be a valid use for people with slightly old laptops lying around, but beware of sharing this computer with your daily activities if you, e.g., regularly use a bank in a browser on it. MDM means swap-out-your-SSL-certs level of computer access; please correct me if I'm wrong.
[−] ramoz 29d ago
Unfortunately, verifiable privacy is not physically possible on MacBooks of today. Don't let a nice presentation fool you.

Apple Silicon has a Secure Enclave, but not a public SGX/TDX/SEV-style enclave for arbitrary code, so these claims are about OS hardening, not verifiable confidential execution.

It would be nice if it were possible. There's a lot of cool innovations possible beyond privacy.

[−] nl 29d ago
They use the TEE to check that the model and code are untampered with. That's a good, valid approach and should work (I've done similar things on AWS with their TEE).

The key question here is how they avoid the outside computer being able to view the memory of the internal process:

> An in-process inference design that embeds the inference engine directly in a hardened process, eliminating all inter-process communication channels that could be observed, with optional hypervisor memory isolation that extends protection from software-enforced to hardware-enforced via ARM Stage 2 page tables at zero performance cost.[1]

I was under the impression this wasn't possible if you are using the GPU. I could be misled on this though.

[1] https://github.com/Layr-Labs/d-inference/blob/master/papers/...

[−] pants2 29d ago
Cool idea. Just some back-of-the-envelope math here (not trusting what's on their site):

My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B. Darkbloom's pricing is $0.20 per Mtok output.

That's about $2.24/day or $67/mo revenue if it's fully utilized 24/7.

Now assuming 50W sustained load, that's about 36 kWh/mo, at ~$.25/kWh approx. $9/mo in costs.

Could be good for lunch money every once in a while! Around $700/yr.
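The back-of-the-envelope numbers above can be reproduced with a short script. This is a sketch under the comment's own assumptions (130 tok/s, $0.20 per Mtok, 50 W, $0.25/kWh) plus one added assumption of a 30-day month; the constant and function names are made up for illustration.

```python
TOKENS_PER_SECOND = 130          # quoted throughput on Gemma 4 26B (4 streams)
PRICE_PER_MTOK_USD = 0.20        # quoted output price per million tokens
POWER_WATTS = 50                 # assumed sustained load from the comment
ELECTRICITY_USD_PER_KWH = 0.25   # assumed electricity price from the comment
DAYS_PER_MONTH = 30              # assumption: 30-day month
SECONDS_PER_DAY = 24 * 60 * 60


def daily_revenue_usd() -> float:
    """Revenue per day if the machine is fully utilized around the clock."""
    mtok_per_day = TOKENS_PER_SECOND * SECONDS_PER_DAY / 1_000_000
    return mtok_per_day * PRICE_PER_MTOK_USD


def monthly_energy_kwh() -> float:
    """Energy drawn per month at the sustained load."""
    return POWER_WATTS * 24 * DAYS_PER_MONTH / 1000


monthly_revenue = daily_revenue_usd() * DAYS_PER_MONTH
monthly_power_cost = monthly_energy_kwh() * ELECTRICITY_USD_PER_KWH
yearly_net = (monthly_revenue - monthly_power_cost) * 12

print(f"revenue: ${daily_revenue_usd():.2f}/day, ${monthly_revenue:.2f}/month")
print(f"energy: {monthly_energy_kwh():.0f} kWh/month, ${monthly_power_cost:.2f}/month")
print(f"net: ${yearly_net:.0f}/year")
```

The result is an upper bound: it assumes 100% utilization, which other comments in the thread suggest is far from current demand.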

[−] zv-io 28d ago
You're printing everyone's serial numbers publicly. https://console.darkbloom.dev/providers then "Security Verification" for any machine and then "Verify this device independently" -- all of this can be scraped.
[−] frankfrank13 29d ago
This is one of those ideas I think makes perfect sense, but requires so much operational change for the entire stack, that it would be very difficult to scale:

- Convincing labs to run distributed, burst-y inference

- Convincing people to run their Mac all day, hoping to make a little profit

- Convincing users to trust a distributed network of un-trusted devices

I had a similar idea, pre-AI, just for compute in general. But solving even 1 of those 3 (swap AI lab for managed-compute-type-company, eg Supabase, Vercel) is nearly impossible.

[−] haspok 29d ago
Having strong SETI@Home vibes from 25 years ago, except of course, this is not for the greater good of humanity, but a for-profit project.

Problem is, from a technical point of view, what kind of made sense back then (most people running desktops, fans always on, energy saving minimal) is kind of stupid today (even if your laptop has no fan, would you want it to be always generating heat?)...

I definitely want my laptops to be cool, quiet and idle most of the time.

[−] dgacmu 29d ago
@eigengajesh - Your cost estimator lists Mac Mini M4 Pro with only 24 or 48GB options, but the M4 Pro mini can also be configured with 64GB. At least, I hope so, as I'm typing this on one. ;-)

Oh, also, you seem to have some bugs:

Gemma: WARN [vllm_mlx] RuntimeError: Failed to load the default metallib. This library is using language version 4.0 which is not supported on this OS. library not found library not found library not found

cohere:

    2026-04-16T14:25:10.541562Z WARN [stt] File "/Users/dga/.darkbloom/bin/stt_server.py", line 332, in load_model
    2026-04-16T14:25:10.541614Z WARN [stt] from mlx_audio.stt.models.cohere_asr import audio as audio_mod
    2026-04-16T14:25:10.541643Z WARN [stt] ModuleNotFoundError: No module named 'mlx_audio.stt.models.cohere_asr'

Trying to download the flux image models fails with:

curl: (56) The requested URL returned error: 404

darkbloom earnings does not work

your documentation is inconsistent between saying 100% of revenue to providers vs 95%

I think .. this needs a little more care and feeding before you open it up widely. :) And maybe lay off the LLM generated text before it gets you in trouble for promising things you're not delivering.

[−] TuringNYC 29d ago
I'd love a way to do this locally -- pool all the PCs in our own office for in-office pools of compute. Any suggestions from anyone? We currently run ollama but manually manage the pools
[−] dchuk 29d ago
Interesting concept. Two sided marketplaces are hard to bootstrap but maybe just enough curiosity would get the flywheel going. Hell they should just try and convince people to enroll as providers but then also use the service even if it’s hitting their own machines until there’s some degree of supply and demand pressure then try and get only providers to sign up. Or set up some way to encourage providers to promote others to use the service (the 100% rev share kind of breaks that concept but anything can change).

I wish this was self hostable, even for a license fee. Many businesses have fleets of Macs, sometimes even in stock as returned equipment from employees. Would allow for a distributed internal inference network, which has appeal for many orgs who value or require privacy.

[−] stuxnet79 29d ago
So basically ... Pied Piper.
[−] pants2 29d ago
You might not even know it as a user but the payment/distribution here is all built on crypto+stablecoins. This is a great use case for it.
[−] 0xbadcafebee 29d ago
I'm not sure how the economics works out. Pricing for AI inference is based on supply/demand/scarcity. If your hardware is scarce, that means low supply; combine with high demand, it's now valuable. But what happens if you enable every spare Mac on the planet to join the game? Now your supply is high, which means now it's less valuable. So if this becomes really popular, you don't make much money. But if it doesn't become somewhat popular, you don't get any requests, and don't make money. The only way they could ensure a good return would be to first make it popular, then artificially lower the number of hosts.
[−] BingBingBap 29d ago
Generate images requested by randoms on the internet on your hardware.

What could possibly go wrong?

[−] Jn2G3Np8 29d ago
Love the concept, with some similarity to folding@home, though more personal gain.

But trying it out it still needs work, I couldn't download a model successfully (and their list of nodes at https://console.darkbloom.dev/providers suggests this is typical).

And as a cursory user, it took me some digging to find out that to cash out you need a Solana address (providers > earnings).

[−] MyUltiDev 28d ago
The hardware-attested privacy path is the interesting part of this, but the economic side has a quieter risk the thread has not named: the load tax per request. MiniMax M2.5 239B from your catalog still has to load all 239B weights even though only 11B are active — that is roughly 120GB at Q4_K_M, and cold load from SSD on Apple Silicon is measurable in tens of seconds. Even the Qwen3.5 122B MoE lands around 65GB cold. If the coordinator routes request number two to a different idle Mac than request number one, or if the owner's machine spun the model out to free memory in between, each request pays that cold load before the first token. Keeping the model resident 24/7 solves the latency but eats into the power budget the operator is trying to amortize in the first place. How does the coordinator decide which provider to keep warm for which model? A 16GB or 32GB home Mac cannot host Qwen3.5 122B MoE at all, and the Mac Studios that can are a much smaller slice of the 100M machine estimate.
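The sizes and load times in this comment follow from simple arithmetic, sketched below. The helper names are hypothetical. Two inputs are assumptions not stated in the thread: a flat 4 bits per weight (real Q4_K_M files tend to be somewhat larger, which is consistent with the ~65GB figure quoted for the 122B model) and a 5 GB/s sustained SSD read rate.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of parameters times bytes per parameter."""
    return params_billions * bits_per_weight / 8


def cold_load_seconds(size_gb: float, read_gb_per_s: float) -> float:
    """Lower bound on cold-load time if the SSD read rate is the only bottleneck."""
    return size_gb / read_gb_per_s


def fits_in_memory(size_gb: float, ram_gb: float) -> bool:
    """Optimistic check that ignores OS, KV cache, and runtime overhead."""
    return size_gb <= ram_gb


BITS_PER_WEIGHT = 4.0    # assumption: flat 4-bit quantization
SSD_READ_GB_PER_S = 5.0  # assumption: sustained sequential read rate

# A mixture-of-experts model must still hold all weights, not just the active ones.
minimax_gb = weights_gb(239, BITS_PER_WEIGHT)
qwen_gb = weights_gb(122, BITS_PER_WEIGHT)

print(f"239B model: ~{minimax_gb:.1f} GB, cold load >= {cold_load_seconds(minimax_gb, SSD_READ_GB_PER_S):.1f} s")
print(f"122B model: ~{qwen_gb:.1f} GB, fits in 32 GB: {fits_in_memory(qwen_gb, 32)}")
```

Even with these optimistic inputs, the 239B model lands in the tens-of-seconds range per cold load, and the 122B model does not fit on a 32GB machine.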
[−] MicBook56 29d ago
I like the idea but it won't take off until Homomorphic Encryption for inference becomes a thing that's efficient and anyone can be a node.
[−] NiloCK 29d ago
Interesting to see an offering with this heritage [1] proposing flat earnings rates for inference operators here, rather than trying to sell a dynamic marketplace where operators compete on price in real-time.

Right now the dashboards show 78 providers online, but someone in-thread here said that they spun one up and got no requests. Surely someone would be willing to beat the posted rate and swallow up the demand?

I expect this is a migration target, but a tactical omission from V1 comms both for legitimate legibility reasons (I can sell x for y is easier to parse than 'I can participate in a marketplace') and slightly illegitimate legibility reasons (obscuring likely future price collapse).

Still - neat project that I hope does well.

[1] Layer Labs, formerly EigenLayer, is a company built around a protocol to abstract and recycle economic security guarantees from Ethereum proof of stake.

[−] woadwarrior01 29d ago
I won't install some random untrusted binary off of some website. I downloaded it and did some cursory analysis instead.

Got the latest v0.3.8 version from the list here: https://api.darkbloom.dev/v1/releases/latest

Three binaries and a Python file: darkbloom (Rust)

eigeninference-enclave (Swift)

ffmpeg (from Homebrew, lol)

stt_server.py (a simple FastAPI speech-to-text server using mlx_audio).

The good parts: All three binaries are signed with a valid Apple Developer ID and have Hardened runtime enabled.

Bad parts: Binaries aren't notarized. Enrolls the device for remote MDM using micromdm. Downloads and installs a complete Python runtime from Cloudflare R2 (Supply chain risk). PT_DENY_ATTACH to make debugging harder. Collects device serial numbers.

TL;DR: No, not touching that.

[−] amdivia 29d ago
Until we have breakthroughs in homomorphic encryption compute, I won't trust such privacy claims
[−] DeathArrow 29d ago
Why only Macs? If we think of all PCs and mobile phones running idle, the potential is much larger.
[−] jzig 28d ago
[−] dr_kiszonka 29d ago
"These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.

When your Mac is idle (no inference requests), it consumes minimal power — you don't lose significant money waiting for requests. The electricity costs shown only apply during active inference.

Text models typically see the highest and most consistent demand. Image generation and transcription requests are bursty — high volume during peaks, quiet otherwise."

[−] Xx_crazy420_xX 26d ago
"Debugger attachment is blocked. Memory inspection is blocked." - reminds me of old crackme challenges. Everything they mention can be bypassed, so a determined person can start stealing data from the network. For me this is a killer of such distributed compute ideas, but who knows, maybe non-enterprise users will be desperate enough for cheap compute to make this idea valid.
[−] TheHalfDeafChef 24d ago
Had it running for 3 days with nary a request. Perhaps it's the chosen model that I am serving (Gemma 4 26b)? I did see some WARN logs right after startup, but the descriptions don't suggest any problems that would block accepting or processing requests.
[−] logicallee 29d ago
It's a good project that makes sense. I recommend adding a contractual layer as well, since it's free and makes sense. Operators could legally sign that they will not look into the inference layer. After all, the operators already have a financial relationship with this provider, so it makes sense to add a contract to it and keep operators from looking into other people's data that way, too. I wish this project a lot of success.
[−] heddycrow 29d ago
I think it’s important that systems like this exist, but getting them off the ground is non-trivial.

We’ve been building something similar for image/video models for the past few months, and it’s made me think distribution might be the real bottleneck.

It’s proving difficult to get enough early usage to reach the point where the system becomes more interesting on its own.

Curious how others have approached that bootstrap problem. Thanks in advance.

[−] alexpotato 29d ago
Wasn't there an idea about 15 years ago where you would open your browser, go to a webpage and that page would have a JavaScript based client that would run distributed workloads?

I believe the idea was that people could submit big workloads, the server would slice them up and then have the clients download and run a small slice. You as the computer owner would then get some payout.

Interesting to see this coming back again.

[−] utkarsh_apoorva 29d ago
Like the concept. This is not a business - should be an open source GitHub repo maybe.

They lost me with just one microcopy - “start earning”. Huge red signal.

[−] poorman 28d ago
As one of the only people running a Mac Studio M3 Ultra with 512 GB of RAM on the network, I can tell you at sustained 100% GPU utilization I am measuring 250 watts max (at the power outlet). My solar panels are easily producing this. The power calculation goes away once you connect a solar panel. You can get a 400 watt solar panel on Amazon for $300.
[−] miki123211 29d ago

> Operators cannot observe inference data.

Is there some actual cryptography behind this, or just fundamentally-breakable DRM and vibes?

[−] WatchDog 29d ago
I installed two models, but it just always reports:

    Available models (2):
    CohereLabs/cohere-transcribe-03-2026 (4.6 GB)
    flux_2_klein_9b_q8p.ckpt (20.2 GB)
    ...
    Advertising 0 model(s) (only loaded models)

Also the benchmark just doesn't work.

Interesting idea, but needs some work.

[−] jdironman 28d ago
Reminds me a lot of when I used to deploy this (DataseamGrid) on K12 computers. I was actually just discussing this scenario with a friend.

https://www.dataseam.org/research/

[−] jaffee 29d ago
client side of this kind of needs to be open source unless I'm running it on a dedicated machine and firewalling it from the rest of my network. Or the company needs to have a very strong reputation and certifications. curlbash and go is a pretty hard sell for me
[−] dkroy 28d ago
Cool idea, though hats off to anyone who got cohere-transcribe to show up as serving the model. I could get device to show up, but kept having issues getting their server to properly serve the model though it could just be the device I tested.
[−] auslegung 28d ago
How can one do this safely? If I create a new, non-sudo user, can I install the MDM profile only for that user? I don't understand how this all works obviously so maybe this is a very dumb question
[−] gndp 29d ago
They are almost claiming FHE. Isn't it just a matter of creating the right tool to grab the generated tokens from RAM before they get encrypted for transfer? How is it fundamentally different from chutes?
[−] eigengajesh 29d ago
hey guys! i'm the creator. let me know if you have any questions.
[−] jboggan 29d ago
Is this named after the 2011 split album with Grimes and d'Eon?
[−] rvz 29d ago
Should have called it “Inferanet” with this idea.

Anyway, this looks like a great idea and might have a chance at solving the economic issue with running nodes for cheap inference and getting paid for it.

[−] v9v 29d ago
They could consider registering as a provider on something like OpenRouter if they aren't getting enough inference requests on their own site.
[−] podviaznikov 29d ago
I've tried to install it on my mac, but not sure what macOS version it should support.

on 15.1 it failed to serve models.

updated to latest 15.5 and it fails to run binary.

[−] drob518 29d ago
Seems like an interesting way for those people that purchased a Mac Mini to run OpenClaw to pay off the hardware, since mostly it’s now idle.
[−] dcreater 29d ago
I can't buy credits - says page could not load
[−] puttycat 29d ago

> Every request is end-to-end encrypted

Afaik you will need to decrypt the data the moment it needs to be fed into the model.

How do they do this then?

[−] Fokamul 29d ago
Thanks. If this takes off, I'll finally have some motivation to do kernel exploitation. :)
[−] ponyous 29d ago
Why does M1 Max project significantly higher revenue than M3 Max with double the ram?