I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant. The more articles come out about accidents caused by people handing too much context to cloud models, the more self-reinforcing this will become.
It's only half of the solution though. If the models are trained in a closed way, they can prioritize values encoded during training even if that's not what you want (example: ask the open Chinese models about Tiananmen). It's not beyond imagining that these models would e.g. try to send your data to authorities or advertisers when their training says so, even if you run them locally.
So the full solution would be models trained in an open verifiable way and running locally.
Many Chinese models have been caught doing this (it's also required by law in China), but it hasn't caused much fuss.
Having said that, I'd easily trade some censorship about Chinese affairs I don't care about for the prudishness of American models. Though I generally use the abliterated versions of both.
Another angle is when you're passing untrusted content to the AI service, e.g. anything from using it to crawl websites to spam-detection on new forum user posts.
You can trigger a ToS violation with the service or, worse, get reported to law enforcement for something you didn't even write.
local is best for privacy, but i personally think you don't need to go local.
anthropic, google, openai etc decided that their consumer ai plans would not be private: partly to collect training data, partly to employ moderators who review user activity for safety.
we trust that human moderators will not review and flag our icloud docs, onedrive or gmail, or aggregate such documents into training data for llms. it became the norm that an llm is somehow not private. it became a norm that you can't opt out of training, even on paid plans (see meta and google); or if you can opt out of training, you can't opt out of moderation.
cloud models with a zero-retention privacy policy are private enough for almost everyone. the subscriptions, google search, and ai search engines are either 'buying' your digital life or covering themselves for legal reasons.
you can and should have private cloud services, and if a legal agreement is not enough, cryptographic attestation is already used in compute, e.g. AWS nitro enclaves and other providers.
On another note: is encrypted inference currently a thing/service? I want to run my own models locally because, if I'm going to be chatting with it about my day-to-day life, why send that to a server in plaintext?
> I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant.
I’ve seen several projects like this that offer a network server with access to these Apple models. The danger is when they expose that, even on a loopback port, to every other application on your system, including the browser. Random webpages now ship JavaScript that will POST to that port. Same-origin restrictions will stop data from flowing back to the webpage, but they don’t stop it from issuing commands that make changes.
Some such projects use CORS to allow read back as well. I haven’t read Apfel’s code yet, but I’m registering the experiment before performing it.
With the well-known Claude bug burning through tokens at record speed, I gave alternative models a try, and they're mostly ... interchangeable. I don't know how easy switching, low brand loyalty, and a fast-moving market will play out. I hope local LLMs become very viable very soon.
The big question is whether Apple can keep shipping new models constantly.
AFAIK the current model is on par with Qwen-3-4B, which is from a year ago [0]. There's a big leap going from last year's Qwen-3-4B to Qwen-3.5-4B or to Gemma 4.

Apple's model is nice since you don't need to download anything else, but I'd rather use the latest model than one from a year ago.

[0] https://machinelearning.apple.com/research/apple-foundation-...
Saw this in another thread previously and immediately installed it.
I have a new prompt to test LLMs much like simonw's pelican test.
"What is 9:30am Taiwan time in US, Pacific?" For some reason, the answers are quite inconsistent but all wrong.
./apfel "what is 9:30am Taiwan time in US, Pacific?"
Taiwan is 12 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM Taiwan time would be 9:30 PM Pacific Time.
Taiwan is 13 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 10:30 PM in the Pacific Time Zone.
Taiwan is in the China Standard Time (CST) zone, which is 12 hours ahead of the Pacific Standard Time (PST) zone. Therefore, 9:30 AM in Taiwan is 9:30 PM in the Pacific.
Taiwan is typically 11 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 8:30 PM in the Pacific Time Zone.
Taiwan is 13 hours ahead of the Pacific Time Zone. Therefore, 9:30 AM in Taiwan is 10:30 PM the previous day in the Pacific Time Zone.
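For the record, Taiwan (UTC+8, no DST) is 15 or 16 hours ahead of US Pacific depending on daylight saving, so every answer above is off by hours. Easy to verify with Python's zoneinfo:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 9:30 AM in Taipei on an arbitrary summer date (Taiwan observes no DST)
taipei = datetime(2025, 6, 1, 9, 30, tzinfo=ZoneInfo("Asia/Taipei"))
pacific = taipei.astimezone(ZoneInfo("America/Los_Angeles"))
print(pacific.strftime("%Y-%m-%d %H:%M %Z"))  # 2025-05-31 18:30 PDT
```

So the correct answer is 5:30 PM the previous day during Pacific Daylight Time (4:30 PM under PST), not any of the 11/12/13-hour offsets above.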
Started using this earlier this week. I built a backtesting benchmark tool to compare a mix of frontier and open-source models on a fairly heavy data analysis workflow I’d been running in the cloud.
The task is basically predicting pricing and costs.
Apple’s model came out on top—best accuracy in 6 out of 10 cases in the backtest. That surprised me.
It also looks like it might be fast enough to take over the whole job. If I ran this on Sonnet, we’re talking thousands per month. With DeepSeek, it’s more like hundreds.
So far, the other local models I’ve tried on my 64GB M4 Max Studio haven’t been viable - either far too slow or not accurate enough. That said, I haven’t tested a huge range yet.
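The "best accuracy in 6 out of 10 cases" framing amounts to a per-case win count. A stripped-down sketch of that scoring (illustrative structure only, not the actual harness):

```python
def backtest(models, cases):
    """Count, per model, how many cases it predicts most accurately.

    `models` maps a name to a predict(features) callable; each case is a
    dict with `features` and the known `actual` outcome.
    """
    wins = {name: 0 for name in models}
    for case in cases:
        # Absolute error of each model's prediction on this case
        errors = {name: abs(predict(case["features"]) - case["actual"])
                  for name, predict in models.items()}
        # The model with the smallest error takes the case
        wins[min(errors, key=errors.get)] += 1
    return wins
```

With ten historical cases and a handful of models, the winner is whichever model takes the most cases.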
I'm a Linux user who wanted exactly this but for Linux — so I ended up building it myself. It's called TalkType, it runs Whisper locally for offline speech-to-text. The privacy angle was a big reason I went local from the start — I didn't want my voice being sent to anyone's server. Nice to see the same idea getting traction on Mac.
Tempted to write a Grammarly-like underline engine that flags writing mistakes across all apps and browsers. A fully private Grammarly alternative without even bundling an LLM!
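A minimal sketch of how that could talk to a local OpenAI-compatible endpoint like the one apfel exposes (the port, path, and model name below are assumptions, not taken from apfel's docs):

```python
import json
import urllib.request

# Assumed local endpoint; adjust to whatever apfel actually serves.
ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"

def build_payload(text: str) -> dict:
    """OpenAI-style chat payload asking the local model to flag mistakes."""
    return {
        "model": "apple-foundation",  # placeholder model name
        "messages": [
            {"role": "system",
             "content": "List any grammar mistakes in the user's text, or reply OK."},
            {"role": "user", "content": text},
        ],
    }

def check(text: str) -> str:
    """POST the text to the local model and return its critique."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Everything stays on-device: the only network hop is to localhost.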
> Apple locked it behind Siri. apfel sets it free

This doesn't feel truthful; it makes the tool sound like a hack that unlocks something. If I understand correctly, it uses the same FoundationModels framework that powers Apple Intelligence, exposed as a CLI and an OpenAI-compatible REST endpoint. That's fine, the marketing just goes a bit hard.

> Runs on Neural Engine

I'm also unsure this runs on the ANE; when I tried Apple Intelligence, I saw it run on the GPU (Metal).
Submitted a PR to prevent installation on macOS versions older than Tahoe (26): I was able to install it on my older macOS 15, but it aborted on execution.

https://github.com/Arthur-Ficial/homebrew-tap/pull/1
> Having said that, I'd easily trade some censorship about Chinese affairs I don't care about for the prudishness of American models.

What I'm more interested in: if you give it a tool to access Wikipedia, will it censor its answer even then?
> I like the approach of running everything locally. I'm strongly of the opinion that the privacy angle for local models is going to keep getting stronger and more relevant.
In HN circles perhaps. Average Joes don’t care.
https://developer.apple.com/documentation/Updates/Foundation...
They released an official Python SDK in March 2026:
https://github.com/apple/python-apple-fm-sdk
Imagine they baked Qwen 3.5 level stuff into the OS. Wow that’d be cool.