Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference (gizmoweek.com)

by takumi123 187 comments 303 points

[−] temp7000 30d ago
Is it me, or does the article sound like LLM output?

The pattern "It's not mere X — it's Y" occurs like 4 times in the text :v

[−] Andrex 30d ago
I can't believe you'd impugn the high moral standards of "gizmoweek dot com".
[−] BeetleB 30d ago
I don't care if it's written by an LLM.

The problem with the article is the complete lack of details. No benchmarks on the iPhone-capable models. No details whatsoever.

Human or LLM - the article is a whole lot of nothing.

[−] doliveira 30d ago
Funnily enough, to me these aphorisms (?) sound almost like the replicant test in Blade Runner. Like these are the basic unit of "nudging"
[−] nozzlegear 29d ago
LLM, recite your baseline:

"It's not just X – it's Y." Slop. "You're absolutely right!" Slop. "And this is key –" Slop. "This is a nuanced topic." Slop.

https://www.youtube.com/watch?v=vrP-_T-h9YM

[−] nextaccountic 28d ago
The problem is not authorship. It's the lack of substance
[−] dax_ 28d ago
This is just prompting an LLM and dumping the output on the site (which is clearly what is happening here; all the articles show the same signs of AI output, no human writing, no style, as far as I can tell).

If this is the level of care that goes into news articles, then we're doomed. What will ultimately happen is that AI summarizes AI articles, which got summarized from another AI article, which got summarized from another AI article, .. and after enough rewriting all facts will be gone from articles. I don't care to read this slop, and I'm shocked people are so readily accepting this new state of affairs.

[−] veunes 30d ago
This article is all fluff because real numbers would be bad marketing. If they mentioned that a 4B model on an iPhone 16 drains 15% of the battery for a single long prompt and triggers hard thermal throttling after 20 seconds, nobody would be clicking on headlines about "commercial viability" fwiw
[−] Domenic_S 30d ago
I ran several Gemma 4 quants on my 24GB Mac mini, and with proper context size tuning they're quick enough I guess, but I would really love to see them working well on an iPhone with 2-3GB of RAM...
[−] caminante 30d ago
Ran it through Claude, Grok, whatever... for me, they all flagged the same issues with these content farms (no sources, punchy phrases with repetition, ...).

My favorite: couldn't even prove the author is a real person. They all found no record!

[−] itissid 30d ago
As someone said, we live in a strange but amazing era: although it has never been easier to be deceived, it's _also_ much easier to uncover said deception, especially on the internet.
[−] ryandvm 30d ago
Or at least think you've uncovered deception. It's not clear to me yet that any of these "AI detectors" are reliable, and if they are, it's just an arms race.
[−] walthamstow 30d ago
It's much faster and simpler to assume everything on the internet is crooked
[−] figmert 30d ago

> :v

I guess I found the millennial. I haven't seen that in so long!

[−] Den_VR 30d ago
:<
[−] neals 30d ago
:')
[−] Andrex 30d ago

>_>

[−] xiconfjs 30d ago
\o/
[−] yangm97 30d ago
Analog emojis FTW
[−] Gormo 27d ago
Neither analog nor emojis. An analog emoji would just be a picture printed on paper.
[−] yangm97 27d ago
¬_¬
[−] mannycalavera42 29d ago
(╯°□°)╯︵ ┻━┻
[−] Melatonic 30d ago
¯\_(ツ)_/¯
[−] altruios 30d ago
It is like the AI is training us to avoid certain language patterns. I rebel at weak language being held hostage, for strong language is next.
[−] Melatonic 30d ago
The mighty semicolon prepares for its return!
[−] mtremsal 30d ago
An AI slop pattern so widespread it’s now referred to as “it’s not pee pee it’s poo poo”.
[−] lynndotpy 30d ago
It's not just a widespread pattern –––––––––––––––– it's a sign of things to come.
[−] Domenic_S 30d ago
You didn't just nail it ------------ you cut to the core of the issue.
[−] Cider9986 30d ago
I haven't heard that—that's good.
[−] odo1242 30d ago
It does in fact sound like LLM output
[−] wtyvn 30d ago
Smells like slop to me, looks like the site exists solely to garner search hits.
[−] kbouw 30d ago
You would be correct. Ran the article through GPTZero, 100% AI.
[−] subscribed 30d ago
These detectors are a scam falsely flagging non-native English speakers: https://plagiarismcheckerai.app/ai-detector-false-positives-...

At this point relying on their judgement is beyond folly.

[−] cubefox 30d ago
It's both ironic and confusing that this website itself promotes an AI detector.
[−] subscribed 28d ago
Yeah, I admit I lazily chose one of the first results reporting on this study instead of the best one, so the irony is not lost on me.

Sorry for making you snort and shake your head in amusement :D

[−] xd1936 30d ago
[−] 71bw 30d ago
Would not trust any of these tools in the slightest.
[−] devmor 30d ago
AI detectors that use text as a basis are not real. It is fundamentally impossible for them to exist.
[−] HarHarVeryFunny 30d ago
Huh?

LLM output doesn't have the variety of human output, since they operate in a fixed fashion: statistical inference followed by formulaic sampling.

Additionally, the statistics used by LLMs are going to be similar across different LLMs, since at scale it's just "the statistics of the internet".

Human output has much more variety, partly because we're individuals with our own reading/writing histories (which we're drawing upon when writing), and partly because we're not so formulaic in the way we generate. Individuals have their own writing styles and vocabulary, and one can identify specific authors to a reasonable degree of accuracy based on this.

It's a bit like detecting cheating in a chess tournament. If an unusually high percentage of a player's moves are optimal computer moves, then there is a high likelihood that they were computer generated. Computers and humans don't pick moves in the same way, and humans don't have the computational power to always find "optimal" moves.

Similarly with the "AI detectors" used to detect if kids are using AI to write their homework essays, or to detect if blog posts are AI generated ... if an unusually high percentage of words are predictable by what came before (the way LLMs work), and if those statistics match that of an LLM, then there is an extremely high chance that it was written by an LLM.

Can you ever be 100% sure? Maybe not, but in reality human written text is never going to have such statistical regularity, and such an LLM statistical signature, that an AI detector gives it more than a 10-20% confidence of being AI, so when the detector says it's 80%+ confident something was AI generated, that effectively means 100%. There is of course also content that is part human part AI (human used LLM to fix up their writing), which may score somewhere in the middle.
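
For the curious, a rough sketch of that core signal in Python (the reference model, the example strings, and everything else here are placeholders for illustration, not what any real detector actually uses):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Small reference model just to keep the sketch runnable
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        # Average per-token perplexity of `text` under the reference model
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # labels = input_ids, so the model returns the mean cross-entropy
            # of predicting each token from the ones before it
            out = model(**enc, labels=enc["input_ids"])
        return torch.exp(out.loss).item()

    # Lower perplexity = more "predictable" text. Real detectors also fold in
    # burstiness (variance across sentences) and similar signals.
    print(perplexity("It's not just a model release - it's a paradigm shift."))
    print(perplexity("My cat sneezed on the router and now the wifi smells weird."))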

[−] watsonL1F7 30d ago
[flagged]
[−] veunes 30d ago
I noticed the inference is routed through the GPU rather than the Apple Neural Engine. Google's engineers likely gave up on trying to compile custom attention kernels for Apple's proprietary tensor blocks iirc. While Metal is predictable and easy to port to, it drains the battery way faster than a dedicated NPU. Until they rewrite the backend for the ANE, this is just a flashy tech demo rather than a production-ready tool
[−] blixt 30d ago
I made this offline pocket vibe coder using Gemma 4 (works offline once the model is downloaded) on an iPhone. It can technically run the 4B model but it will default to 2B because of memory constraints.

https://github.com/blixt/pucky

It writes a single TypeScript file (I tried multiple files but embedded Gemma 4 is just not smart enough) and compiles the code with oxc.

You need to build it yourself in Xcode because this probably wouldn't survive the App Store review process. Once you run it, there are two starting points included (React Native and Three.js); the UX is a bit obscure, but edge-swipe left/right to switch between views.

[−] codybontecou 30d ago
Unfortunately Apple appears to be blocking the use of these LLMs within apps on their App Store. I've been trying to ship an app that contains local LLMs and have hit a brick wall with issue 2.5.2
[−] karimf 30d ago
Related: Gemma 4 on iPhone (254 comments) - https://news.ycombinator.com/item?id=47652561
[−] logicallee 30d ago
For those who would like an example of its output: I'm currently creating a small, free (CC0, public domain) encyclopedia (just a couple of thousand entries) of core concepts in Biology and Health Sciences, Physical Sciences, and Technology. Each entry is written entirely by Gemma 4:e4b (the 10 GB model). I believe this may be slightly larger than the model that runs locally on phones, so perhaps this model is slightly better, but the output is similar. Here is an example entry:

https://pastebin.com/ZfSKmfWp

Seems pretty good to me!

[−] mfro 30d ago
Strangely, it is super fast on my 16 Plus, but with longer messages it can slow down a LOT, and not because of thermal throttling. I wish I could see some diagnostic data.
[−] conception 30d ago
I'm pretty excited about the Edge Gallery iOS app with Gemma 4 on it, but it seems like they hobbled it, not giving access to intents, and you have to write custom plugins for web search, etc. Does anyone have a favorite way to run these usefully? ChatMCP works pretty well but only supports models via API.
[−] Chrisszz 30d ago
I just installed Google AI Edge Gallery on my iPhone 16 Pro. Here are the results of the first benchmark with GPU, Prefill Tokens=256, Decode Tokens=256, number of runs=3: Prefill Speed=231 t/s, Decode Speed=16 t/s, Time to First Token=1.16 s, first init time=20 s
[−] abc_lisper 30d ago
Careful with using these small models. The other day, I asked it "Can dogs eat avocado" and the answer was an emphatic Yes.

This is not meant as a criticism, but people should be aware of their limitations.

[−] rich_sasha 30d ago
Offline or not, I'm sure Google uploads every keystroke, phone orientation, photo, WiFi endpoints and your shoe size when you interact with it. To enhance your experience.
[−] mistic92 30d ago
It runs on Android too, with AI Core or even with llama.cpp
[−] usmanshaikh06 30d ago
ESET is blocking this site saying:

> Threat found. This web page may contain dangerous content that can provide remote access to an infected device, leak sensitive data from the device or harm the targeted device. Threat: JS/Agent.RDW trojan

[−] jimbokun 30d ago
I feel like the UX and API design here are very underexplored.

What are the possibilities of an Android or iOS device where the OS is centered around a locally running LLM with an API for accessing it from apps, along with tools the LLM can call to access data from locally running apps? What’s the equivalent of the original Mac OS?

Do apps disappear and there’s just a running dialog with the LLM generating graphical displays as needed on demand?
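
Just to make the idea concrete, a purely hypothetical sketch of what that API surface could look like (none of these names exist in iOS or Android today): apps register tools with a system-level model, and the model calls them instead of you opening apps.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Tool:
        name: str
        description: str                   # what the model reads when deciding to call it
        handler: Callable[[dict], dict]    # app-provided callback

    class SystemModel:
        # Imaginary OS-level singleton wrapping the on-device LLM
        def __init__(self) -> None:
            self.tools: dict[str, Tool] = {}

        def register(self, tool: Tool) -> None:
            # An installed app exposes a capability to the system model
            self.tools[tool.name] = tool

        def ask(self, prompt: str) -> str:
            # A real OS would run local inference here, let the model call
            # registered tools, and hand back a UI description rather than text
            raise NotImplementedError

    # e.g. a calendar app might do:
    # system.register(Tool("list_events", "Events for a given date",
    #                      lambda args: {"events": ["dentist 9am"]}))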

[−] pabs3 30d ago

> edge AI deployment

Isn't the "edge" meant to be computing near the user, but not on their devices?

[−] juancn 30d ago
Gemma4 is still power-hungry since it tends to activate pretty much every weight.

qwen3-coder-next uses a lot less since it seems to only activate ~3B parameters at a time.

My guess is that this is still close to a tech demo, and a lot of performance is left on the table.
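
To make the dense-vs-MoE contrast concrete, here's a toy numpy sketch of top-k expert routing (shapes, expert counts, and names are made up for illustration, not qwen's actual architecture): the router picks a couple of experts per token, so most of the weights are never touched.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 64, 8, 2

    router_w = rng.normal(size=(d_model, n_experts))
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

    def moe_layer(x: np.ndarray) -> np.ndarray:
        # x: (d_model,) activation for a single token
        logits = x @ router_w
        chosen = np.argsort(logits)[-top_k:]     # top-k experts for this token
        weights = np.exp(logits[chosen])
        weights /= weights.sum()                 # softmax over the chosen experts
        # Only top_k of the n_experts matrices participate; the rest stay idle,
        # which is where the compute and power savings come from
        return sum(w * (experts[i].T @ x) for w, i in zip(weights, chosen))

    token = rng.normal(size=d_model)
    print(moe_layer(token).shape)   # (64,)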

[−] deckar01 30d ago
They still don’t render the markdown (or LaTeX) it outputs.
[−] bearjaws 30d ago
Would love to see a showdown of performance on iPhone vs Google's Tensor G5, which in my experience is 2 full generations behind performance-wise.
[−] declan_roberts 30d ago
I really hope this is a preview of the replacement for Siri that Google is creating bc these models are fantastic for their size!
[−] DoctorOetker 30d ago
does anyone know of a decent but low memory or low parameter count multilingual model (as many languages as possible), that can faithfully produce the detailed IPA transcription given a word in a sentence in some language?

I want to test a hypothesis for "uploading" neural network knowledge to a user's brain, by a reaction-speed game.

[−] bossyTeacher 30d ago
Is the output coherent though? I have yet to see a local model running on consumer-grade hardware that is actually useful.
[−] politelemon 30d ago
This is HN clickbait. No details or evidence, this is just generated for votes.

I think this should be flagged.

[−] the_inspector 30d ago
You are referring to the edge models, right? E2B and E4B, not the bigger ones (26B, 31B)...
[−] andsoitis 30d ago
is there a comparison of it running on iPhone vs. Android phones?
[−] ValleZ 30d ago
There are many apps to run local LLMs on both iOS & Android
[−] grimmai143 30d ago
Do you know of a way of running these models on Android? Also, what does the thermal throttling look like?
[−] grimm7000 30d ago
[dead]
[−] camillomiller 30d ago
[flagged]
[−] abstracthinking 30d ago
I don't see the value in this post. Are Hacker News posts being upvoted by bots?