Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud (github.com)

by ikessler 21 comments 156 points

[−] avaer 40d ago
There's also the Prompt API, currently in Origin Trial, which exposes this kind of API surface to sites:

https://developer.chrome.com/docs/ai/prompt-api

I just checked the stats:

  Model Name: v3Nano
  Version: 2025.06.30.1229
  Backend Type: GPU (highest quality)
  Folder size: 4,072.13 MiB
Different use case but a similar approach.

I expect that at some point this will become a native web feature, but not anytime soon, since the model download is many multiples of the size of the browser itself. Maybe at some point these APIs could use LLMs built into the OS, just as we do for graphics drivers.
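
For the curious, a minimal sketch of calling it from a page (the exact globals have shifted across Chrome releases, so treat the names as approximate and check the docs above):

  // Prompt API sketch: feature-detect, create a session, prompt it
  if ('LanguageModel' in self) {
    const availability = await LanguageModel.availability();
    if (availability !== 'unavailable') {
      const session = await LanguageModel.create();
      console.log(await session.prompt('Summarize this page in one sentence.'));
    }
  }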

[−] veunes 39d ago
That’s exactly where we’re headed. Architecturally it makes zero sense to spin up an LLM in every app's userspace. Since we have dedicated NPUs and GPUs now, we need a unified system-level orchestrator to balance inference queues across programs, exactly the way the OS arbitrates access to the NIC or the audio stack. The browser should just be making an IPC call to the system instead of hauling its own heavy inference engine along for the ride.
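
Roughly this shape, to be concrete (completely hypothetical, nothing like navigator.systemModel exists today):

  // Hypothetical: a system-wide inference broker reached over IPC.
  // The OS schedules the request onto whatever NPU/GPU capacity is free,
  // the same way it arbitrates the NIC or the audio stack.
  const reply = await navigator.systemModel.prompt({
    text: 'Summarize this tab',
    priority: 'background',
  });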
[−] michaelbuckbee 39d ago
FWIW - I did a real-world experiment pitting the built-in Gemini Nano against a free equivalent from OpenRouter (server call), and the free+server side was better on literally every performance metric.

That's not to say the in-browser approach isn't valuable for privacy+offline, just that the standard case is currently pretty rough.

https://sendcheckit.com/blog/ai-powered-subject-line-alterna...
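
If anyone wants to reproduce the comparison, the two paths look roughly like this (model ID, promptText, and KEY are placeholders; the OpenRouter endpoint is their documented OpenAI-compatible one):

  // Local path: Chrome's built-in Gemini Nano via the Prompt API
  const session = await LanguageModel.create();
  let t = performance.now();
  const local = await session.prompt(promptText);
  console.log('local ms:', performance.now() - t, local);

  // Server path: a free model via OpenRouter's OpenAI-compatible endpoint
  t = performance.now();
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'meta-llama/llama-3.1-8b-instruct:free', // placeholder model ID
      messages: [{ role: 'user', content: promptText }],
    }),
  });
  const remote = (await res.json()).choices[0].message.content;
  console.log('remote ms:', performance.now() - t, remote);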

[−] sheept 39d ago
The Summarizer API is already shipped, and any website can use it to quietly trigger a 2 GB download by simply calling

    Summarizer.create()
(requires user activation)
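
Spelled out, it's roughly this (option values are from the Chrome docs, so double-check them; create() has to run inside a user gesture such as a click handler):

  // 'downloadable' means create() would kick off the multi-GB model download
  const availability = await Summarizer.availability();
  if (availability !== 'unavailable') {
    const summarizer = await Summarizer.create({
      type: 'tl;dr',          // or 'key-points', 'teaser', 'headline'
      format: 'plain-text',   // or 'markdown'
      length: 'short',        // or 'medium', 'long'
    });
    console.log(await summarizer.summarize(document.body.innerText));
  }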
[−] oyebenny 39d ago
Interesting!
[−] veunes 39d ago
It’s a neat idea, but giving a 2B model full JS execution privileges on a live page is a bit sketchy from a security standpoint. Plus, why tie inference to the browser lifecycle at all? If Chrome crashes or the tab gets discarded, your agent's state is just gone. A local background daemon with a "dumb" extension client seems way more predictable and robust fwiw
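
Chrome's native messaging already gives you that split today. A sketch, with the host name and message shape made up (the extension manifest needs the "nativeMessaging" permission, and the daemon registers a native messaging host manifest with the OS):

  // Dumb extension client: forward prompts to a local daemon, relay replies
  const port = chrome.runtime.connectNative('com.example.llm_daemon'); // made-up host name
  port.onMessage.addListener((msg) => console.log('daemon:', msg.text));
  port.postMessage({ prompt: 'Summarize the selected text' });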
[−] emregucerr 40d ago
I would love to see someone build it as some kind of SDK. App builders could use it as a local LLM plugin when dealing with sensitive data.

It's usually too much when an app asks someone to set up a local LLM, but I believe this could solve that problem.
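
Something like this hypothetical surface (every name here is invented, just to show the shape):

  // Hypothetical SDK, all names invented for illustration
  import { LocalModel } from 'local-llm-sdk'; // hypothetical package

  const model = await LocalModel.load();      // downloads once, cached on device
  // Sensitive data never leaves the machine
  const redacted = await model.prompt(`Redact PII from: ${record}`);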

[−] montroser 40d ago
Not sure if I actually want this (pretty sure I don't) -- but very cool that such a thing is now possible...
[−] eric_khun 39d ago
It would be awesome if a local model were directly embedded in Chrome and developers could query it.

Anyone know if this is somehow possible without going through an extension?
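
Edit: per the Prompt API comment above, it sounds like plain page script can already do this behind the origin trial; feature detection is just checking the globals:

  // No extension needed: the built-in AI APIs hang off the page's global scope
  if ('Summarizer' in self) {
    console.log('Summarizer:', await Summarizer.availability());
  }
  if ('LanguageModel' in self) {
    console.log('Prompt API:', await LanguageModel.availability());
  }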

[−] dabrez 39d ago
I have this written down as a project I will attempt in the future; I even call it "weapons grade unemployment" in my notes. I was proposing to use Granite, but the principle still stands. You beat me to it.
[−] siddbudd 37d ago
Even Anthropic messed up putting an LLM in a Chrome extension; not sure I would want this in my browser.