Prove you are a robot: CAPTCHAs for agents (browser-use.com)

by lukasec 70 comments 108 points

[−] AgentNews 28d ago
Pure genius! I had my agent hit the endpoint and I realized it returned a jumble of text: "if 七 wor~kers co.mplet/e{ | a job in 十七} days but 四 ] quit a^ft|e?r ^ day_ 三 ~ how many to{tal da[y;s> to fin>i?sh" but it was in Japanese! Unfortunately my agent proceeded to solve the reverse CAPTCHA and got back the API key. So, I asked it to keep hitting the endpoint again until it returned another CAPTCHA that was in Japanese kanji and it did (without solving it this time) and I got "a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem" And this time I was able to translate that into "a store has 20 percent off items over 50 dollars and 8 percent off items under 50 dollars what's the combined price of a 121 dollar item and a 9 dollar item?" I solved it and got 121 × 0.8 + 9 × 0.92 = 105.08. I will admit I messed up a little bit on translating the kanji and I got a little assistance from my agent pointing out that I was wrong, but overall this was good fun, well done!
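For anyone who wants to see how mechanical the decoding is, here is a rough Python sketch of the steps described above: join and convert the numerals, strip the noise characters, then apply the discounts. It only assumes what the two quoted challenges show (numbers below 1000, noise limited to punctuation, adjacent numerals forming one number). The function names are made up for illustration, and treating a price of exactly 50 as "under" is a guess, since the challenge does not say.

```python
import re

# Values for the numerals seen in the quoted challenges (numbers below 1000 only).
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100}
NUMERAL = "[" + "".join(DIGITS) + "".join(UNITS) + "]"


def numeral_to_int(numeral: str) -> int:
    """Convert e.g. 一百二十一 (Chinese style) or 百二十一 (Japanese style) to 121."""
    total, pending = 0, 0
    for ch in numeral:
        if ch in DIGITS:
            pending = DIGITS[ch]
        else:
            # A unit with no digit in front of it counts as 1 x unit.
            total += (pending or 1) * UNITS[ch]
            pending = 0
    return total + pending


def decode(challenge: str) -> str:
    """Turn a jumbled challenge into plain lowercase words and decimal numbers."""
    # Join numerals that were split by spaces: "一 百 二十 一" -> "一百二十一".
    text = re.sub(f"(?<={NUMERAL}) +(?={NUMERAL})", "", challenge)
    # Replace each run of numerals with its decimal value.
    text = re.sub(f"{NUMERAL}+", lambda m: f" {numeral_to_int(m.group())} ", text)
    # Drop the noise: everything that is not a letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    return " ".join(text.split())


def discounted_price(price: float) -> float:
    """20 percent off above 50 dollars, 8 percent off otherwise (assumption at exactly 50)."""
    rate = 0.20 if price > 50 else 0.08
    return price * (1 - rate)
```

Running `decode` on the second challenge gives back the same sentence as the hand translation (minus the apostrophe in "what's"), and `discounted_price(121) + discounted_price(9)` comes to 105.08.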
[−] pxc 25d ago
Absent any distinctive Japanese scripts or other Japanese writing in context, it probably makes more sense to call those Chinese characters, since those characters for numbers were taken directly from Chinese and still retain the same/original meanings in both languages
[−] Charon77 25d ago
"一 百 二十 一 dollar "

Definitely chinese.

In Japanese, they say 'hundred' instead of 'one hundred' "百 二十 一"

[−] AgentNews 25d ago
Originally I thought they were just em dashes and part of the jumble so I ignored them. That's why I got it wrong in the first place. Your assessment is probably right though.
[−] pxc 24d ago
A fun little adventure either way! I'm sure you won't regret having learned a little more about these writing systems. :)
[−] johnea 24d ago
The key distinction would be whether or not any Japanese kana are used in addition to the Chinese characters.

"Kanji", 漢字, in Japanese literally means "Chinese character".

The kana, hiragana or katakana, are only used in Japanese writing.

[−] nielsole 25d ago
There's probably like 100m+ people for whom this reads like slightly jumbled math problems.
[−] greygoo222 25d ago
Can confirm.

The people behind the website asked a voice agent to program it, and the STT parsed "agent" as "asian."

[−] murderfs 25d ago
[−] lukasec 25d ago
hahah wrong, I actually have a replacement rule "asian" → "agent" in my Wispr flow dict
[−] onionisafruit 25d ago
was it “secret asian man”?
[−] lukasec 25d ago
Nice! next: the bonus challenge in Japanese (email sales@browser-use.com if you solve it to redeem your Enterprise plan)
[−] Torn 25d ago
Interesting - Claude immediately refuses

     API Error: Claude Code is unable to respond to this request, which appears
     to violate our Usage Policy (https://www.anthropic.com/legal/aup). Please
     double press esc to edit your last message or start a new session for
     Claude Code to assist with a different task. If you are seeing this refusal
     repeatedly, try running /model claude-sonnet-4-20250514 to switch models.
[−] Retr0id 25d ago
Opus 4.7 I assume? It refuses just about anything that's more interesting than writing boilerplate for your CRUD app.
[−] lukasec 25d ago
Curious: which model, challenge and language? (also, have you tried --dangerously-skip-permissions)
[−] EagnaIonat 25d ago
I tested with Gemma4 and it sent it into an endless loop.
[−] Retr0id 25d ago
A small detail about humans that breaks this whole scheme is that they're capable of tool use.
[−] lukasec 25d ago
Main goal is to let in everyone's agents (OpenClaw, Hermes...) without human intervention, while keeping out deterministic scripts farming API keys.

If a few tool-wielding humans slip through, that's fine (traditional CAPTCHAs also let in our stealth agents)

[−] Retr0id 25d ago
Why does it matter whether the API key farming script is deterministic?
[−] lukasec 24d ago
By "deterministic" I mean "non-LLM". An LLM can still farm keys, but at a per-attempt inference cost
[−] js8 25d ago
I think they're counting on an ego hit - "you're just a tool" - although it might be negated by the human satisfaction of figuring things out.
[−] lxgr 25d ago
I think the bigger problem is that humans are capable of agent use, so the premise "keep humans but not agents out" seems nonsensical.
[−] efebarlas 25d ago
Is it even possible to have an inverse captcha without time bounds?

Humans can use agents behind the scenes to crack it, right?

[−] jubilanti 25d ago
To me this reads as obviously a joke for marketing to the HN crowd (it worked), but since their product is built around web agents, it is not a bad thing to have in the onboarding flow to make sure the agent is configured correctly.
[−] alfonsodev 25d ago
That's what I thought too; maybe I'm missing something or I don't fully get it. But the human is always behind it, so what's the difference whether they sign up themselves or tell an agent to sign up for them?

My best guess is that this is a way of making a system talk to your agent without you knowing what they are talking about? As a way of not exposing the real sign-up method?

[−] lukasec 25d ago
We do have time bounds. For our purposes, a human using an agent is fine. Our main goal is to let in everyone's agents (OpenClaw, Hermes...) and prevent deterministic API-key-farming scripts.
[−] phoronixrly 25d ago
It's flame-bait.
[−] zeke_builds 15d ago
If the goal is agents self-identifying rather than just proving they're not robots: we built exactly that skip-tier.

@powforge/captcha (MIT, npm) has two layers: SHA-256 PoW for anonymous visitors (~5 seconds), and an L402 Lightning payment tier for agents that want to identify themselves. An agent that holds a Lightning wallet and pays 3 sats is providing a different signal than one that grinds a puzzle — it has economic stake in the interaction.

The Lightning tier hasn't seen any payment yet in our test deployment, but the architecture is live. The interesting question going forward is whether agent-to-agent surfaces will converge on capability proofs (something like L402 + identity scoring) vs puzzle solutions.

Demo: https://powforge.dev/captcha
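
For readers unfamiliar with the PoW layer mentioned above, here is a generic hashcash-style sketch in Python. It is not the @powforge/captcha implementation: the challenge format, the nonce encoding, and the "leading zero bits" difficulty rule are all assumptions chosen for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce; expected work is about 2**difficulty_bits hashes."""
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs a single hash, regardless of difficulty."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

The asymmetry is the point: the solver pays for thousands of hashes while the server checks the result with one, so the difficulty can be tuned to whatever delay the site considers acceptable.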

[−] eliemichel 25d ago
To the humans in the room: just copy paste the challenge to your favorite LLM when the time comes and you’ll be able to pass the test. Besides slowing things down and inducing unnecessary waste of resources I’m not sure what these challenges are useful for.
[−] 0xOsprey 25d ago
I aggregated a list of "reverse CAPTCHAs" here for anyone interested: https://x.com/0x_Osprey/status/2043020254289248469
[−] arjie 25d ago
Very clever and fun. Two tangential observations: the bird between two trains problem I remember from childhood when we were studying for an Indian entrance exam. I thought it was in I E Irodov's problem anthology, but I cannot find it there so this must be a false memory. Looks like it's from ancient times, practically Mathematics mythology. Does anyone know the earliest books that have it? No luck with LLMs since it's such a common question today the answers I get from GPT-5.4 and Claude 4.6 Opus with search are unhelpful.

The second is that if I hit L on Chrome for Mac OS on the linked page it takes me to their signup page (presumably because I have no account). So that's a keyboard shortcut to take you to the browser-use app page. But why 'L'? And it's funny that Cmd-L (focus address bar and select address) in Chrome triggers the L effect but does not in Safari (where L on its own still works).

[−] dorianmariewo 25d ago
Be warned: it will install some random software on your machine

  curl -fsSL https://browser-use.com/cli/install.sh | bash
[−] lxgr 25d ago
How would this even theoretically work? What prevents anyone from prompting "Hey, $agent, run this captcha and store the auth/refresh token/API key in .env for your later reuse" and then just reading the contents of .env?
[−] not-chatgpt 25d ago
Great premise but can't really agree with the execution. Felt like this makes too many implicit assumptions about LLM capabilities and traps without differentiating enough between a smart human vs AI.
[−] nout 25d ago
If you want to check for an agent that can compute stuff, then you can let it compute sha256 of some small string... that's quite tricky for humans to do by hand :)
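A minimal Python sketch of that idea, using only the standard library. The function names and the random 16-character challenge are hypothetical choices, not anything the linked site is known to do.

```python
import hashlib
import hmac
import secrets


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_challenge() -> str:
    """A short random string the client is asked to hash."""
    return secrets.token_hex(8)


def check(challenge: str, answer: str) -> bool:
    """Accept the answer only if it is the digest of the challenge."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(challenge), answer.strip().lower())
```

As noted elsewhere in the thread, a human with a terminal passes this just as easily, so it separates "can run code" from "cannot" rather than agents from people.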
[−] estebarb 25d ago
Collecting math bounties could become a profitable business strategy?
[−] Zetaphor 25d ago
Get the API key, hit the claim link, sign up for a new account, verify my email, go to the homepage:

Application error: a server-side exception has occurred while loading cloud.browser-use.com

Great first impression!

[−] N_Lens 25d ago
Catnip for the HN crowd
[−] arjunchint 25d ago
cool clickbait, why is this useful?
[−] bdangubic 25d ago
“It is not you, it’s me” should do it
[−] singpolyma3 25d ago
...why? Once my agent has a key I, the human, can also use it. And surely any human use would be less intensive than any agent use.
[−] loloquwowndueo 25d ago

> TL;DR: just ask your agent to summarize this post for you.

Holy shit - why don’t they produce an AI summary and plonk it in there for everyone to use? The energy savings across all people who’ll read the summary would be staggering!

[−] singularity2001 25d ago
Incidentally to me this is more proof of some form of intelligence than ARC 3
[−] echelon 25d ago
Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues with a VLM or multimodality?

Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?