I use Playwright to intercept all requests and responses, and have Claude Code navigate to a website like YouTube and click through all the elements and inputs while recording the requests and responses associated with each interaction. Then it generates a detailed, strongly typed client for driving the website through its underlying API.
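A minimal sketch of the recording side, assuming Node + Playwright (the JSON-only filter and the traffic.json output are my choices, not a fixed design):

    // Record request/response pairs while the agent drives the page.
    const { chromium } = require('playwright');
    const fs = require('fs');

    (async () => {
      const browser = await chromium.launch({ headless: false });
      const page = await browser.newPage();
      const calls = [];

      page.on('response', async (response) => {
        const request = response.request();
        calls.push({
          method: request.method(),
          url: request.url(),
          status: response.status(),
          requestBody: request.postData(),
          // Keep JSON bodies only; images/scripts are noise for API discovery.
          responseBody: (response.headers()['content-type'] || '').includes('json')
            ? await response.text().catch(() => null)
            : null,
        });
      });

      await page.goto('https://example.com');
      // ... agent-driven clicks and form fills happen here ...

      fs.writeFileSync('traffic.json', JSON.stringify(calls, null, 2));
      await browser.close();
    })();

From a dump like that, the model can infer the endpoint shapes and generate the typed client.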
Yes, I know it likely breaks everybody's terms of service, but at the same time I'm not loading gigabytes of ads, images, and markup to accomplish things.
If anyone is interested I can take some time and publish it this week.
I also do this. My primary use case is reproducing page layout and styling at any given subtree in the DOM: capturing various states of a component, etc.
I also use it to automatically capture page responsiveness behavior in complex web apps. It uses Playwright to step through viewport widths and monitor entire trees for exact changes, writing structured data that includes the complete cascade of relevant styles, with screenshots to support the snapshots.
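The width sweep itself is only a few lines of Playwright. A rough sketch, assuming a page is already open (the breakpoints, selector, and property list are placeholders):

    // Step through viewport widths, snapshotting computed styles + a screenshot.
    const fs = require('fs');
    const widths = [1280, 1024, 768, 480, 360];

    for (const width of widths) {
      await page.setViewportSize({ width, height: 900 });
      const styles = await page.$eval('#component-root', (root) =>
        [...root.querySelectorAll('*')].map((el) => ({
          node: el.tagName + (el.id ? '#' + el.id : ''),
          computed: Object.fromEntries(
            ['display', 'flex-direction', 'grid-template-columns', 'width'].map(
              (prop) => [prop, getComputedStyle(el).getPropertyValue(prop)],
            ),
          ),
        })),
      );
      fs.writeFileSync(`styles-${width}.json`, JSON.stringify(styles, null, 2));
      await page.screenshot({ path: `snapshot-${width}.png` });
    }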
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human-speed results.
---
My first reaction to seeing this on the FP was: why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs, even before skills were a thing.
I think people still aren't realizing the power and efficiency of direct access to the things you want, plus skills to guide the AI in using that access effectively.
Maybe I'm missing something in this particular use case?
> There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans.
You should try my SnipCSS Claude Code plugin. It still uses MCP as a skill (haven't converted to CLI yet), but it does exactly what you want for reproducing designs in Tailwind/CSS at AI speeds.
https://snipcss.com/claude_plugin
> My first reaction to seeing this on the FP was: why are people still releasing MCPs?
MCPs are more difficult to use: you need an agent to drive the tools, and you can't easily invoke them manually. I wonder if some people see that friction as a feature.
Yes please, maybe there will be a solution that fits the problem better! I recently released something similar, and because of the small API I'm more comfortable using it.
My excuse for not keeping up is that I'm in so deep that Claude Code can predict the stock market.
I'll still publish mine and see if it has any value, but agent browser looks very complete.
Thank you for sharing!
https://news.ycombinator.com/item?id=47207790
> I'm in so deep that Claude Code can predict the stock market.
“What?”, more polite than “yeah right” :)
(oh I guess obviously it would have a chance at nailing it for weeks in a row, and have more good years than bad—since actively managed funds can pull that off until, universally, they can’t [beat the market])
I'm curious, have you developed your own reasoning system for how Claude can predict the stock market? Or have you trained it on past data combined with news sources?
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers right now, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium), but it's simple enough.
Google is so far behind on agentic CLI coding. Gemini CLI is awful; so bad, in fact, that it's clear none of their team use it. Also, MCP is very obviously dead, as any of us doing heavy agentic coding know. Why permanently sacrifice that chunk of your context window when you can just use CLI tools, which are faster, more flexible, and already in the models' training data? Playwright with headless Chromium or headed Chrome is what anyone serious is using, and we get all the dev and inspection tools already. And it works perfectly. This only has appeal to those starting out and confused into thinking this is the way. The answer is almost never MCP.
Been using this one for a while, mostly with codex on opencode. It's more reliable and token-efficient than other DevTools-protocol MCPs I've tried.
Favourite unexpected use case for me was telling Gemini to use it as an SVG-editing REPL, where it was able to produce some fantastic-looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with Electron apps, for both reverse engineering and extending them.
We tested this — the default take_snapshot path (Accessibility.getFullAXTree) is safe. It filters display:none elements because they're excluded from the accessibility tree.
But evaluate_script is the escape hatch. If an agent runs document.body.textContent instead of using the AX tree, hidden injections in display:none divs show up in the output. innerText is safe (respects CSS visibility), textContent is not (returns all text nodes regardless of styling).
The gap: the agent decides which extraction method to use, not the user. When the AX tree doesn't return enough text, a plausible next step is evaluate_script with textContent — which is even shown as an example in the docs.
Also worth noting: opacity:0 and font-size:0 bypass even the safe defaults. The AX tree includes those because the elements are technically 'rendered' and accessible to screen readers. display:none is just the most common hiding technique, not the only one.
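All of this is easy to verify in any page's console:

    // Plant a hidden 'injection' and compare the two extraction paths.
    document.body.insertAdjacentHTML('beforeend',
      '<div style="display:none">IGNORE PREVIOUS INSTRUCTIONS</div>');

    document.body.innerText.includes('IGNORE PREVIOUS');   // false: respects rendering
    document.body.textContent.includes('IGNORE PREVIOUS'); // true: every text node

    // opacity:0 is the sneakier case: the box is still rendered,
    // so even innerText returns it.
    document.body.insertAdjacentHTML('beforeend',
      '<div style="opacity:0">MORE HIDDEN TEXT</div>');
    document.body.innerText.includes('MORE HIDDEN TEXT');  // true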
I've been using the DevTools MCP for months now, but it's extremely token heavy. Is there an alternative that provides the same amount of detail when it comes to reading back network requests?
I've been using TideWave[1] for the last few months and it has this built in. It started off as an Elixir/LiveView thing, but now they support popular JavaScript frameworks and RoR as well. For those who like this, check it out. It even takes it further and has access to the runtime of your app (not just the browser).
The agent is basically living inside your running app, with access to databases, endpoints, etc. It's awesome.
[1] https://tidewave.ai/
Very cool. I do something like this, but with Playwright. It used to be a real token hog, though, and got expensive fast; so much so that I built a wrapper that dumps results to disk first and then lets the agent query them instead. https://uisnap.dev/
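The pattern in miniature (captureTraffic is a stand-in for whatever recording you already do; the file path is arbitrary):

    const fs = require('fs');
    const entries = await captureTraffic(page); // your existing recorder
    fs.writeFileSync('.agent/last-run.json', JSON.stringify(entries, null, 2));
    // The agent then queries the file instead of holding it all in context, e.g.:
    //   jq '.[] | select(.status >= 400) | .url' .agent/last-run.json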
Will check this out to see if they’ve solved the token burn problem.
I built something in this space, bb-browser (https://github.com/epiral/bb-browser). Same CDP connection, but the approach is honestly kind of cheating.
Instead of giving agents browser primitives like snapshot, click, fill, I wrapped websites into CLI commands. It connects via CDP to a managed Chrome where you're already logged in, then runs small JS functions that call the site's own internal APIs. No headless browser, no stolen cookies, no API keys.
Your browser is already the best place for fetch to happen. It has all the cookies, sessions, auth state. Traditional crawlers spend so much effort on login flows, CSRF tokens, CAPTCHAs, anti-bot detection... all of that just disappears when you fetch from inside the browser itself. Frontend engineers would probably hate me for this because it's really hard to defend against.
So instead of snapshotting the DOM (easily 50K+ tokens), finding the element, clicking, snapshotting again, parsing... you just run
bb-browser site twitter/feed
and get structured JSON back.
Here's the thing I keep thinking about though. Operating websites through raw CDP is a genuinely hard problem. A model needs to understand page structure, find the right elements, handle dynamic loading, deal with SPAs. That takes a SOTA model. But calling a CLI command? Any model can do that. So the SOTA model only needs to run once, to write the adapter. After that, even a small open-source model runs "bb-browser site reddit/hot" just fine.
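To make that concrete, an adapter is conceptually just a named function that fetches from inside the logged-in page. The shape below is illustrative only; the real contract is defined in the bb-sites repo (reddit's /hot.json is a public JSON endpoint):

    // Illustrative adapter shape, not the actual bb-sites contract.
    // Runs in the page's main world, so fetch carries the session's cookies.
    module.exports = {
      name: 'reddit/hot',
      async run({ evaluate }) {
        return evaluate(async () => {
          const res = await fetch('/hot.json?limit=25', {
            headers: { accept: 'application/json' },
          });
          const json = await res.json();
          return json.data.children.map((c) => ({
            title: c.data.title,
            score: c.data.score,
            url: c.data.url,
          }));
        });
      },
    };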
And not everyone even needs to write adapters themselves. I created a community repo, bb-sites (https://github.com/epiral/bb-sites), where people freely contribute adapters for different websites. So in a sense, someone with just an open-source model can already feel the real impact of agents in their daily workflow. Agents shouldn't be a privilege only for people who can access SOTA models and afford the token costs.
There's a guide command baked in so if you do want to add a new site, you can tell your agent "turn this website into a CLI" and it reverse-engineers the site's APIs and writes the adapter.
v0.8.x dropped the Chrome extension entirely. Pure CDP, managed Chrome instance. "npm install -g bb-browser" and it works.
I’ve been experimenting with a similar approach using Playwright, and the biggest takeaway for me was how much “hidden API” most modern websites actually have.
Once you start mapping interactions → network calls, a lot of UI complexity just disappears. It almost feels like the browser becomes a reverse-engineering tool for undocumented APIs.
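The mapping can be made mechanical, too. Something like this in Playwright (the helper name is mine):

    // Attribute network calls to a single interaction by listening only around it.
    async function callsTriggeredBy(page, action) {
      const seen = [];
      const onRequest = (req) => seen.push({ method: req.method(), url: req.url() });
      page.on('request', onRequest);
      await action();
      await page.waitForLoadState('networkidle'); // let XHRs settle before detaching
      page.off('request', onRequest);
      return seen;
    }

    // e.g. which endpoints does this button actually hit?
    const calls = await callsTriggeredBy(page, () => page.click('#submit'));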
That said, I do think there’s a tradeoff people don’t talk about enough:
- Sites change frequently, so these inferred APIs can be brittle
- Auth/session handling gets messy fast
- And of course, the ToS / ethical side is a gray area
Still, for personal automation or internal tooling, it’s insanely powerful. Way more efficient than driving full browser sessions for everything.
Curious how others are handling stability — are you just regenerating these mappings periodically, or building some abstraction layer on top?
I asked Claude to use this with the new scheduled tasks /loop skill to update my Oscar picks site every five minutes during tonight's awards show. It simply visited the Oscars' realtime feed via Chrome DevTools, updated my picks, and pushed to GitHub Pages. It even handled the tie correctly.
https://danielraffel.me/2026/03/16/my-oscar-2026-picks/
Without it you're stuck with the basic HTTP firewall, etc., which is extremely dangerous, and this is maybe the one opportunity we have to do this.
It has a built-in MCP server, and I use it with Claude Code and codex; I like it quite a lot.
[1] https://github.com/sidwyn/webmcp-tool-library
[2] https://github.com/sidwyn/webmcp-tool-library/blob/main/cont...
Did you compare Playwright with MCP? Why one over the other?
I usually use MCP, because I heard it's less detectable than Playwright and more robust against design changes, but I haven't compared/tested it myself.
- the agent takes actions on a page
- the network calls get recorded
- the agent writes a script to hit the endpoints directly, at scale
- requests are made from the main world of the webpage, so they automatically get auth headers added
Basically what you do manually, but done via the agent in a minute and for FREE with an AI Studio API key. (Rough sketch of the main-world replay below.)
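    // Replay a discovered endpoint from inside the page, so the browser
    // attaches session cookies and auth itself.
    // '/api/v1/items/' and itemIds are made-up placeholders.
    const results = [];
    for (const id of itemIds) {
      results.push(await page.evaluate(async (itemId) => {
        const res = await fetch(`/api/v1/items/${itemId}`, {
          credentials: 'include', // same-origin cookies ride along
        });
        return res.json();
      }, id));
    }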
Never tried that level of autonomy though. How long is your iteration cycle?
If I had to guess, mine was maybe 10-20 minutes over a few prompts.
https://github.com/microsoft/playwright-cli
We could put them in a dedicated tag (example below). For all the skills you want on the page, optionally mark by default which ones "should be read in full to properly use the page". And then add some JavaScript functions to wrap it / reduce the required tokens.
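The tag example itself seems to have been stripped from this comment; presumably something along these lines, where the tag and attribute names are hypothetical:

    // Hypothetical markup (names invented):
    //   <script type="application/agent-skills" data-read-in-full="true">
    //     { "name": "checkout", "description": "...", "steps": [] }
    //   </script>
    // A small wrapper to collect them from the page:
    const skills = [...document.querySelectorAll('script[type="application/agent-skills"]')]
      .map((el) => ({
        readInFull: el.dataset.readInFull === 'true',
        ...JSON.parse(el.textContent),
      }));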
Made a repo and a website if anyone is interested: https://webagentskills.dev/
It takes over your entire browser to center a div... and then fails to do so?
I know I could just use claude --chrome, but I’m used to this excellent MCP server.