My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.
i.e. agents for knowledge workers who are not software engineers
A few thoughts and questions:
1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, peoples' agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting it at the top of search results and taking away visits and ad revenue from sites.
2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.
3. How will startups in this space compete against labs who can train models to fit their products?
4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?
Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.
Edit: Notes on trying the new Codex update
1. The permissions workflow is very slick
2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.
3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand it's own new features well enough to invoke them (without quite a bit of steering)
4. I cannot get it to show me the in app browser
5. Generating image mockups of websites and then building them is nice
I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them with full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.
For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.
> for normie agents to take off in the way that you expect, you're going to have to grant them with full access
At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.
The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.
Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
I don't want this, I just think it's going down that route.
There was a recent Stanford study which showed that AI enthusiasts and experts and the normies had very different sentiment when it came to AI.
I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?
Theres lots of cases of great highly useful LLM tools, but the moment they scale up you get slammed by the risks that stick out all along the long tail of outcomes.
I agree, in general we are going to find that ultimately most employee end users don't want it. Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.
On the other hand, entrepreneurs and managers are going to want it for their employees (and force it on them) for the above reason.
I want. If I get 10X more productive, I can unilaterally increase my compensation 10X by doing my stuff in 1 unit of time instead of 10 it took, and splitting the remaining 9 units of time into, say, 4 units of time doing more work, securing my position and setting myself up for promotion, and 5 units of time doing whatever the fuck I want. Not all compensation shows up in a bank account - working less, or under less stress, are also valuable.
Of course, such situation is only temporary - if I can suddenly be 10X productive, then so can everyone else, and then the baseline shifts so 10X is the new 1X.
You want it, but then you closed by explaining exactly why you shouldn't want it. Plus, the new baseline isn't neutral (as in, everyone is the same again). If humans can now do 10x the work as before, the employer doesn't need the same number of humans to carry out its work. So the new baseline is actually "let's keep 1 employee and fire the other 9", unless the business can find a way to suddenly expand 10x so that it needs 10x as much work done.
So the new baseline is actually "let's keep 1 employee and fire the other 9", unless the business can find a way to suddenly expand 10x so that it needs 10x as much work done.
If they have any surplus of money (or loans) they'll try, so those 9 employees may end up becoming team leads or middle management, trying to start new initiatives to get the 10x expansion (and 100x improvement).
The market isn't anywhere near efficient enough to directly translate productivity improvements into labor reductions. Thankfully, because everything that's nice and hopeful and human lives within the market inefficiency; a fully efficient market would be a hell worse than any writer or preacher ever imagined.
lol that has nothing to do with market efficiency.
I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.
Of course it does have to do with market efficiency, of which the inertia and surplus within companies (especially large ones) is a part.
> I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.
I'm sure it's true. However, since you brought it up, can you be more specific and name three?
Yes, but in the long run, the market expects growth and innovation, not just doing the same thing with fewer workers. Especially when every other company can just buy the exact same advantage for the same price.
Your first paragraph is so short sighted that its message didn't even make it beyond the next one. It's a race to the bottom and your "doing whatever the fuck I want" will obviously never materialize.
The typical work week today is 40 hours. Just like it was 80 years ago. The typical worker is dramatically more productive than 80 years ago yet "doing whatever the fuck I want" time has not increased. Why would it? Employers don't need to pay such that 20 hour work weeks give you the same income. Because everybody around you is ok with working 40 hours.
This won't be different with AI, no matter if the overall effect is 1.1x or 10x or 100x productivity. Because it's not a technological problem but a sociological one.
Good point. My rant assumed that "10x productivity" meant 10x output in 1x time, rather than 1x output in 0.1x time. Only one of those are actually objectionable.
> I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.
Those are productivity increases that got our standard of living to where it is. Fewer people doing the same amount of work has, historically speaking, freed people from their current job, allowing them to work on something else.
It's that analogy of the horse, they used to be farm animals. Now, fewer of them are 'employed' but they're much nicer jobs. I'm not sure if the same is true for us this time around though as new jobs being created have increasingly been highly skilled which means the majority can't apply.
If everyone becomes 10x more productive it won’t mean the companies cash flow 10x’s. Where value is loose there is competition, so in theory everyone should win. Unless nobody else can compete to capture that loose 10x value, in which case congratulations, you are now a unicorn.
Of course in reality in the short term what happens is companies lay off people to increase margins. Times will be tough for workers, and equity keeps gravitating towards those who already had it.
>Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase?
Given sane working arrangements or at minimum presence of remote work, it would be a bit shortsighted not to want to get done with your work in a tenth amount of time. In the very least, you're competing for a promotion against less effective people, all while having more time for yourself. If not, you're building labor market skillset in an efficient way so you can hop to a better employer.
> I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?
I'm not so sure. Matter of marketing and social pressure, big time.
Consider this: "Always-on pervasive google/fb/... login? I think most people are going to say they dont want it. I mean, why would anyone want a tool that would track their every move on the internet?" That could easily have been a statement 20 years ago. And look where we are.
Their solution will be to push mandatory and nonconsensual updates to your devices which limit your device and your freedom in the name of security. Like Google is doing to Android in September. You will no longer be able to install "unverified" software on anything. To address prompt injection attacks they're probably working on an approach where your data all has to be in the cloud and subject to security scans. That's already basically the model for Google Workspace, Google Drive and Chromebooks.
The model will get full access to your data, but in the name of security, you will only be permitted to have data that is cloud-hosted; local storage will effectively just be cache.
The era of the general computer will end, and the products you purchased from these companies will be nonconsensually altered and limited.
I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.
Knowing that they can change what you bought overnight with a single nonconsensual update, think very, very carefully about who you purchase all of your future technology from. Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.
>Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.
Google is almost certainly doing this because the iOS was not found to be a monopoly, while Andorid was. It came up in Google's appeal of the Epic case verdict, where they directly asked the judge about it. Turns out you can't be anti-competitive if you don't have [allow] any competitors.
Nope. I'm still going to blame Google for their own actions. Nice try, though. I'm old enough to remember when Google pretended to take responsibility for not being evil. Even had it as their motto.
> I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.
Wait until age verification is mandatory everywhere. :)
I can already see that happening, e. g. to access financial transactions or government apps, one needs to verify the id, and that will not work without age verification that can not be tampered with. So Linux will either submit to the same or be excluded.
(That free developers will be able to run Linux fine for much longer will also be true, but I guess they only care about catching the 95%, not the 5% linux users ... and 5% is a high guesstimate).
Edit: To clarify the above, one already had to provide personal data for financial transactions, of course, so a bank knows who is who, but the recent age verification go hand in hand with the attempt to get rid of vpn, and applications now make it a new standard to query the age of users, with the claim to "help protect kids". And some people buy into that rationale too. I don't, but I have seen many non-tech savvy people submit to that justification.
There's always the zero knowledge proof tech alternative, but I don't have the feeling we are moving in that direction - it's not the most profitable business is it.
> It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.
The concerning aspect is how others' content being scanned into systems don't have any knowledge or consent. Having private PII/files/code/emails/etc being read and/or accidentally shared by the agent online.
> Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
Honestly, it's alright.
Just think of what we could do with computers up until this point.
We keep all those abilities.
And more, even, because the industry still keeps churning out new local LLMs.
So you even gain more capabilities than right now. Just not at the rate of the bleeding edge.
Which is just like the Linux desktop, essentially.
It's fine, really. There is no need to consume the bleeding edge. You will be fine.
Definitely agree here. Made the swap to Linux a little over a year ago and the only reason I even have nice hardware is because I like gaming. But if I was cut off from everything tomorrow, the decades of stuff I have that I have not played will keep me very happy lol
>Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
As a proud neo-luddite, I'm watching the AI hype with grim amusement and I'll tell you hwhat, it doesn't look like a good time. Even putting to one side the planetary scale economic crash that is incoming, all the hypers seem to be on some sort of treadmill that is out of their control and it simply doesn't look like fun.
Everyone keeps saying how essential it all is yet a few years in and I still don’t see anything like the promised future of “everyone using them every day for everything.” Everyone’s just constantly talking (or stressing) about it.
We - including the companies - don’t know what the real “billion dollar application” of them is other than the unproven claim it makes everyone more productive in some general sense. When it doesn’t work people continue to say “it’s your fault not the tool’s.” Meanwhile investors are getting skittish and not one AI company is profitable yet. Companies that laid people off for LLM’s are regretting their decisions, leadership (and educators) is dealing with unvetted writing and having to waste their time cleaning it up, the list goes on. “Slop” is still a huge and growing problem.
LLM’s are here to stay, but IMO it’ll be more relevant in the long run than 3D printers yet less revolutionary than the internet. Everyone will touch them at various points but this whole-life, every-industry-disrupted integration still seems far fetched to me. Pricing is still a huge unsolved problem - everyone is still subsidized and despite gains in using fewer resources, it’s still too much to run these locally, even small models (not even getting into tooling and knowledge required to use them in a productive way).
When we zoom out and look at the whole picture, LLM’s have mostly made everyone’s online experience worse while the VC funded companies behind them are playing municipal and state governments’ for suckers a la Amazon getting so many cities to trip over each other giving away land and tax breaks, but far worse. Those are the biggest contributions so far aside from anecdotes from coders about “1000x productivity.” Again, I think they’re here to stay. But it’s called “AI hype” for a reason.
LLM’s have mostly been a problem creator IME rather than a “disruptor.” Never really seen “revolutionary technology” quite like it.
But hey, I’ll admit it’s useful to have a meh local model when I’m writing TTRPG stuff and have writer’s block. Though then I remember how it was trained, a whole other subject I haven’t even touched, so that kind of sucks too.
I dont see companies doing that. it can be business ending. only AI bros buying mac mini in 2026 to setup slop generated Claws would do that but a company doing that will for sure expose customer data.
> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.
Strongly agreed.
I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.
And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.
The more powerful, the more dangerous, and vice versa.
I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.
People have different levels of safety-consciousness, but also different tolerances and threat models.
For example, I would hesitate running a Mythos-level model in YOLO mode with full control over my computer, but right now, for personal stuff, even figuring out WTF are sandboxes in Claude Code / Gemini CLI, much less setting them up, is too much hassle. What's the worst it can do without me noticing? Format the drive and upload some private data into pastebin? Much as I hate cloud and the proliferation of 2FA in every service, that alone means it can't actually do more to me than waste few hours of my life, as I reimage my desktop and restore OneDrive (in case of destructive changes that got synced up). These models are not yet good enough to empty my bank account in few minutes I'm not looking; everything else they can do quickly is reversible or inconsequential.
Now, I do look at things closely when working with agentic AI tools. But my threat model is limited to worrying about those few hours of my life. rm -rf / --no-preserve-root is an annoyance, not a danger.
(I accept that different contexts give different threat modeling. I would be more worried if I were doing businessy business stuff with all kinds of secret sauces, or was processing PII of my employer's customers, or lived in a country where it's easy to have all your money stolen if your CC number or SSN gets posted online.)
How many of these threat vectors are just theoretical? Don’t use skills from random sources (just like don’t execute files from unknown sources). Don’t paste from untrusted sites (don’t click links on untrusted sites). Maybe there are fake documentation sites that the agent will search and have a prompt injected - but I haven’t heard of a single case where that happened. For now, the benefits outweigh the risk so much that I am willing to take it - and I think I have an almost complete knowledge of all the attack vectors.
Systems have been caught out that review pull requests, that’s a simple and clear one. The more obvious to me for most people is anything you do that interacts with your email without an explicit approve list of emails to read.
I cannot reconcile that growth for non-technical users is going to explode, when most utility from agents is via the ability to execute arbitrary code, generally in yolo mode, with the fact that almost all corporate IT departments do not give users the ability to install anything on their machine, let alone arbitrary code. Even developers at many companies are subject to this despite the productivity impacts.
The culture of corporate IT would need to change to allow it, and I just don't see it happening.
What about setting environments for normies that mitigate this problem? I don't know that you can do it on Windows, but Linux offers various tools for isolation where you can give full rights to an LLM and still be safe from certain classes of disaster.
Maybe this kind of isolation neuters the benefit you're thinking of, but I do believe some sort of solution could be reached.
There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non developers, just not sure using 'code' as the term is the right one or not.
Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.
If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.
Just reading the comments here it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already does all of this. Codex isn't pioneering these features, it's mostly just catching up.
I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something
Codex is my favorite UX for anything as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However lately the limits seem to have got extremely tight, I keep spending out the daily limits way too quickly. The weekly limits are also often spent out early so I switch to Claude or Gemini or something.
Tried it out. It's a far more reasonable UI than Claude Desktop at this moment. Anthropic has to catch up and finally properly merge the three tabs they have.
The killer feature of any of these assistants, if you're a manager, is asking to review your email, Slack, Notion, etc several times a day to highlight the items where you need to engage right away. Of course, if your company allows the connectors to do so.
Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.
I'd still say that Claude models are superior and just offer good opinionated defaults.
I've been using the Codex app for a while (a few months) for a few types of coding projects, and then slowly using it for random organizational/productivity things with local folders on my Mac. Most of that has been successful and very satisfying, however...
Codex is still far from ready for regular people. Simply moving a folder that Codex has been working on confuses the hell out of it. I can't figure out how to fix "Current working directory missing. This chat's working directory no longer exists". I've tried asking it to fix the problem and it tries lots of terminal commands and screws around with SQLite. Something this brittle is not for non-developers.
Prompt in the second video: "Reduce the font and tagline length"
Now we are using LLM just to adjust font size?
Also third video: "Generate an image for the hero section..."
I can't understand why OpenAI(or Google, or whatever AI companies) thinks it's okay to put an AI generated image for product description. It's literally fake.
Maybe I lack imagination, but I just can't figure out what I'd use this for. I'm finding AI helpful in writing code (especially verbose Unreal Engine C++ code) as a companion to my designs, but, I really don't want it using my computer. I dunno, I guess the other use case would be summarizing slack or discord but otherwise this seems to me like a solution in search of a problem.
Started using https://github.com/can1357/oh-my-pi this week and it makes every other tui coding assistant look like toy projects. It's has a nice UI yes, but the workflows it comes up with are incredible. They need to do a major overhaul in customisability for codex to come close to it.
All of you are ironically completely oblivious to the fact that you're training your own replacement by using these tools, you're even paying for it. Eventually, the companies you work for will just "hire" Anthropic or OpenAI agents in your place and you'll be out of job, no matter your seniority. Mark my words.
Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming its the garbage renderer.
Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.
Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.
More like codex for nothing. I canceled my 20$ plan and won't let myself be bullied into buying more expensive plans to have the same limits I used to have a week ago on the 20$ plan. I would not be surprised if this illegal where I live.
I enabled the computer use plugin yesterday. Today I asked it to summarize a slack thread, along with a spreadsheet without thinking about it
I was expecting it to use MCPs I have for them, but they happened to not be authenticated for some reason
I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at slack and then navigating on chrome to the sheet to get the data it needs
Like on one hand it's really cool that it just "did the thing" but I was also freaked out during the experience
I’ve done a lot with Claude and OpenAI both, A LOT, but I’m still a little wary at letting it have too much access so I haven’t tried this feature in either of them.
Interesting that its restricted to macOS. I know programmers almost exclusively use macOS, but regular folk primarily use windows for work. I might be a bit biased as an engineer, but even outside of my circle, I mostly see windows being used. If they're serious about extending from coders to non technical business users, I would imagine they need to support windows.
Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).
I wonder if there’s something off the shelf that does this?
pretty much you have to build for humans as the "source" of truth and then have a robust agentic surface if you want to survive as a company. after using linear (for ex.) u can really see how it all fits together, i can be in cli, co-workers in slack, cowork, whatever and update tasks from anywhere). i refuse to use shit where i have to context switch by going into an app now. posthog is another good example of where it's going. the dirty detail now is that you HAVE to have the actual app so you can still manually look at data and do operations.
I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.
First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.
A simple mental model for Claude's new adaptive thinking is that it is the recommended way to use extended thinking. Adaptive Thinking (wraps Extended Thinking). It applies to Opus 4.7, 4.6, and Sonnet 4.6 and is the default mode on Claude Mythos Preview.
559 comments
i.e. agents for knowledge workers who are not software engineers
A few thoughts and questions:
1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, peoples' agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting it at the top of search results and taking away visits and ad revenue from sites.
2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.
3. How will startups in this space compete against labs who can train models to fit their products?
4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?
A few more thoughts collected here: https://chrisbarber.co/professional-agents/
Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.
Edit: Notes on trying the new Codex update
1. The permissions workflow is very slick
2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.
3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand it's own new features well enough to invoke them (without quite a bit of steering)
4. I cannot get it to show me the in app browser
5. Generating image mockups of websites and then building them is nice
For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.
> for normie agents to take off in the way that you expect, you're going to have to grant them with full access
At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.
The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.
Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
I don't want this, I just think it's going down that route.
I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?
Theres lots of cases of great highly useful LLM tools, but the moment they scale up you get slammed by the risks that stick out all along the long tail of outcomes.
On the other hand, entrepreneurs and managers are going to want it for their employees (and force it on them) for the above reason.
Of course, such situation is only temporary - if I can suddenly be 10X productive, then so can everyone else, and then the baseline shifts so 10X is the new 1X.
>
So the new baseline is actually "let's keep 1 employee and fire the other 9", unless the business can find a way to suddenly expand 10x so that it needs 10x as much work done.If they have any surplus of money (or loans) they'll try, so those 9 employees may end up becoming team leads or middle management, trying to start new initiatives to get the 10x expansion (and 100x improvement).
The market isn't anywhere near efficient enough to directly translate productivity improvements into labor reductions. Thankfully, because everything that's nice and hopeful and human lives within the market inefficiency; a fully efficient market would be a hell worse than any writer or preacher ever imagined.
I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.
> I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.
I'm sure it's true. However, since you brought it up, can you be more specific and name three?
The typical work week today is 40 hours. Just like it was 80 years ago. The typical worker is dramatically more productive than 80 years ago yet "doing whatever the fuck I want" time has not increased. Why would it? Employers don't need to pay such that 20 hour work weeks give you the same income. Because everybody around you is ok with working 40 hours.
This won't be different with AI, no matter if the overall effect is 1.1x or 10x or 100x productivity. Because it's not a technological problem but a sociological one.
> I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.
Those are productivity increases that got our standard of living to where it is. Fewer people doing the same amount of work has, historically speaking, freed people from their current job, allowing them to work on something else.
It's that analogy of the horse, they used to be farm animals. Now, fewer of them are 'employed' but they're much nicer jobs. I'm not sure if the same is true for us this time around though as new jobs being created have increasingly been highly skilled which means the majority can't apply.
Of course in reality in the short term what happens is companies lay off people to increase margins. Times will be tough for workers, and equity keeps gravitating towards those who already had it.
If you remove the effort from those tasks, they will have no value.
10x the value of 0 is 0
>Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase?
Given sane working arrangements or at minimum presence of remote work, it would be a bit shortsighted not to want to get done with your work in a tenth amount of time. In the very least, you're competing for a promotion against less effective people, all while having more time for yourself. If not, you're building labor market skillset in an efficient way so you can hop to a better employer.
I couldn't imagine thinking "I'm gonna do this 0.1x as fast as I could, wasting my life away with pointless extra work, to spite my employer"
> I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase?
The person who realizes that everybody around them is bow at 10X and if they don't follow suit then they will soon be out of a job.
> I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?
I'm not so sure. Matter of marketing and social pressure, big time.
Consider this: "Always-on pervasive google/fb/... login? I think most people are going to say they dont want it. I mean, why would anyone want a tool that would track their every move on the internet?" That could easily have been a statement 20 years ago. And look where we are.
The model will get full access to your data, but in the name of security, you will only be permitted to have data that is cloud-hosted; local storage will effectively just be cache.
The era of the general computer will end, and the products you purchased from these companies will be nonconsensually altered and limited.
I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.
Knowing that they can change what you bought overnight with a single nonconsensual update, think very, very carefully about who you purchase all of your future technology from. Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.
>Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.
Google is almost certainly doing this because the iOS was not found to be a monopoly, while Andorid was. It came up in Google's appeal of the Epic case verdict, where they directly asked the judge about it. Turns out you can't be anti-competitive if you don't have [allow] any competitors.
> I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.
Wait until age verification is mandatory everywhere. :)
I can already see that happening, e. g. to access financial transactions or government apps, one needs to verify the id, and that will not work without age verification that can not be tampered with. So Linux will either submit to the same or be excluded.
(That free developers will be able to run Linux fine for much longer will also be true, but I guess they only care about catching the 95%, not the 5% linux users ... and 5% is a high guesstimate).
Edit: To clarify the above, one already had to provide personal data for financial transactions, of course, so a bank knows who is who, but the recent age verification go hand in hand with the attempt to get rid of vpn, and applications now make it a new standard to query the age of users, with the claim to "help protect kids". And some people buy into that rationale too. I don't, but I have seen many non-tech savvy people submit to that justification.
The concerning aspect is how others' content being scanned into systems don't have any knowledge or consent. Having private PII/files/code/emails/etc being read and/or accidentally shared by the agent online.
> Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
Honestly, it's alright.
Just think of what we could do with computers up until this point. We keep all those abilities.
And more, even, because the industry still keeps churning out new local LLMs. So you even gain more capabilities than right now. Just not at the rate of the bleeding edge.
Which is just like the Linux desktop, essentially. It's fine, really. There is no need to consume the bleeding edge. You will be fine.
>Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.
As a proud neo-luddite, I'm watching the AI hype with grim amusement and I'll tell you hwhat, it doesn't look like a good time. Even putting to one side the planetary scale economic crash that is incoming, all the hypers seem to be on some sort of treadmill that is out of their control and it simply doesn't look like fun.
We - including the companies - don’t know what the real “billion dollar application” of them is other than the unproven claim it makes everyone more productive in some general sense. When it doesn’t work people continue to say “it’s your fault not the tool’s.” Meanwhile investors are getting skittish and not one AI company is profitable yet. Companies that laid people off for LLM’s are regretting their decisions, leadership (and educators) is dealing with unvetted writing and having to waste their time cleaning it up, the list goes on. “Slop” is still a huge and growing problem.
LLM’s are here to stay, but IMO it’ll be more relevant in the long run than 3D printers yet less revolutionary than the internet. Everyone will touch them at various points but this whole-life, every-industry-disrupted integration still seems far fetched to me. Pricing is still a huge unsolved problem - everyone is still subsidized and despite gains in using fewer resources, it’s still too much to run these locally, even small models (not even getting into tooling and knowledge required to use them in a productive way).
When we zoom out and look at the whole picture, LLM’s have mostly made everyone’s online experience worse while the VC funded companies behind them are playing municipal and state governments’ for suckers a la Amazon getting so many cities to trip over each other giving away land and tax breaks, but far worse. Those are the biggest contributions so far aside from anecdotes from coders about “1000x productivity.” Again, I think they’re here to stay. But it’s called “AI hype” for a reason.
LLM’s have mostly been a problem creator IME rather than a “disruptor.” Never really seen “revolutionary technology” quite like it.
But hey, I’ll admit it’s useful to have a meh local model when I’m writing TTRPG stuff and have writer’s block. Though then I remember how it was trained, a whole other subject I haven’t even touched, so that kind of sucks too.
> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.
Strongly agreed.
I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.
And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.
The more powerful, the more dangerous, and vice versa.
>
I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.People have different levels of safety-consciousness, but also different tolerances and threat models.
For example, I would hesitate running a Mythos-level model in YOLO mode with full control over my computer, but right now, for personal stuff, even figuring out WTF are sandboxes in Claude Code / Gemini CLI, much less setting them up, is too much hassle. What's the worst it can do without me noticing? Format the drive and upload some private data into pastebin? Much as I hate cloud and the proliferation of 2FA in every service, that alone means it can't actually do more to me than waste few hours of my life, as I reimage my desktop and restore OneDrive (in case of destructive changes that got synced up). These models are not yet good enough to empty my bank account in few minutes I'm not looking; everything else they can do quickly is reversible or inconsequential.
Now, I do look at things closely when working with agentic AI tools. But my threat model is limited to worrying about those few hours of my life.
rm -rf / --no-preserve-rootis an annoyance, not a danger.(I accept that different contexts give different threat modeling. I would be more worried if I were doing businessy business stuff with all kinds of secret sauces, or was processing PII of my employer's customers, or lived in a country where it's easy to have all your money stolen if your CC number or SSN gets posted online.)
> I think I have an almost complete knowledge of all the attack vectors.
That's exactly the kind of hybris where the maximum danger lies.
The culture of corporate IT would need to change to allow it, and I just don't see it happening.
Maybe this kind of isolation neuters the benefit you're thinking of, but I do believe some sort of solution could be reached.
If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.
I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something
The killer feature of any of these assistants, if you're a manager, is asking to review your email, Slack, Notion, etc several times a day to highlight the items where you need to engage right away. Of course, if your company allows the connectors to do so.
Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.
I'd still say that Claude models are superior and just offer good opinionated defaults.
Codex is still far from ready for regular people. Simply moving a folder that Codex has been working on confuses the hell out of it. I can't figure out how to fix "Current working directory missing. This chat's working directory no longer exists". I've tried asking it to fix the problem and it tries lots of terminal commands and screws around with SQLite. Something this brittle is not for non-developers.
Now we are using LLM just to adjust font size?
Also third video: "Generate an image for the hero section..."
I can't understand why OpenAI(or Google, or whatever AI companies) thinks it's okay to put an AI generated image for product description. It's literally fake.
https://github.com/openai/codex/issues/2847
I'm still paranoid about keeping things securely sandboxed.
I think the latter is technically "Codex For Desktop", which is what this article is referring to.
>> for the more than 3 million developers who use it every week
It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.
Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.
I was expecting it to use MCPs I have for them, but they happened to not be authenticated for some reason
I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at slack and then navigating on chrome to the sheet to get the data it needs
Like on one hand it's really cool that it just "did the thing" but I was also freaked out during the experience
> Our mission is to ensure that AGI benefits all of humanity.
In order to do this we will eat everyone's lunch.
> Computer use is initially available on macOS,
Does anyone know of a good option that works on Wayland Linux?
I wonder if there’s something off the shelf that does this?
Bunch of startups need to pivot today after this announcement including mine
Ok. I upgrade.
"You've hit the message limit, upgrade to Plus for more".
Hmm. They've charged me. There's no meaningful support. I just got scammed, didn't I...
but there is no link, why would you not make this a link.
boggles my mind that companies make such little use of hypertext
> Codex can now operate your computer alongside you
I am getting some strange vibes here ... is AI actually also spying on these developers?
> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...
Why is OpenAI obsessed with generating imgaes? Do they think "generate image" is a thing that a software engineer do on a daily basis?
Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.