The threat is comfortable drift toward not understanding what you're doing (ergosphere.blog)

by zaikunzhang 630 comments 991 points

[−] Wowfunhappy 40d ago

> Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. [...] Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. [...] If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.

And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.

The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!

---

† "Useful" in this context means "helps you produce good science that benefits humanity".

[−] conception 40d ago
Sadly I don’t see how our current social paradigm works for this. There is no history of the kind of long-term planning, or long-term loyalty (in either direction) between employees and employers, that this sort of journeyman, guild-style training would require. AI execs are basically racing, hoping we won’t need a Schwartz before they are all gone. But what incentive is there to hire a college grad, have them work without LLMs for a decade, and then give them the tools to accelerate their work?
[−] Wowfunhappy 40d ago
Then the social paradigm needs to change. Is everyone just going to roll over and die while AI destroys academia (and possibly a lot more)?

Last September, Tyler Austin Harper published a piece for The Atlantic on how he thinks colleges should respond to AI. What he proposes is radical—but, if you've concluded that AI really is going to destroy everything these institutions stand for, I think you have to at least consider these sorts of measures. https://www.theatlantic.com/culture/archive/2025/09/ai-colle...

[−] pxc 40d ago
I was pretty interested until I got to this part:

> Another reason that a no-exceptions policy is important: If students with disabilities are permitted to use laptops and AI, a significant percentage of other students will most likely find a way to get the same allowances, rendering the ban useless. I witnessed this time and again when I was a professor—students without disabilities finding ways to use disability accommodations for their own benefit. Professors I know who are still in the classroom have told me that this remains a serious problem.

This would be a huge problem for students with severe and uncorrectable visual impairments. People with degenerative eye diseases already have to relearn how to do every single thing in their life over and over and over. What works for them today will inevitably fail, and they have to start over.

But physical impairments like this are also difficult to fake and easy to discern accurately. It's already the case that disability services at many universities only grant you accommodations that have something to do with your actual condition.

There are also some things that are just difficult to accommodate without technology. For instance, my sister physically cannot read paper. Paper is not capable of contrast ratios that work for her. The only things she can even sometimes read are OLED screens in dark mode, with absolutely black backgrounds; she requires an extremely high contrast ratio. She doesn't know braille (which most blind people don't, these days) because she was not blind as a little girl.

Committed cheaters will be able to cheat anyway; contemporary AI is great at OCR. You'll successfully punish honest disabled people with a policy like this but you won't stop serious cheaters.

[−] greazy 40d ago
The author did not outright suggest banning all technology. They even linked to a digital typewriter. Right after the paragraph you quote, they suggest offering a more human-centric approach to helping disabled people. It's not a huge leap to suggest that your sister could continue to learn with those two solutions combined: a disability tutor plus an OLED screen.
[−] bsder 40d ago
What he is referring to is perfectly good students whose parents go shopping for a medical diagnosis so that their child can get "accommodations" like extra time to complete tests.

The problem is that this is treating the symptom rather than the cause. The symptom is that cheating for college admission and achievement is too effective. The cause is that college admission and achievement has become high stakes, and it absolutely should not be.

[−] Wowfunhappy 40d ago
You don't have to agree with his precise solution, and in fact I'm not sure whether I do. However, I found the article useful because it got me thinking about the universe of things we could be considering, if we really do think AI is poised to destroy education as we know it.
[−] irishcoffee 40d ago
I don’t know about anyone else here, but it wasn’t college that educated me, even though I was at college. I did all of the reading and studying on my own. The classes weren’t very interesting, most of my TAs didn’t speak the local language well at all, nor did half the professors.

I enjoyed my time, I made a lot of lifelong friends, and figured out how to live on my own. My buddies that enrolled in boot camp instead of college learned all those same skills, for free.

Education won’t be ruined or blemished by LLMs; the whole thing was a joke to begin with. The bit that ruined college was unlimited student loans… and all of our best and brightest folks running the colleges raping students for money. It’s pathetic, evil, and somehow espoused.

[−] satvikpendem 39d ago
Sounds like you just had terrible professors, because most of mine were good and we learned quite a bit in classes, at least I did. I distinctly remember one professor who, every class, would meander through many topics and then find a way to bring them all together at the very end, crystallizing all of these disparate thoughts into one cohesive theory. And he did that every single class that semester. It was a marvel to behold.
[−] irishcoffee 39d ago
I remember my calc teachers: a married couple, last name Gulick, University of Maryland. The calc book was sold as the same book for Calc 1/2/3, and the Gulicks were the authors. Every semester they released a new edition, and the only thing that changed was the problem set numbers. So, if you took Calc 1/2/3, you spent $200/semester for the same fucking book.

Magical times.

[−] senordevnyc 40d ago
Yeah, this proposal is likely straight up illegal.
[−] aduty 40d ago

> Then the social paradigm needs to change. Is everyone just going to roll over and die while AI destroys academia (and possibly a lot more)?

My 40-some-odd years on this planet tells me the answer is yes.

[−] pasquinelli 40d ago
you might be right but 40-some-odd years is a tiny amount of time.
[−] aduty 39d ago
Sure, relative to the universe, but it's more than those who haven't had idealism beaten out of them by the world.
[−] pasquinelli 36d ago
no, not relative to the universe, relative to recent history. 40 years is not much in the context of history, which is the topic. your life experience is small and unimportant at the world-historical scale.
[−] aduty 35d ago
So what are you, 200?
[−] mrob 40d ago

>What he proposes is radical

It sounds entirely reasonable and moderate to me.

[−] throwaway27448 40d ago
Also entirely ineffective. Banning individual behavior won't prevent collective dysfunction and will only harm honest actors. The only answer that makes sense is reforming the acceptance of work to be resistant to the inevitable (ab)use of AI.
[−] senordevnyc 40d ago
It's neither reasonable nor moderate, which is why it'll never happen.
[−] conception 40d ago
Well, we are already rolling over and dying (literally) on everything from vaccine denial to climate change. So, yes, we are. Obviously yes.
[−] senordevnyc 40d ago
Article is paywalled, so perhaps you could just summarize his proposal?
[−] jayd16 40d ago
Some folks need to touch the hot stove before they learn but eventually they learn.

If AI output remains unreliable then eventually enough companies will be burned and management will reinstate proper oversight. All while continuing to pat themselves on the back.

[−] FrojoS 40d ago

> There is no history of any sort of long planning

Sure there is. It's the formal education system that produced the college grad.

[−] joe_the_user 40d ago
Well, the astrophysics situation is special because, as the article notes, there aren't breakthroughs that can be externally verified.

Other projects' success will be proportional to their number of Schwartzes, so it seems unlikely they'll disappear. But they may disappear in areas where there is no immediate money.

[−] TrnsltLife 39d ago
I guess that Anathem novel was prescient. We need academic monasteries where people go in and have no access to tech for a period of time. At least no LLMs/AI. They think with their heads and their chalkboard and their sand pits like the geometers of old.
[−] cmiles74 40d ago
I think we already know what we need to do: encourage people to do the work themselves, discourage beginners from immediately asking an LLM for help, and re-introduce some kind of oral exam. As the article mentions, banning LLMs is impractical; what we really need are people who can tell when the LLM is confidently wrong, not people who don't know how to work with an LLM.

I hope it will encourage people to think more about what they get out of the work, what doing the work does for them; I think that's a good thing.

[−] throw310822 40d ago

> the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

That you can't "become Schwartz" by using LLMs is an unproven assumption. Actually, it's a contradiction in the logic of the essay: if Bob managed to produce a valid output by using an LLM at all, then it means that he must have acquired precisely that supervision ability that the essay claims to be necessary.

Btw, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He makes it summarise articles, extract important knowledge and clarify concepts. This is part of a process of learning, not being a passive consumer.

[−] doug_durham 40d ago
The article is a thought experiment. The author hypothesizes that Bob isn't getting the same benefit that Alice is getting. That hypothesis could be wrong. I don't know and the author doesn't know. It could be that Bob is going to have a very successful career and will deeply know the field because he is able to traverse a wider set of problems more quickly. At this point, it's just hypothesis. I don't think that we can say we need more Alices any more than we can say we need more Bobs. Unfortunately we will have to wait and see. It will be upon the academic community to do the work to enforce quality controls. That is probably the weakness to worry about.
[−] mezyt 40d ago
Profession (1957) by Isaac Asimov is relevant: https://news.ycombinator.com/item?id=46664195
[−] mojuba 40d ago

> Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob

The solution is relatively simple though - not sure the article suggests this as I only skimmed through:

Being good in your field doesn't only mean pushing articles but also being able to talk about them. I think academia should drift away from written form toward more spoken form, i.e. conferences.

What if, say, you could only publish something after presenting your work in person, answering questions, etc.? The audience can be big or small, doesn't matter.

It would make publishing anything at all more expensive but maybe that's exactly what academia needs even irrespective of this AI craze?

[−] vinceguidry 40d ago
I've been using ChatGPT to re-bootstrap my coding hobby. After the initial honeymoon wore off, I realized I was staring down the barrel of a dilemma. If I use AI to "just handle" the parts of the system I don't want to understand, I invariably end up in a situation where I gotta throw a whole bunch of work out. But I can't supervise without an understanding of what it's supposed to be doing, and if I knew what it was supposed to be doing, I could just do it myself.

So I settled on very incremental work. It's annoying cutting and pasting code blocks into the web interface while I'm working on my interface to Neovim; I spent a whole day realizing I can't trust it to instrument Neovim, and I don't want to learn enough Lua to manage it myself. (I moved to Neovim from Emacs because I don't like Elisp, and GPT is even worse at working on my Emacs setup than my Neovim one. The end goal is my own editor in Ruby, but GPT damn sure can't handle that at the moment.) But at least I'm pushing a real flywheel and not the brooms from Fantasia.

[−] nunez 40d ago

> Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.

If you are a massive company that owns all of the knowledge and all of the technology needed to apply that knowledge, then you don't need Alice. You don't _want_ Alice. You want more Bobs. It looks better on the books.

Tale as old as time.

[−] fomoz 40d ago
AI is an accelerant, not a replacement for skill. At least, not yet.

I built a full-stack app in Python + TypeScript where AI agents process 10k+ near-real-time decisions and executions per day.

I have never done full stack development and I would not have been able to do it without GitHub Copilot, but I have worked in IT (data) for 15 years including 6 in leadership. I have built many systems and teams from scratch, set up processes to ensure accuracy and minimize mistakes, and so on.

I have learned a ton about full stack development by asking the coding agent questions about the app, bouncing ideas off of it, planning together, and so on.

So yes, you need to have an idea of what you're doing if you want to build anything bigger than a cheap one-shot throwaway project that sort of works but brings no value and that nobody is actually going to use.

This is how it is right now, but at the same time AI coding agents have come an incredibly long way since 2022! I do think they will improve, but an agent can't exactly know what you want to build. It's making an educated guess, an approximation of what you're asking it to do. Ask for the same thing twice and you'll get two slightly different results (assuming it's a big one-shot).

This is the fundamental reality of LLMs. It's sort of like having a human walking (where we were before AI), a human using a car to get places (where we are now), and FSD (the future; look how long that took compared to the first cars).

[−] einszwei 40d ago

> And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

I have gained a lot of benefit using LLMs in conjunction with textbooks for studying. So, I think LLMs could help you become Schwartz.

[−] sd9 40d ago
The thing is, agents aren’t going away. So if Bob can do things with agents, he can do things.

I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.

To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.

So I think this article is partly right: Bob is not learning the skills we used to require. But I think the market is going to stop valuing those skills, so it's not really a _problem_, except for Bob's own intellectual loss.

I don’t like it, but I’m trying to face up to it.

[−] DavidPiper 40d ago
I've just started a new role as a senior SWE after 5 months off. I've been using Claude a bit in my time off; it works really well. But now that I've started using it professionally, I keep running into a specific problem: I have nothing to hold onto in my own mind.

How this plays out:

I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.

Except I can't.

For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge; the reviewer's point was simply something I didn't know about.

But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.

And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.

But wait, I could just use Claude for this! For now I don't, because I've already seen where that leads, just a few moments ago. Using Claude has made me significantly slower whenever I need to use my own knowledge and skills.

I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.

[−] AlexWilkins12 40d ago
Ironically, this article reeks of AI-generated phrasing. Lots of "it's not X, it's Y", e.g.: "The failure mode isn't malice. It's convenience"; "You haven't saved time. You've forfeited the experience that the time was supposed to give you."; "But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding."

And indeed running it through a few AI text detectors, like Pangram (not perfect, by any means, but a useful approximation), returns high probabilities.

It would have felt more honest if the author had included a disclaimer that it was at least part written with AI, especially given its length and subject matter.

[−] caxap 40d ago
If this article was written a year ago, I would have agreed. But knowing what I know today, I highly doubt that the outcomes of LLM/non-LLM users will be anywhere close to similar.

LLMs are exceptionally good at building prototypes. If the professor needs a month, Bob will be done with a basic prototype of that paper by lunch on the same day and will have tried out dozens of hypotheses by the end of it. He will not be chasing some error for two weeks; the LLM will very likely figure it out in a matter of minutes, or not make it in the first place. Instructing it to validate intermediate results and to profile along the way can do magic.

The article is correct that Bob will not have understood anything, but if he wants to, he can spend the rest of the year trying to understand what the LLM has built for him, after verifying in the first couple of weeks that the approach actually works. Even better, he can ask the LLM to train him to do the same if he wishes: learn why things work the way they do, why something doesn't converge, etc.

Assuming that Bob is willing to do all that, he will progress way faster than Alice. LLMs won't take anything away if you are still willing to take the time to understand what it's actually building and why things are done that way.

5 years from now, Alice will be using LLMs just like Bob, or she'll be out of a job if she refuses to, because the place will be full of Bobs, with or without understanding.

[−] oncallthrow 40d ago
I think this article is largely, or at least directionally, correct.

I'd draw a comparison to high-level languages and language frameworks. Yes, 99% of the time, if I'm building a web frontend, I can live in React world and not think about anything that is going on under the hood. But, there is 1% of the time where something goes wrong, and I need to understand what is happening underneath the abstraction.

Similarly, I now produce 99% of my code using an agent. However, I still feel the need to thoroughly understand the code, in order to be able to catch the 1% of cases where it introduces a bug or does something suboptimally.

It's possible that in future, LLMs will get _so_ good that I don't feel the need to do this, in the same way that I don't think about the transistors my code is ultimately running on. When doing straightforward coding tasks, I think they're already there, but I think they aren't quite at that point when it comes to large distributed systems.

[−] steveBK123 40d ago
For the people arguing that the output is the code and the faster we generate it the better..

I do wonder where all the novel products are: the ones produced by 10x devs who are now 100x with LLMs, by the "idea guys" who can now produce products from whole cloth without having to hire pesky engineers... where are the one-man 10-billion-dollar startups, etc.? We are 3-4 years into this mania and all I see on the other end of it is the LLMs themselves.

Why hasn’t anything gotten better?

[−] theteapot 40d ago
I have a vaguely unrelated question re:

> You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year ...

Is that how PhD projects are supposed to work? The supervisor is a subject matter expert and comes up with a well-defined achievable project for the student?

[−] CharlieDigital 40d ago
I recently saw a preserved letterpress printing press in person and couldn't help but think of the parallels to the current shift in software engineering. The letterpress allowed for the mass production of printed copies, exchanging the intensive human labor of manual copying to letter setting on the printing press.

Yet the press only made the production of text more efficient; the act of writing, constructing a compelling narrative plot, and telling a story were not changed by this revolution.

Bad writers are still bad writers, and good writers still have a superior understanding of how to construct a plot. The technological ability to produce text faster never really changed what we consider "good" and "bad" in written literature; it just allowed more people to produce it.

It is hard to tell if large language models can ever reach a state where they have "good taste" (I suspect not). They will always reflect the taste and skill of the operator to some extent. Just because they let you produce more code faster does not mean they let you create a better product or better code. You still need good taste to create the structure of the product or codebase; you still have to understand the limitations of one architectural decision over another when the output is operationalized and run in production.

The AI industry is a lot of hype right now because they need you to believe that this is no longer relevant. That Garry Tan producing 37,000 LoC/day somehow equates to producing value. That a swarm of agents can produce a useful browser or kernel compiler.

Yet if you just peek behind the curtains at the Claude Code repo and see the pile of unresolved issues, regressions, missing features, half-baked features, and so on, the limitations seem plainly obvious: Anthropic, with functionally unlimited tokens and frontier models, cannot use them to triage and fix their own product.

AI and coding agents are like the printing press in some ways. Yes, it takes some costs out of a labor intensive production process, but that doesn't mean that what is produced is of any value if the creator on the other end doesn't understand the structure of the plot and the underlying mechanics (be it of storytelling or system architecture).

[−] matheusmoreira 40d ago
I dunno. Claude helped me implement a new memory allocator, compacting garbage collector and object heap for my programming language. I certainly understood what I was doing when I did this. The experience was extremely engaging for me. Claude taught me a lot.

I think the real danger is no longer caring about what you're doing. Yesterday I just pointed Claude at my static site generator and told it to clean it up. I wanted to care but... I didn't.

[−] mkovach 40d ago
This isn't new. It's been the same problem for decades, not what gets built, but what gets accepted.

Weak ownership, unclear direction, and "sure, I guess" reviews were survivable when output was slow. When changes came in one at a time, you could get away with not really deciding.

AI doesn't introduce a new failure mode. It puts pressure on the old one. The trickle becomes a firehose, and suddenly every gap is visible. Nobody quite owns the decision. Standards exist somewhere between tribal memory, wishful thinking, and coffee. And the question of whether something actually belongs gets deferred just long enough to merge it, which forces the answer without anyone's input.

The teams doing well with agentic workflows aren't typically using magic models. They've just done the uncomfortable work of deciding what they're building, how decisions are made, and who has the authority to say no.

AI is fine, it just removed another excuse for not having our act together. While we certainly can side-eye AI because of it, we own the problems. Well, not me. The other guy who quit before I started.

[−] stavros 40d ago
I see this fallacy being committed a lot these days: "Because of LLMs, you no longer need a skill you used to need, and (handwave) that's bad."

Academia doesn't want to produce astrophysics (or any field) scientists just so the people who became scientists can feel warm and fuzzy inside when looking at the stars, it wants to produce scientists who can produce useful results. Bob produced a useful result with the help of an agent, and learned how to do that, so Bob had, for all intents and purposes, the exact same output as Alice.

Well, unless you're saying that astrophysics as a field literally does not matter at all, no matter what results it produces, in which case, why are we bothering with it at all?

[−] FrojoS 40d ago
Every PhD program I'm aware of has a final hurdle known as the defence. You have to present your thesis while standing in front of a committee, and often the local community and public. They will ask questions, and too many "I don't know"s or false answers will make you fail. So there is already a system in place that should stop Bob from graduating if he indeed learned much less than Alice. A similar argument can be made for conference publications: if Bob publishes his first-year project at a conference but doesn't actually understand "his own work", it will show.

The difficulty of passing the defence varies wildly between universities, departments, and committees. Some are very serious affairs with a decent chance of failure, while others are more of a show event for friends and family. Mine was more of the latter, but I doubt I would have passed that day if I had spent the previous years prompting instead of doing the grunt work.

[−] cbushko 40d ago
This article makes the assumption that Bob was doing absolutely nothing, maybe at the pub with his friends, while the AI did all his work.

How do we know that, while the AI was writing Python scripts, Bob wasn't reading more papers, gathering more data, and just overall doing more than Alice?

Maybe Bob is terrible at debugging python scripts while Alice is a pro at it?

Maybe Bob used his time to develop different skills that Alice couldn't dream of?

Maybe Bob will discover new techniques or ideas because he didn't follow the traditional research path that the established Researchers insist you follow?

Maybe Bob used the AI to learn even more because he had a customized tutor at his disposal?

Or maybe Bob just spent more time at the pub with his friends.

[−] alestainer 40d ago
I was in academia in the pre-GPT-3 era and I don't see a difference between the superficial, pass-the-criteria understanding of things then and now. People already rely on a ton of sources, putting their faith in them; the recent replication crisis in the social sciences had nothing to do with any LLMs. The real problem of academia lies in the first paragraph of this article: a supervisor who has to choose incremental, clearly feasible projects. Currently it's called science, but I like to call it knowledge engineering, because you're pretty much following a recipe and there is a clear bound on the returns to such activities.
[−] beedeebeedee 40d ago
I don’t have kids, but suggested something years ago to my siblings when they started confronting similar issues: we should do a version of “ontogeny recapitulates phylogeny” for personal computers.

Kids should start off with Commodore 64s, then get late-80s or early-90s Macs, then Windows 95, then Debian and internet access (but only HTML). Finally, when they're 18, they'd be allowed an iPhone, Android, and modern computing.

Parenting can’t prevent the use of LLMs in grad school, but a similar approach could be taken by grad departments: don’t allow LLMs for the first few years, and require pen and paper exams, as well as oral examinations for all research papers.

[−] turtletontine 40d ago

> Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

I don’t believe this. It's totally plausible that someone could produce passable work with LLMs at a similar pace to a curious and talented scientist. But if you, their advisor, are sitting down and talking with them every week? It's obvious how much they care or understand; I can't believe you wouldn't be able to tell the difference between these students.

[−] throwaway132448 40d ago
The flip side I don’t see mentioned very often is that having a product where you know how the code works becomes its own competitive advantage. Better reliability, faster fixes and iteration, deeper and broader capabilities that allow you to be disruptive while everything else is being built towards the mean, etc etc. Maybe we’ve not been in this new age for long enough for that to be reflected in people’s purchasing criteria, but I’m quite looking forward to fending off AI-built competitors with this edge.
[−] lxgr 40d ago

> for someone who doesn't yet have that intuition, the grunt work is the work

Very well said. I think people are about to realize how incredibly fortunate and exceptional it is to actually get paid, and in our industry paid very well, through a significant fraction of one's career while still "just" doing the grunt work, work that arguably benefits the person doing it at least as much as the employer.

A stable paid demand for "first-year grad student level work" or the equivalent for a given industry is probably not the only possible way to maintain a steady supply of experts (there's always the option of immense amounts of student debt or public funding, after all), but it sure seems like a load-bearing one in so many industries and professions.

At the very least, when such work is directly paid, artificially created bullshit tasks (often created without any bad intentions!) that don't exercise the actually relevant skillsets, or exercise the wrong ones, are much easier to spot.

[−] somesortofthing 40d ago
I used to feel this way but... honestly, I've found that pressing on with only a vague understanding of what's happening and then diving deep with the agent's own help if it keeps making bad decisions leads to more output of comparable quality. Even without a deep understanding of the topic, you can usually tell when the LLM is BSing and you need to intervene. The model has much more knowledge "present-at-hand" than it'll actually apply to a given implementation, so you can substantially deepen your understanding with minimal reference to external resources by just taking a break from implementation to have a convo with it.

I'm sure this approach breaks down at the very frontiers of highly technical fields but... virtually all work, even work by educated professionals, happens outside that area anyway. On well-trodden ground, you can improve at supervising agents by doing things that test your ability to supervise agents.

[−] cadamsdotcom 37d ago

> Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. It produced verification documents that verified nothing. It asserted results without derivation. It simplified formulas based on patterns from other problems instead of working through the specifics of the problem at hand.

This is solvable with harness engineering.

The model’s first try is never ready for human consumption. There needs to be automation (bespoke, a mix of code- and prompt-based hooks, which agents can build) that forces the agent’s output back through itself, tells it to be more rigorous, makes it search online for proof of its claims, and so on, and doesn’t stop until every claim is verifiable.

No human should see the model’s output until it’s met these (again bespoke, but not hand-written) guardrails.

What I’m talking about doesn’t exist and really has no analogy yet, so you can think of it as a super advanced form of linting. It’s grounding, but also verification that the grounding links to the material, and refusal to accept the model’s work until it meets the bar.
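
Something in that spirit, sketched very loosely. Nothing below is an existing tool: `ask_model` stands in for whatever agent/LLM call you use, and the VERIFIED/UNVERIFIED convention is invented here purely for illustration.

```python
from typing import Callable

def harness(task: str, source_material: str, ask_model: Callable[[str], str],
            max_rounds: int = 5) -> str:
    """Push the model's draft back through itself until every claim it makes
    has been checked against the provided source material."""
    draft = ask_model(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        # 1. Make the model enumerate its own claims, one per line.
        claims = ask_model(
            "List every factual or numerical claim in the text below, one per line:\n"
            + draft
        ).splitlines()

        # 2. Check each claim against the source material, not the model's memory.
        unverified = []
        for claim in claims:
            verdict = ask_model(
                "Using ONLY the source material below, answer VERIFIED or UNVERIFIED.\n"
                f"Claim: {claim}\nSource material:\n{source_material}"
            )
            if "UNVERIFIED" in verdict or "VERIFIED" not in verdict:
                unverified.append(claim)

        # 3. Refuse to surface the draft until nothing is left unverified.
        if not unverified:
            return draft
        draft = ask_model(
            "Rework the draft below. These claims could not be verified against the "
            "source material; derive them properly or remove them:\n"
            + "\n".join(unverified) + "\n\nDraft:\n" + draft
        )
    raise RuntimeError("Draft still contains unverified claims; don't show it to a human.")
```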

We are asking models to dream (invent purely from their weights), and are surprised when their dreams, just like ours, have little relationship to reality. The current state of the art is going to look very naive in a few years’ time.

[−] katzgrau 40d ago
When you’re deep in a thoughtful read and suddenly get the eerie feeling that you’re being catfished

> But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.

[−] steveBK123 40d ago
I agree with the general premise - the risk is we don’t develop juniors (new Alices) anymore, and at some point people are just sloperators gluing together bits of LLM output they do not understand.

I have seen versions of this in the wild, where a firm has gone through hard times and its internal systems have lost all their original authors and every subsequent generation of maintainers, leaving people in awe of a machine that hasn't been maintained in a decade.

I interviewed a guy once that genuinely was proud of himself, volunteering the information to me as he described resolving a segfault in a live trading system by putting kill -9 in a cronjob. Ghastly.

[−] omega3 40d ago
I wonder what effect AI has had on online education: course signups, new resources being added, etc.

I’ve recently started csprimer and whilst mentally stimulating I wonder if I’m not completely wasting my time.

[−] patcon 40d ago
The exciting and interesting part to me is that we'll probably need to engage "chaos engineering" principles, and encode intentional fallibility into these agents to keep us (and them) good collaborators and, specifically, on our toes, to help all minds stay alert and plastic.

If that comes to pass, we'll be rediscovering the same principles that biological evolution stumbled upon: the benefits of the imperfect "branch" or "successive limited comparison" approach of agentic behaviour, which perhaps favours heuristics (that clearly sometimes fail), interaction between imperfect collaborators with non-overlapping biases, etc etc

https://contraptions.venkateshrao.com/p/massed-muddler-intel...

> Lindblom’s paper identifies two patterns of agentic behavior, “root” (or rational-comprehensive) and “branch” (or successive limited comparisons), and argues that in complicated messy circumstances requiring coordinated action at scale, the way actually effective humans operate is the branch method, which looks like “muddling through” but gradually gets there, where the root method fails entirely.

[−] visarga 40d ago

> Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant.

I think this is a simplification: of course Bob relied on AI, but he also used his own brain to think about the problem. Bob is not reducible to "a competent prompt engineer"; if you think he is, just take any person who prompts but knows nothing about physics and ask them to do Bob's work.

In fact, Bob might have a chance to cover more mileage at the higher level of the work while Alice does the same at the lower level. Which is better? It depends on how AI will evolve.

The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.

My objection comes from remembering how senior devs review PRs... "LGTM"... it's pure vibes. If you are to seriously review a PR, you have to run it, test it, check its edge cases, and eval its performance: more work than making the PR itself. The entire history of software is littered with bugs that sailed through review because review is performative most of the time.

Anyone remember the verification crisis in science?

[−] ahussain 40d ago

> When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's.

In my experience, doing these things with the right intentions can actually improve understanding faster than not using them. When studying physics I would sometimes get stuck on small details - e.g. what algebraic rule was used to get from Eq 2.1 to 2.2? what happens if this was d^2 instead of d^3 etc. Textbooks don't have space to answer all these small questions, but LLMs can, and help the student continue making progress.

Also, it seems hard to imagine that Alice and Bob's weekly updates would be indistinguishable if Bob didn't actually understand what he was working on.

[−] sam_lowry_ 40d ago
See also Profession by Isaac Asimov [0] and his short story The Feeling of Power [1]. Both are social dramas about societies that went far down the path of ignorance.

[0] http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...

[1] https://s3.us-west-1.wasabisys.com/luminist/EB/A/Asimov%20-%...

[−] toniantunovi 40d ago
The coding-specific version of this is worth naming precisely. The drift does not happen because you stop writing code. It happens because you stop reading the output carefully. With AI-generated code, there is a particular failure mode: the code is plausible enough to pass a quick review and tests pass, so you ship it. The understanding degradation is cumulative and invisible until it is not. The partial fix is making automated checks independent of the developer's attention level: type checking, SAST, dependency analysis, and coverage gates that run regardless of how carefully you reviewed the diff. These are not a substitute for understanding, but they create a floor below which "comfortable drift" cannot silently carry you. The question worth asking of any AI coding workflow is whether that floor exists and where it is.
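
A rough sketch of what such a floor might look like for a Python repo. The tool choices here are illustrative, not prescriptive; it assumes mypy, bandit, pip-audit, and pytest-cov are installed and a `src` layout:

```python
import subprocess
import sys

# Each check runs regardless of how carefully (or not) the diff was reviewed.
CHECKS = [
    ["mypy", "src"],                                 # type checking
    ["bandit", "-r", "src", "-q"],                   # static security analysis (SAST)
    ["pip-audit"],                                   # known-vulnerable dependencies
    ["pytest", "--cov=src", "--cov-fail-under=80"],  # tests plus a coverage gate
]

def main() -> int:
    failed = []
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    if failed:
        print(f"Floor not met: {', '.join(failed)}")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Wire something like this into CI so it cannot be skipped; the exact tools matter less than the fact that the gate does not depend on the reviewer's attention.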
[−] pwr1 40d ago
I catch myself doing this more than I'd like to admit. Copy something from an LLM, it works, ship it, move on. Then a week later something breaks and I realize I have no idea what that code actually does! The speed is addictive, but you're slowly trading depth for velocity, and at some point that bill comes due.
[−] boxomcfoxo 32d ago
This essay was extruded in its entirety from Claude.

https://boxobarks.leaflet.pub/3mj42airv3s2o#fingerprints-of-...

[−] inatreecrown2 40d ago
Using AI to solve a task does not give you experience in solving the task, it gives you experience in using AI.
[−] __MatrixMan__ 40d ago
But aren't you still going to have to convince other people to let you do it with their money/data/hardware/etc? The understanding necessary to make that argument well is pretty deep and is unaffected by AI.

I've been having a lot of fun vibe coding little interactive data visualizations, so when I present the feature to stakeholders they can fiddle with it and really understand how it relates to existing data. I saw the agent leave a comment regarding Cramer's rule, and yeah, it's a bit unsettling that I forgot what that is and haven't bothered to look it up, but I can tell from the graphs that it's doing the correct thing.
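
(For anyone else who has forgotten: Cramer's rule solves a small linear system Ax = b via ratios of determinants, x_i = det(A_i)/det(A), where A_i is A with column i replaced by b. A throwaway refresher sketch with made-up numbers, nothing to do with the actual visualization code:)

```python
import numpy as np

def cramer_solve(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    det_A = np.linalg.det(A)
    if abs(det_A) < 1e-12:
        raise ValueError("Matrix is (nearly) singular; Cramer's rule does not apply")
    x = np.empty_like(b, dtype=float)
    for i in range(len(b)):
        A_i = A.astype(float).copy()
        A_i[:, i] = b                     # replace column i with the right-hand side
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b))                 # [0.8 1.4], same as np.linalg.solve(A, b)
```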

There's now a larger gap between me and the code, but the chasm between me and the stakeholders is getting smaller and so far that feels like an improvement.

[−] darkstarsys 39d ago
I agree with this article, and I think it goes further than astrophysics or even physics. As agentic LLMs start to prove long-standing open mathematical conjectures and even invent new ones, I fear we may reach a point in mathematical research (which, like the astrophysics Hogg's article describes, has no "right edge") where the machines are just better than us. At that point do we just mostly lose interest? You can see it already in the Go-playing community. Why study for years to be a "pretty good" 9P when machines will always play better? What PhD student will spend years on a hard, interesting, deep math problem? Sure, a few will. Some people become monks too. But not many.
[−] pbw 40d ago
There's certainly a risk that an individual will rely too much on AI, to the detriment of their ability to understand things. However, I think there are obvious counter-measures. For example, requiring that the student can explain every single intermediate step and every single figure in detail.

A two-hour thesis defense isn't enough to uncover this, but a 40-hour deep probing examination by an AI might be. And the thesis committee gets a "highlight reel" of all the places the student fell short.

The general pattern is: "Suppose we change nothing but add extensive use of AI, look how everything falls apart." When in reality, science and education are complex adaptive systems that will change as much as needed to absorb the impact of AI.