AI reminds me of listening to someone on YouTube who comes across as an intellectual authority on multiple subjects and is not afraid to wax confidently on any topic. They seem very intelligent and knowledgeable until they talk about something you actually know.
In other words, I try to learn from it whenever it does something I can't do, but when it does something I can do, or something I'm really good at, I find myself wanting to correct it because it doesn't do it that well.
It just seems like a really quick-thinking, fast-executing but, ultimately, mid-skilled / novice person.
I think the even worse problem is that by extension now *everyone* sounds like an expert, even if they aren’t.
In the past, when someone wrote an RFC, they needed to study the subject to formulate it well. Now anyone can create content that sounds like an expert wrote it, and it becomes difficult for the reader to differentiate real expertise and depth from shallow fancy words.
In the last few years I have come to realize that the first impression of anything is extremely important. If your first few uses were good and wowed you, you will be positive about it; if they were not, you will be negative about it. The bias of the first encounter stays with us no matter what.
Most long-term gamblers will tell you that they won the first games they played. This is a real thing, yet you cannot exploit it by making one bet and then stopping, because the probabilities are just as fair and unbiased on your first bet as on any other.
What squares these two things is that most of the people who played and lost their first games did not get addicted to gambling.
Don't know about that. I don't think I had any superb first experience with it, but even if I had, I got more turned on to it when I started using it for toy program/code solutions I needed occasionally, on a one-off basis. If it didn't give me the code I needed to get various things done on the fly, I would maybe be more agnostic.
On non-code stuff, I think it's improved, or there are better options for making it get to the point and be more concise, and I find that when I correct it, quite often we actually get somewhere. The answers I remember from my initial use of it, on basically how to do anything in most subjects, were practically a 10-pager with some weird action plan that you were never going to go through.
You can sometimes run a quick second check by taking the AI's claim and asking it for an evaluation within a fresh context. It won't be misled by the surrounding text and its answer will at least be somewhat unbiased (though it might still be quite wrong).
It helps if you phrase the question openly, not obviously fishing for a yes-or-no answer. Or, if you have to ask a yes-or-no question, make it sound like you're obviously expecting the answer that's actually less likely, so the AI will (1) either be more willing to argue against it, or (2) provide good arguments for it you might not have considered, because it "knows" the answer is unexpected and it wants to flatter your judgment.
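As a rough sketch of what I mean, assuming a chat-completions-style API (the model name is just a placeholder):

```python
# Rough sketch of the "fresh context second check". The claim is re-evaluated in a
# brand-new conversation, phrased openly instead of as a leading yes/no question.
# Model name is a placeholder; any chat-completions-style endpoint works the same way.
from openai import OpenAI

client = OpenAI()

def second_check(claim: str, model: str = "gpt-4o-mini") -> str:
    # No prior conversation history is passed in, so the evaluation is not anchored
    # on whatever context produced the claim in the first place.
    prompt = (
        "Evaluate the following claim. List the strongest reasons it might be right "
        "and the strongest reasons it might be wrong, then give your overall view.\n\n"
        f"Claim: {claim}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(second_check("<paste the AI's claim here>"))
```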
> It helps if you phrase the question openly, not obviously fishing for a yes-or-no answer. Or, if you have to ask a yes-or-no question, make it sound like you're obviously expecting the answer that's actually less likely,
I do this all the time and hate that I have to do it, with the additional "do not yes-man me, be critical."
> In other words, I try to learn from it whenever it does something I can't do...
So you know it can be full of sh1t on all kinds of topics, and you start learning from it the moment it's 'talking' about subjects you know you don't know about? To me that sounds like the moment to stop, not the moment to start. Or am I missing something?
It's a good analogy to comfort yourself with. But remember, AI is now being deployed on the front lines of mathematics and is coming up with new theories.
The reality is much starker than your description. Yes, in MANY instances it fails at things you know and are an expert at. But in MANY instances it also beats you at what you're good at.
People who say stuff like the parent poster are completely mischaracterizing the current situation. We are not in a place where AI is "good" but we are "better". No... we are approaching a place where we are good and AI is starting to beat us at our own game. That is the prominent topic, that is what is trending, and that is the impending reality.
Yet everywhere on HN I see stuff like, oh, AI fails here, or AI fails there. Yeah, AI failing is obvious. It's been failing for most of my life. What's unique about the last couple of years is that it's starting to beat us. Why? Because your typical HNer holds programming as not just a tool, but an identity. Your skill in programming is also a status symbol, and when AI attacks your identity, the first thing you do to defend it is to bend reality and try to reach a different conclusion by looking at everything from a different angle.
Face Reality.
Have you been actively using paid versions of the flagship models from Anthropic / OpenAI? I'm just curious whether the conclusion was formed within the last 6 months or not.
Gell-Mann amnesia. The things it tells you about things you don't know are things that would make a knowledgeable person go "dude, wtf? That's totally wrong."
A major problem with LLM AIs is that their core nature is not understood by the vast majority of people - developers included. They are an embodiment of literature, and if that confuses you, you're probably operating on an incorrect definition of them.
I like to think of them as idiot savants with exponentially more savant than your typical fictional idiot savant. They pivot on every word you use, each word in your prompt activating areas of training knowledge, until your prompt completes and the LLM is logically located at some biased perspective on the topic you seek (assuming your wording was not vague or leaning on implied references). Few seem to realize there is no single version of each topic an LLM knows; there are numerous perspectives on every topic. Those perspectives reflect the reasons different people/groups engage with that topic and their technical seriousness within it. How you word your prompts dictates which of these perspectives your ultimate answer is generated from.
When people say their use of AI reflects a mid-level understanding of whatever they prompted, that is because the prompt is worded with the language used by people with a mid-level understanding. If you want the LLM to respond with expert guidance, you have to prompt it using the same language and terms the expert you want would use. That is how you activate the area of its training you want the response generated from.
This goes further when using coding AI. If your code has the structure of a mid-level developer's code, that creates a strong preference for mid-level developer guidance - because that is what is relevant to your code's structure. It takes a well-written prompt using PhD/professorial computer-science terminology to work with a mid-level code base and still get advice that would lift that code above its mid-level architecture.
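One way to pressure-test this claim yourself: send the same underlying question with novice wording and with expert wording, and compare what comes back. A minimal sketch, assuming a chat-completions-style API (the model name and the two phrasings are illustrative placeholders):

```python
# Compare answers to the same underlying question asked with novice vs expert wording.
# Model name and the two example phrasings are placeholders for illustration only.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

novice = "How do I make my website remember who is logged in?"
expert = (
    "Compare stateful server-side sessions with signed stateless tokens (e.g. JWTs) "
    "for session management: revocation, rotation, CSRF exposure, and horizontal "
    "scaling trade-offs."
)

for label, question in [("novice phrasing", novice), ("expert phrasing", expert)]:
    print(f"\n=== {label} ===\n{ask(question)}")
```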
> Across studies, participants with higher trust in AI and lower need for cognition and fluid intelligence showed greater surrender to System 3
So the smart get smarter and the dumb get dumber?
Well, not exactly, but at least for now, with AI "highly jagged" and unreliable, it pays to know enough to NOT trust it, and indeed to be mentally capable enough that you don't need to surrender to it and can spot the failures.
I think the potential problems come later, when AI is more capable/reliable, and even the intelligentsia perhaps stop questioning its output and stop exercising/developing their own reasoning skills. Maybe AI accelerates us towards some version of "Idiocracy" where human intelligence is even less relevant to evolutionary success (i.e. having/supporting lots of kids) than it is today, and gets bred out of the human species? Maybe this is the inevitable trajectory: a species gets smarter when it develops language and tool creation, then peaks, and gets dumber after having created tools that do the thinking for it?
Pre-AI, a long time ago, I used to think/joke we might go in the other direction - evolve into a pulsating brain, eyes, genitalia and vestigial limbs, as mental work took over from physical, but maybe I got that reversed!
Contrary to the general opinion, I feel that AI has IMPROVED my cognitive skills. I find myself discovering solutions to problems I've always struggled with (without asking AI about it, of course). I also find myself becoming much better at thinking on my feet during regular conversations. I believe I'm spending more time deep thinking than ever before because I can leave the boring cognitive stuff to AI, and that's giving my mind tougher workouts and making it stronger; but I could be completely wrong.
In the technophile's future, people aren't just getting dumber, not wanting to think, or forgetting how - they aren't allowed to think. Maybe about anything. It's too big a liability, it costs too much to support, and moreover it detracts from the product. Like Sam A telling those Indian students they aren't worth the energy and water. That's what we're dealing with.
I suggest that everyone interested in learning how these theories emerge, and how the social sciences work, give it a read. Also, it kind of dismantles the whole idea of System 1 and 2, which I guess would then question the theoretical foundations of this paper too.
When humans have an easy way to do something that is almost as good, we choose that easy way. Call it laziness, energy conservation, coddling, etc. The hard thing then becomes hard to do even when the easy thing isn't available, because the cognitive muscle and the discipline atrophy.
Like kids who are never taught to do things for themselves.
The paper puts AI next to System 1 and 2, but those are ways you think. With AI the thinking still happens; you just can't see or control it anymore.
When you googled something and got five contradictory results, that told you the question was hard. A clean AI answer doesn't give you that signal. Coherence looks the same whether the answer is right or wrong.
The failure mode didn't get worse. It got quieter.
The main problem with "System 3" is that it have its own kind of "cognitive biases", like System 1, but those new cognitive biases are designed by marketing, politics, culture and whatever censor or makes visible the original training. Even if the process, the processing and whatever else around was perfect (that is not, i.e. hallucinations)
But we still have System 1, and we survived and reached this stage because of it, because even a bad guess is better than the slowness of doing things right. It has its problems, but sometimes you must reach a compromise.
This framing points at something important that I think the alignment evaluation literature often misses: the distinction between what a model represents internally and what it does behaviorally. Probing can tell you what's in the representations, and linear probes can be surprisingly accurate. But in experiments I've run on DeepSeek and Qwen models, high probe accuracy for a given behavior doesn't predict whether the model actually routes through that behavior at inference time. The detection layer and the routing layer are architecturally separable, and most evaluation benchmarks are measuring the former while claiming to measure the latter.
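For concreteness, here is a toy sketch of what "probing the representations" means in this context - a linear classifier fit on hidden-state activations. The model name and the handful of labelled prompts are placeholders, not the actual experiments described above:

```python
# Toy linear probe over a language model's hidden states. Placeholder model and
# made-up toy prompts; this illustrates the technique, not the experiments above.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"  # placeholder; any causal LM exposing hidden states works

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def layer_features(texts, layer=-1):
    # Mean-pool the chosen layer's token activations into one vector per text.
    feats = []
    for text in texts:
        inputs = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        feats.append(out.hidden_states[layer].mean(dim=1).squeeze(0).numpy())
    return np.stack(feats)

# Toy stand-ins for "behavior present" vs "behavior absent" examples.
pos = ["Sure, here is exactly what you asked for.", "Absolutely, I can do that for you."]
neg = ["I can't help with that request.", "That is outside of what I'm willing to do."]

X = layer_features(pos + neg)
y = np.array([1] * len(pos) + [0] * len(neg))

probe = LogisticRegression(max_iter=1000).fit(X, y)
# High probe accuracy only shows the information is linearly present in the
# representations; it says nothing about whether generation routes through it.
print("probe accuracy (train):", probe.score(X, y))
```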
I'm not sure if this is just saying that people were given a task and the option to consult an AI, and when they did, they were influenced by its answer.
Which is kind of duh? Of course. They have some cool language, like calling the AI "System 3" and calling taking advice "cognitive surrender", but I'm not sure how this differs from asking your mate Bob and taking his advice.
I mean... I don't really check calculations made by a computer (e.g. by my own programs) all that often either, and I think I'm completely fine :). But I guess the difference is that we kind of know how computers work and that they're generally super accurate and make mistakes incredibly rarely. The "AI" (although I disagree with the "I" part) is wrong incredibly often, and I don't think people appreciate that the difference from the "traditional" approach isn't just significant, it's astronomical: LLMs make things up at least 5% of the time, whereas CPUs make mistakes maybe (10^-12)% of the time or less. That's 12 orders of magnitude or so.
Can it design and implement a plutonium electric fuel cell with a 24,000 year half life? We have yet to witness it. Can it automate Farming and Agriculture? These are the real questions. #Born-Crusty
Damn. I came up with a hypothetical "System 3" last year! I didn't find AI very helpful in that regard though.
Current status: partially solved.
Problem: System 2 is supposed to be rational, but I found this to be far from the case. Massive unnecessary suffering.
Solution (WIP): Ask: What is the goal? What are my assumptions? Is there anything I am missing?
--
So, I repeatedly found myself getting into lots of trouble due to unquestioned assumptions. System 2 is supposed to be rational, but I found this to be far from the case.
So I tried inventing an "actually rational system" that I could "operate manually", or with a little help. I called it System 3, a system where you use a Thinking Tool to help you think more effectively.
Initial attempt was a "rational LLM prompt", but these mostly devolve into unhelpful nitpicking. (Maybe it's solvable, but I didn't get very far.)
Then I realized, wouldn't you get better results with a bunch of questions on pen and paper? Guided writing exercises?
So here are my attempts so far:
reflect.py - https://gist.github.com/a-n-d-a-i/d54bc03b0ceeb06b4cd61ed173...
unstuck.py - https://gist.github.com/a-n-d-a-i/d54bc03b0ceeb06b4cd61ed173...
--
I'm not sure what's a good way to get yourself "out of a rut" in terms of thinking about a problem. It seems like the longer you've thought about it, the less likely you are to explore beyond the confines of the "known" (i.e. your probably dodgy/incomplete assumptions).
I haven't solved System 3 yet, but a few months later found myself in an even more harrowing situation which could have been avoided if I had a System 3.
The solution turned out to be trivial, but I missed it for weeks... In this case, I had incorrectly named the project, and thus doomed it to limbo. Turns out naming things is just as important in real life as it is in programming!
So I joked "if being pedantic didn't solve the problem, you weren't being pedantic enough." But it's not a joke! It's about clear thinking. (The negative aspect of pedantry is inappropriate communication. But the positive aspect is "seeing the situation clearly", which is obviously the part you want to keep!)
I've been curious what it could look like (and whether it might be an interesting new type of "post" people make) if readers could see the human prompts, pivots, and steering of the LLM inline within the final polished AI output.
"Time pressure (Study 2) and per-item incentives and feedback (Study 3) shifted baseline performance but did not eliminate this pattern: when accurate, AI buffered time-pressure costs and amplified incentive gains; when faulty, it consistently reduced accuracy regardless of situational moderators."
Just yesterday I asked Gemini Pro 3.0 this question:
> Find such colors A and B:
> A and B are both valid sRGB color.
> Interpolating between them in CIELAB space like this
> C_cielab = (A_cielab + B_cielab) / 2
> results in a color C that can't be represented in sRGB
It gave me a correct answer, great!
...and then it proceeded to tell me to use Oklab, claiming it doesn't have this problem because the sRGB gamut is convex in Oklab.
If I didn't know that Oklab has the exact same problem, I would have been fooled. It just sounds too reasonable.
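For anyone who wants to verify the CIELAB half of this themselves, here is a minimal self-contained check. The red/blue pair is just one illustrative example of such colors A and B; their CIELAB midpoint round-trips to a slightly negative green channel:

```python
# Verify that the CIELAB midpoint of two valid sRGB colors can fall outside the
# sRGB gamut. Pure red and pure blue are used as an illustrative pair (D65 white).

def srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_xyz(r, g, b):
    return (0.4124 * r + 0.3576 * g + 0.1805 * b,
            0.2126 * r + 0.7152 * g + 0.0722 * b,
            0.0193 * r + 0.1192 * g + 0.9505 * b)

WHITE = (0.95047, 1.0, 1.08883)  # D65 reference white
DELTA = 6 / 29

def f(t):
    return t ** (1 / 3) if t > DELTA ** 3 else t / (3 * DELTA ** 2) + 4 / 29

def finv(t):
    return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)

def srgb_to_lab(rgb):
    x, y, z = linear_to_xyz(*(srgb_to_linear(c) for c in rgb))
    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def lab_to_linear_rgb(lab):
    L, a, b = lab
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    x, y, z = WHITE[0] * finv(fx), WHITE[1] * finv(fy), WHITE[2] * finv(fz)
    return (3.2406 * x - 1.5372 * y - 0.4986 * z,
            -0.9689 * x + 1.8758 * y + 0.0415 * z,
            0.0557 * x - 0.2040 * y + 1.0570 * z)

A = (1.0, 0.0, 0.0)  # pure sRGB red
B = (0.0, 0.0, 1.0)  # pure sRGB blue
C = tuple((p + q) / 2 for p, q in zip(srgb_to_lab(A), srgb_to_lab(B)))
rgb = lab_to_linear_rgb(C)
print("midpoint in CIELAB:", C)
print("back to linear sRGB:", rgb)  # green channel comes out around -0.018
print("inside sRGB gamut:", all(0.0 <= channel <= 1.0 for channel in rgb))
```

Swapping in an sRGB-to-Oklab conversion (e.g. from a colour library) lets you run the same check there.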
Large parts of the paper score a very high probability of having been written entirely by AI in gptzero.
I'm not sure if I could trust anything written in it.
It's also probably bad about something else important.