We already have advanced autopilots that can fly commercial airliners. We just don't trust them enough to not have human pilots. I would trust the autopilot more than freaking Claude. We already do, every day.
In aviation there's a saying, "Aviate, Navigate, Communicate" which describes the hierarchy of things to pay attention to while piloting an aircraft.
An autopilot is better thought of as "auto-aviate". That is to say, if there is already a navigation plan, the aircraft can follow that plan. Simple autopilots just keep the wings level; others can hold an altitude and change heading. More sophisticated ones can change altitude or even fully land the plane.
All of those things, however, require people to manage the "Navigate" part. "Aviate" is a deterministically solved problem, at least in normal flight operations. As you point out we trust autopilots today, including on (nearly) every single commercial flight.
LLMs are a poor alternative to "aviate", but they could be part of a better flight management automation package. The parent article tries to use the LLM to aviate, with predictable results.
If paired with a capable auto-pilot (not the relatively basic one on that C-172), the LLM could figure out how to operate the FMS and take you from post take-off to final approach and aid in situational awareness.
Currently, I don't think there is a commercial solution for GA aircraft that could say, "Ok, I'm 20NM from KVNY, but there are three people ahead of me in the pattern, so I have to do a right 360 before descending and joining downwind on 34L".
Having an LLM propose that course of action and tell the autopilot to execute on it definitely would be an improvement to GA safety.
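Roughly, the division of labor could look like the sketch below. Every name in it is invented for illustration (no real GA autopilot or FMS exposes this API): the LLM is confined to proposing a maneuver from a fixed vocabulary, and a deterministic autopilot does all the actual flying.

    from dataclasses import dataclass

    @dataclass
    class Maneuver:
        kind: str          # e.g. "RIGHT_360", "JOIN_DOWNWIND_34L"
        altitude_ft: int
        heading_deg: int

    ALLOWED = {"RIGHT_360", "JOIN_DOWNWIND_34L", "HOLD"}

    def propose(llm_reply: str) -> Maneuver:
        """Validate the LLM's slow, high-level suggestion against a fixed schema."""
        kind, alt, hdg = llm_reply.split(",")
        m = Maneuver(kind.strip(), int(alt), int(hdg))
        if m.kind not in ALLOWED:
            raise ValueError(f"unknown maneuver: {m.kind}")  # never free-form controls
        return m

    def execute(autopilot, m: Maneuver) -> None:
        """Hand the plan to the deterministic autopilot, which flies it at high rate."""
        autopilot.set_altitude(m.altitude_ft)
        autopilot.set_heading(m.heading_deg)

The key design point is that the LLM never touches a control surface: it emits a constrained plan on a timescale of seconds, and rejecting a malformed plan leaves the autopilot flying the last good one.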
I think we can trust them enough to not have human pilots. It's just that having a human in the loop is very useful in scenarios that aren't all that rare. Say the airfield has too much wind or fog, or another plane has crashed and blocked all the runways... Someone needs to decide what to do next. Or there's some system failure nobody thought about.
And well, if they're there, they might as well fly for practice.
And no, I would not allow an LLM into the loop for any decision involving the actual flying.
> We just don't trust them enough to not have human pilots.
Much of the value of a human crew is as an implicit dogfooding warranty for the passengers. If it wasn't safe to fly, the pilots wouldn't risk it day after day.
Come to think of it, it'd be nice if they posted anonymized third-party psych evaluations of the cockpit crew on the wall by the restrooms. The cabin crew would probably appreciate that too.
“Automation can lower the workload in some cases. But in other situations, using automation when it is not appropriate can increase one’s workload. A pilot has to know how to use a level of automation that is appropriate... Whether you’re flying by hand or using technology to help, you’re ultimately flying the airplane with your mind by developing and maintaining an accurate real-time mental model of your reality—the airplane, the environment, and the situation. The question is: How many different levels of technology do you want to place between your brain and the control surfaces?”[0]
—Sully Sullenberger
[0] Sully: My Search for What Really Matters. p. 188
The question of 'can it fly' is clearly a 'yes, given a little bit of effort'. Flying isn't hard, autopilots have been around a long time. It is recognizing and dealing with things you didn't anticipate that is hard. I think it is more interesting to have 99% of flying done with automated systems but have an LLM focus on recognizing unanticipated situations and recovering or mitigating them.
The bit in the middle where it decides to make its control loop be pure P(roportional), presumably dropping the I and D parts, is interesting to me. Seems like a poor choice.
I try to fly about once a week, and I've never really tried to self-analyze what my inputs are for what I do. My hunch is that there's quite a bit of I(ntegral) damping I do to avoid over-correcting, but also quite a bit of D(erivative) adjustment, especially on approach, in order to “skate to the puck”. Definitely going to have to take it up with some flight buddies. Or maybe those with drone software control loop experience can weigh in?
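For reference, a textbook PID loop is only a few lines. A minimal Python sketch (gains and the altitude-hold usage are illustrative, not tuned for any aircraft); the D term is what damps the correction as the error shrinks, which is roughly the “skate to the puck” instinct, while a P-only loop tends to oscillate around the setpoint:

    class PID:
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = None

        def update(self, error, dt):
            self.integral += error * dt  # I: accumulates steady-state error (e.g. trim)
            if self.prev_error is None:
                derivative = 0.0
            else:
                derivative = (error - self.prev_error) / dt  # D: damps the approach
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # e.g. elevator_cmd = pid.update(target_alt_ft - current_alt_ft, dt=0.05)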
> CRASHED #2, different cause. Plane was stable in a slow descent but between fly.py invocations (~20 sec gap while I logged and computed the next maneuver) there was no active controller. Plane kept descending under its last commanded controls until it hit terrain at 26 ft MSL, 1.7 nm short of the runway. Lesson: never leave the controller idle in flight
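That lesson generalizes: the inner loop should keep running between high-level commands. A minimal sketch of the watchdog pattern (the command source and control-apply hooks here are hypothetical stand-ins, not the article's actual fly.py API):

    import time

    SAFE_HOLD = {"pitch": 0.05, "throttle": 0.6}  # wings level, gentle climb
    TIMEOUT_S = 5.0  # how stale a command may get before we stop trusting it

    def control_loop(get_command, apply_controls, dt=0.1):
        last_cmd, last_seen = SAFE_HOLD, time.monotonic()
        while True:
            cmd = get_command()  # non-blocking; returns None if nothing new
            if cmd is not None:
                last_cmd, last_seen = cmd, time.monotonic()
            elif time.monotonic() - last_seen > TIMEOUT_S:
                last_cmd = SAFE_HOLD  # stale: stop descending on old controls
            apply_controls(last_cmd)  # runs every cycle; never idle
            time.sleep(dt)

With this shape, a 20-second gap between high-level invocations degrades to a safe hold instead of a descent into terrain.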
"Can I Get Claude to Fly A Plane" isn't the same thing. Interesting though, would be a good test for different models but it relies on the test harness being good enough that a human could also use the same info to achieve the required outcome. e.g. if latency of input/output is too slow then nobody could do it.
AI being able to react quickly to real-time video input is the next thing. Computer use right now is painfully slow, working off a screenshot/command loop.
Claude uses the wrong modality to be a piloting model. Latency is critical, and outputting tokens in the hope they take the action at the right time is kinda bonkers.
You'd want all the data from the plane to be input neurons, and all the actions to be output neurons.
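In other words, something shaped like a small policy network rather than a token stream. A toy sketch (untrained placeholder weights and a made-up state layout, just to show the latency class involved):

    import numpy as np

    def policy(state, W1, b1, W2, b2):
        # state: [alt_err, speed_err, pitch, roll, yaw_rate, glideslope_err]
        h = np.tanh(W1 @ state + b1)   # one hidden layer of "input neurons"
        return np.tanh(W2 @ h + b2)    # -> [elevator, aileron, throttle] in [-1, 1]

    rng = np.random.default_rng(0)
    W1, b1 = 0.1 * rng.standard_normal((16, 6)), np.zeros(16)
    W2, b2 = 0.1 * rng.standard_normal((3, 16)), np.zeros(3)
    controls = policy(np.zeros(6), W1, b1, W2, b2)  # microseconds per call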
Surely at least part of the issue here is that even an LLM outputs only tens of tokens per second, not to mention the extra tokens for "thinking/reasoning" mode, while a real autopilot has response times in the tens of milliseconds. Plus the network latency versus a local LLM.
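Back-of-envelope, with assumed (not measured) numbers:

    llm_s = 0.3 + 150 / 30      # network round trip + 150 tokens at ~30 tok/s
    autopilot_s = 0.02          # a conventional ~50 Hz inner loop
    print(f"LLM decision: ~{llm_s:.1f} s")                  # ~5.3 s
    print(f"Autopilot cycle: {autopilot_s * 1000:.0f} ms")  # 20 ms
    print(f"Ratio: ~{llm_s / autopilot_s:.0f}x slower")     # ~265x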
> main issue seemed to be delay from what it saw with screenshots and api data and changing course.
This is where I think Taalas-style hardware AI may dominate in the future, especially for vehicle/plane autopilots, even if it can't update weights. But determinism is actually a good thing here.
A friend participating in some sort of simulated glider tournament trained a neural network to fly one somehow (don't ask for details). I recall the rules were later changed to ban that sort of thing, though not because of him.
Using Claude for this sounds like overkill and a poor fit at the same time.
Beyond what the article covers, I think a big issue here is the speed of the input-decision-act loop: it needs to be pretty fast, and Claude introduces a lot of latency into it.
As most others have pointed out, the goal from here wouldn't be to craft a custom harness so that Claude could technically fly a plane 100x worse than specialist autopilots. Instead, what would be more interesting is if Claude's executive control, response latency, and visual processing capabilities were improved in a task-agnostic way so that as an emergent property Claude became able to fly a plane.
It would still be better just to let autopilots do the work, because the point of the exercise isn't improved avionics. But it would be an honestly posed challenge for LLMs.
Try using codex-5.3-spark; it has much faster inference and might be able to keep up. And maybe a different, specialized OpenRouter model for visual parsing.
Lots of people commenting seem not to have read the article. The author didn't hook Claude up directly to the controls and ask it to one-shot a successful flight.
The author had Claude develop an autopilot script while it could observe the flight with near-live feedback. It got three attempts and did not manage an autolanding. (There's a reason real autopilots do that assisted by ground-based aids.)
"spawning 5 subagents"
Gold
"500 Our Servers Are Experiencing High Load"
"500 Our Servers Are Experiencing High Load"
"500 Our Servers Are Experiencing High Load"
Related from December 2025: Garmin Emergency Autoland deployed for the first time
https://www.flightradar24.com/blog/aviation-news/aviation-sa...
I wouldn't trust Claude to ride my bike, so I certainly wouldn't board its flight.