I agree this is very sane and boring. What is insane is that they have to state this in the first place.
I am not against AI coding in general. But too many people are "contributing" AI-generated code to open source projects even when they can't understand what's going on in their own code, just so they can say on their resumes that they once contributed to a big open source project. And when the maintainers call them out, they just blame it on the AI coding tools they are using, as if they are not opening PRs under their own names. I can't blame any open source maintainer for being at least a little sceptical when it comes to AI-generated contributions.
I think them stating this very simple policy should also be read as them explicitly not making a more restrictive policy, as some kernel maintainers were proposing.
From everything I'm seeing in the industry (I'm basically a non-coder who chooses not to use AI in the things I make, but privy to the private work experience of coders and creators in that field through personal contacts), I feel like I can shed a bit of light.
It looks to me like a more restrictive policy will be flat-out impossible.
Even people I trust are going along with this stuff, akin to CAD replacing drafting. Code is logic as language, and starting with web code and rapidly metastasizing to C++ (due to complexity and the sheer size of the extant codebase, good and bad), AI has turned slop-coding into a 'solved problem'. If you don't mean to do the best possible thing or a new thing, there is no excuse for existing as a coder in the world of AI.
If you do expect to do a new thing or a best thing, in theory you're required to put out the novel information as AI cannot reach it until you've entered it into the corpus of existing code the AI's built on. However, if you're simply recombining existing aspects of the code language in a novel way, that might be more reachable… that's probably where 'AI escape velocity' will come from should it occur.
In practice, everybody I know is delegating the busywork of coding to AI. I don't feel social pressure to do the same, but I'm not a coder. I'm something else that produces MIT-licensed codebases for accomplishing things that aren't represented in code AS code; rather, it's for accomplishing things that are specific and experiential. I write code to make specific noises I'm not hearing elsewhere, and not hearing out of the mainstream of 'sound-making code artifacts'.
Therefore, it's impractical for Linux to take any position forbidding AI-assisted code. People will just lie and claim they did it. Is primitive tab-complete also AI? Where's the line? What about when coding tools uniformly begin to tab-complete with extensive reasoning and code prototyping? I already see this in the JetBrains Rider editor I use for Godot hacking, even though I've turned off everything I can related to AI. It'll still try to tab-complete patterns it thinks it recognizes, rarely with what I intend.
And so the choice is to enforce responsibility. I think this is appropriate because that's where the choices will matter. Additions and alterations will be the responsibility of specific human people, which won't handle everything negative that's happening but will allow for some pressures and expectations that are useful.
I don't think you can be a collaborative software project right now and not deal with this in some way. I get out of it because I'm read-only: I'm writing stuff on a codebase that lives on an antique laptop without internet access that couldn't run AI if it tried. Very likely the only web browsers it can run are similarly unable to handle 2026 web pages, though I've not checked in years. You've only got my word for that, though, and your estimation of my veracity based on how plausible it seems (I code publicly on livestreams, and am not at all an impressive coder when I do that). Linux can't do what I do, so it's going to do what Linux does, and this seems the best option.
You can refuse to use AI personally, but why would you not help yourself when you can?
… my dad is 86, and only after I signed him up for Claude could he write Arduino code without a phone call to me after 5 minutes of trying himself. Now he's spending 4+ hours at a time, focused, writing code and building circuits he had only dreamt about creating for decades.
Unless you’re doing something for the personal love of the craft and sharpening your tools, use every advantage you can get in order to do the job.
But… as above, if you're doing it for the love of it, sure - hand-crafted code does taste better, and you know all the ingredients are organic.
Nah. I'm only interested in the bits it doesn't know. Why would someone else's regurgitated whatever be what I wanted? Why would that help in any way?
My dad isn't a programmer, but he's done hobby electronics his whole life. It's now helping him reach way beyond what he's ever been able to do himself. And he's never been so excited about anything for at least the past 20 years!
That's a dim view; people also contribute to make projects work for their own needs, with hopes of sharing fixes with others. If I make a fix to vLLM so a model loads on particular hardware, I can verify functionality (the LLM no longer strays off topic) and local plausibility (global scales are being applied to the attention layers), but I can't pretend to understand the full math of the overall process, and I will never have enough time to do so. So I can be upfront about the AI assist, and then the maintainer can choose to double-check; or, if they don't have time, I guess I can just post a PR link on the model's huggingface page and tell others with the same hardware they can try to cherry-pick it.
What's missed is that neither contributors nor maintainers are usually paid for their effort, and nobody has standing to demand that they do anything they are not already doing. Don't like a messy vibe-coded PR but need the functionality? Then clean it up yourself and send an improved version for review. Or let it go unmerged. But don't assign work to others you don't employ.
On the other hand, companies like NVIDIA should be publicly taken to task for changing their mind about the instruction set with every new GPU and then not supporting them properly in popular inference engines; they certainly have enough money to hire people who will learn vLLM inside out and ensure high-quality patches.
> I agree this is very sane and boring. What is insane is that they have to state this in the first place.
I don't think it's insane. It seems reasonable that people could disagree about how much attribution and disclosure there should be about AI assistance, or if it's even allowed, etc.
Every document in that process directory explains stuff that could be obvious to some people but not others.
On the other hand, it seriously sucks to spend time learning a big codebase and modifying it with care, only to not be given the time of day when you send the patches to the maintainers. Sometimes the reward for this human labor isn't a sincere peer review of the work and a productive back-and-forth to iron out issues before merging, it's to watch one's work languish unnoticed for a long time only for the maintainer to show up after the fact and write his own fix or implementation while giving you a shout out in the commit message if you're lucky.
Can't really blame people for reducing their level of effort. It's very easy to put in a lot of effort and end up with absolutely nothing to show for it. Before AI came along, my realization was that begging the maintainers to implement the features I wanted was the right move. They have all the context and can do it better than us in a fraction of the time it'd take us to do it. Actually cloning someone else's repository and working on it should only be attempted if one is willing to literally fork it and own the project should things go south. Now that we have AI, it's actually possible to easily understand and modify complex codebases, and I simply cannot find the will to blame people for using it to the fullest extent. Getting the AI to maintain the fork is really easy too.
It cannot be overstated how religiously opposed many in the Linux community are to even a single AI-assisted commit landing in the kernel, no matter how well reviewed.
Plenty see Torvalds as a traitor for this policy and will never contribute again if any clearly labeled AI generated code is actually allowed to merge.
> Signed-Off ...
> The human submitter is responsible for:
> Reviewing all AI-generated code
> Ensuring compliance with licensing requirements
> Adding their own Signed-off-by tag to certify the DCO
> Taking full responsibility for the contribution
> Attribution: ... Contributions should include an Assisted-by tag in the following format:
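For illustration, under that format a patch's trailer block might end up looking something like this (the agent, model, and tool names here are hypothetical, not taken from the policy):

    Signed-off-by: Jane Hacker <jane@example.org>
    Assisted-by: SomeAgent:some-model-v2 [coccinelle] [sparse]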
Responsibility assigned to where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.
I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.
Hopefully this will set the trend and provide definitive guidance for the many devs who were seeing not only the utility of AI assistance but also the acrimony from some quarters, and were fence-sitting as a result.
This does nothing to shield Linux from responsibility for infringing code.
This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.
It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.
> All code must be compatible with GPL-2.0-only
How can you guarantee that when AI has been trained on a world full of code under multiple licenses, and even closed-source material, without the permission of the copyright owners? I confirmed that with several AIs just now.
How is one supposed to ensure license compliance while using LLMs, which do not (and cannot) attribute the sources that contributed to a specific response?
This is the right way forward for open-source. Correct attribution - by tightening the connection between agents and the humans behind them, and putting the onus on the human to vet the agent output. Thank you Linus.
> All contributions must comply with the kernel's licensing requirements:
I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.
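Model introspection aside, external tooling could at least do a crude mechanical version of this check. A minimal sketch in Python, assuming you already have an index of fingerprints built offline from known licensed code (real matchers normalize the code and pick fingerprints more cleverly; this is only illustrative):

    import hashlib

    def shingles(source: str, k: int = 5) -> set[str]:
        # Hash every k-line window of whitespace-stripped code.
        lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
        return {
            hashlib.sha1("\n".join(lines[i:i + k]).encode()).hexdigest()
            for i in range(max(len(lines) - k + 1, 0))
        }

    def overlap_ratio(generated: str, licensed_index: set[str]) -> float:
        # Fraction of the generated code's windows that appear verbatim
        # in the index built from known licensed code.
        s = shingles(generated)
        return len(s & licensed_index) / len(s) if s else 0.0

This only catches near-verbatim reuse, of course; paraphrased or restructured code sails right through, which is rather the point of the objection.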
If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.
It's a sane policy - human is responsible for what they contribute, regardless of what tools they use in the development process.
However, the gotcha here seems to be that the developer has to say that the code is compatible with the GPL, which seems an impossible ask, since the AI models have presumably been trained on all the code they can find on the internet regardless of licensing, and we know they are capable of "regenerating" (regurgitating) stuff they were trained on with high fidelity.
LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent), which almost always carries licenses requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].
The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which
1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies
2) serves as a heuristic when searching the solution-space, further guided by deterministic tools such as compilers and linters; the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C# (a toy sketch of that loop follows).
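Concretely, the generate-and-check loop looks roughly like this in Python, assuming some llm callable that returns candidate C source for a prompt (any real pipeline is far more elaborate):

    import os
    import subprocess
    import tempfile
    from typing import Callable, Optional

    def compiles(c_source: str) -> bool:
        # The compiler acts as the deterministic filter;
        # -fsyntax-only checks the code without producing a binary.
        with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
            f.write(c_source)
            path = f.name
        try:
            result = subprocess.run(["gcc", "-fsyntax-only", path],
                                    capture_output=True)
            return result.returncode == 0
        finally:
            os.unlink(path)

    def generate_and_check(llm: Callable[[str], str], prompt: str,
                           attempts: int = 5) -> Optional[str]:
        # The model proposes candidates; the deterministic tool rejects
        # the nonsense until something type-checks.
        for _ in range(attempts):
            candidate = llm(prompt)
            if compiles(candidate):
                return candidate
        return None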
I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking even among people who understand how the models operate, let alone the general public.
Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.
[0]: https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003; https://news.ycombinator.com/item?id=47356000
[1]: http://prize.hutter1.net/
[2]: https://en.wikipedia.org/wiki/ELIZA_effect
[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...
Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?
The policy makes sense as a liability shield, but it doesn't address the actual problem, which is review bandwidth. A human signs off on AI-generated code they don't fully understand, the patch looks fine, it gets merged. Six months later someone finds a subtle bug in an edge case no reviewer would've caught because the code was "too clean."
Reading this right after the Sashiko endorsement is a bit jarring. Greg KH greenlit an AI reviewer running on every patch a couple weeks back, and that direction actually seems to be helping, while here the conversation is still about whether contributors will take responsibility for AI code they submit. That feels like the harder side to police. The bugs that land kernel teams in trouble are race conditions, locking, lifetimes, the things models are most confidently wrong about.
I have seen agents produce code that compiles cleanly, reads fine on a Friday review, then deadlocks under contention three weeks later. Is this contributor policy supposed to be the long-term answer, or a placeholder until something Sashiko-shaped does the heavy filtering on the maintainer side too?
How do the reviewers feel about this? Hopefully it won't result in them being overwhelmed with PRs. There used to be a kind of "natural limit" to error rates in our code given how much we could produce at once and our risk tolerance for approving changes. Given empirical studies on informal code review which demonstrate how ineffective it is at preventing errors... it seems like we're gearing up to aim a fire-hose of code at people who are ill-prepared to review code at these new volumes.
How long until people get exhausted with the new volume of code review and start "trusting" the LLMs more without sufficient review, I wonder?
I don't envy Linus in his position... hopefully this approach will work out well for the team.
I feel like a lot of people will have an ideological opposition to AI, but that would just lead to people sometimes submitting AI-generated code with no attribution and lying about it.
At the same time, I feel bad for all the people that have to deal with low quality AI slop submissions, in any project out there.
The rules for projects that allow AI submissions might as well state: "You need to spend at least ~10 iterations of model X review agents and 10 USD of tokens on reviewing AI changes before they are allowed to be considered for inclusion."
(I realize that sounds insane, but in my experience iterated review, even by the same Opus model, can help catch bugs in the code. Next-token prediction is quite error-prone on its own; in other words, even Opus "writes" code that has bugs its own review iterations then catch.)
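For what it's worth, the review loop I mean is roughly this (a sketch, assuming a review callable that asks the model for a list of findings on a diff):

    from typing import Callable, List

    def iterated_review(diff: str,
                        review: Callable[[str], List[str]],
                        max_rounds: int = 10) -> List[str]:
        # Re-run the reviewer until it stops surfacing new findings.
        findings: List[str] = []
        for _ in range(max_rounds):
            new = [f for f in review(diff) if f not in findings]
            if not new:
                break  # converged: no new issues this round
            findings.extend(new)
        return findings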
We've seen in the past, for instance in the world of compliance, that if companies/governments want something done or make a mistake, they just have a designated person act as scapegoat.
So what's preventing lawyers/companies having a batch of people they use as scapegoats, should something go wrong?
Weird that they're co-opting the "Assisted-by:" trailer to tag the software and model being used. This trailer was previously used to tag someone else who assisted with the commit in some way. Now it has two distinct usages.
The typical trailer for this is "AI-assistant:".
I like this. It's an inversion of the old adage, "a poor craftsman blames his tools," and the corollary, "use the right tool for the job" (because a good craftsman chooses the appropriate tool).
You don't get to bang on a screw and blame the hammer.
A phenomenon I cannot explain is that this simple, clean statement of a fairly obvious approach to AI assistance somehow took this long, and took Linus, to be stated so cleanly.
Are there other popular repos with effectively this policy stated as neatly that I’ve missed?
How can we automate the disclosure of what AI agent was used in a PR and the extent of code? Would be nice to also have an audit of prompts used, as that could also be considered “code”.
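One low-tech possibility: a prepare-commit-msg hook that appends the trailer whenever the tool advertises itself, say through an environment variable (the AI_AGENT variable name here is made up; git does pass the commit message file path as the hook's first argument):

    #!/usr/bin/env python3
    # .git/hooks/prepare-commit-msg
    import os
    import sys

    msg_file = sys.argv[1]
    agent = os.environ.get("AI_AGENT")  # e.g. "SomeAgent:some-model-v2"
    if agent:
        with open(msg_file, "a") as f:
            f.write(f"\nAssisted-by: {agent}\n")

Auditing the prompts themselves would need cooperation from the agent, though, since nothing on git's side ever sees them.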
I like this. It's just saying you have responsibility for the tools you wield. It's concise.
Side note, I'm not sure why I feel weird about having the string "Assisted-by: AGENT_NAME:MODEL_VERSION" [TOOL1] [TOOL2] in the kernel docs source :D. Mostly joking. But if the Linux kernel has it now, I guess it's the inflection point for...something.
Having the competence to put together a good patch used to be a proxy that you were motivated to stick around and fix any regressions you caused and that you were worth investing in, as a community member.
Or, to put it another way, in the old days in order to be a 3k-LoC PR wielding psychopath intent on making your colleagues miserable with churny aggro diffs from hell you at least had to be good at coding.
Nowadays, you only need to do the psychopath part — Claude will happily fill in the PR for you.
Honestly kind of surprised they went this route -- just 'you own it, you're responsible for it' is such a clean answer to what feels like an endlessly complicated debate.
Interesting that coccinelle, sparse, smatch & clang-tidy are included, at least as examples. Those aren't AI coding tools in the normal sense, just regular, deterministic static analysis / code generation tools. But fine, I guess.
We've been using Co-Developed-By: for our AI annotations.
That's... refreshingly normal? Surely something most people acting in good faith can get behind.
Humans for humans!
Don't let Skynet win!!!