Definitely not a big fan of it, but realistically speaking, it’s here to stay. It is wise for them to govern and regulate it rather than outright ban it. Especially with a project as big as this one, people will try. Saying that the responsibility falls on the human is definitely the right move.
any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.
Watch Americans and their companies pull some mad gymnastics apportioning blame for this
Well yea, it’s the human submitting the code, and using a tool known to be imperfect
Your comment is pretty dumb
At this point opinion is running 23 to -5 on that dumb comment, sunshine
Because obviously the majority is always right.
Maintainers’ only responsibility is to ensure quality and shouldn’t have to check for rogue AI submissions.
Though I still miss consistent fucking weather, so year of the NetBSD?
Ensuring you don’t approve garbage, whether human- or AI-generated, is part of quality
Linux kernel being written by Microsoft’s AI.
which is trained on free and open source code
That will definitely not introduce some weird things when it starts feeding on itself.
Microsoft needs to try to ruin Linux somehow; it can’t just hurt Windows 11 with AI slop code, it needs to expand its efforts to other systems.
I am the c/fuck_ai person, but at this point I have made peace with the fact that we can’t avoid it. I still don’t want it doing artsy stuff (image gen, video gen) or being used blindly in critical systems, because humans are the ones who should be doing that, or at least keeping constant oversight. I think the team’s logic is correct here, because there is no way to know whether code came from an LLM or a human unless something in it screams LLM or the contributor explicitly mentions it. Mandating the latter seems like a reasonable move for now.
I consider myself to be more pro-AI than not, but I’m certainly not a zealot and mostly agree with the take that it shouldn’t be used in artistic pursuits. However, I love using AI to help me create art. It can give great critiques, often good advice on how to improve, and is great for rapid experimentation and prototyping. I actually used it this weekend to see what a D&D mini might look like with different color schemes before painting it. I could have done the same with Gimp, but it would have taken much longer, with worse results, for what was ultimately just a brainstorming session. How do you feel about my AI usage from your perspective? I suppose from an energy conservation perspective all of it was bad, but I’m more interested in a less trivial take.
Yes, the energy consumption is bad. My main gripe about LLM-generated art is that it will not be original. It will use its training data from uncredited artworks to generate it. Art is usually made by humans to express or convey something in a creative way. LLMs fail at that. What LLMs can actually be helpful at is making learning art more accessible to everyone. Art schools or private art classes can be expensive; this lowers the barrier to entry.
As for you using generated art: it might be really beautiful, but it will be very difficult to maintain that style, and even more difficult to convince anyone that it is your style. The artist doesn’t get much recognition with LLM-generated art. Using it for critique also seems stupid, because LLMs will always try to give an objective view rather than a subjective one. Your art won’t trigger an emotion in it; it might just say it’s bad, or “do this to make it more understandable”, and that’s where you lose as an artist.
My mom likes to paint as a hobby. What she does is search for stuff on Pinterest (which is mostly LLM-generated). She uses it as inspiration to do it in her own style, maybe giving it some spin. She keeps all of it for herself.
I’m a writer. I’ve been paid to write a few things here and there, but mostly there are just huge barriers for people without connections.
I plan on using AI to turn my writing into a visual animated format for people to consume. I don’t much care about the style of art; I just want my work to be seen. I can’t afford to pay for artists. If I could, I would. But at least this would give me an opportunity to show my work without some execs saying no a hundred times.
When I look at the art for cartoons in the 70s/80s, there is so much crap animation with mistakes and duplications, you would think it’s “a.i. slop.” I understand that these were done overseas, pumped out quickly so quality control was overlooked for speed… but it wasn’t the animation I was interested in, it was the stories and characters.
I still think original artists will continue to exist. A.I. is just another tool. People will get bored of the same old stuff and want originality. I really hope it’ll make our lives better in the long run, but we’re just in the weird middle stage of A.I. crawling before running.
I can’t afford to pay for artists
You can afford LLMs right now because all of the LLM companies are losing money on them. If they decide they want to make a profit, they will raise their prices significantly, so you still end up in the same situation. You also don’t have much control over what an LLM spits out, while doing animation manually gives you total control, or at least lets you sit with an actual animator to make it look how you envision it.
I plan on using AI to turn my writing into a visual animated format for people to consume.
What makes you think that people will respond the same way, and in the same numbers, to LLM-generated animation as they would if it were crafted by an artist? I reckon the response will be much lower. I see it on YouTube constantly. I watched a video about a topic, then got recommended something related to it from a different channel. Guess what? The script and the animation were so damn similar, and the shit they were spewing wasn’t even true in the end. Everything both channels made was slop. Sure, they spit out more content than conventional methods allow, got a few thousand views per video, and made decent money on it. But they aren’t going to sustain that for long if they want audience retention.
Since then I have been more mindful about which videos I click on, even going to the extent of disabling recommendations and watch history.
I have downloaded my own LLM that runs on my own computer… So the only cost is electricity, since I upgraded my computer before the prices went to shit. Newegg even gave me free RAM with the purchase of a motherboard, so I lucked out on that. Storage is not an issue either, since I got that back in 2024 knowing Trump would fuck everything up.
And no, people might not respond the same way to my work, but then again I’m not taking any work away from anyone else, because otherwise the work would not exist at all. If you want to fund me and an artist for our work, then okay. Show me the money.
One thing I’ve noticed is that I see many more people complain about slop than slop itself. It’s so annoying at this point that it’s making me go in the opposite direction. Hey everyone, slop here… Microsoft slop here… Use Linux Linux Linux. Slop slop slop. Sloppy joes. It’s like candlestick makers complaining to Nikola Tesla.
Another great example of how AI is just wreaking havoc on people’s brains.
- Wants to show an enticing product to execs, but doesn’t want to invest in paying an artist
- Realizes they have to have connections, but doesn’t want to network
- Wants recognition for their hard work, but hasn’t sought out a community or collaboration and instead says “show me the money”
AI will fix everything for me! Slop doesn’t exist! (Ignores the very article we’re in, every platform’s algorithmic feed, the US president shitposting, all the slop that gets posted here.) Go get ’em, Nik, don’t let the haters stop your brilliance.
A very extreme takeaway, but okay.
my own LLM that can be used on my own computer
May I ask how many billion parameters it has? Because the paradox here is:
- If it is weak, then you will be getting much, much worse results than even the big models the corpos have (and we don’t even know how big those are, tbh), let alone the quality of an actual artist.
- If you have a respectably powerful model, then your PC must have cost thousands of dollars (even ignoring the price hikes), which eliminates the excuse of not being able to pay an actual artist.
The title of the article is extraordinarily wrong, which makes it clickbait.
There is no “yes to copilot”
It is only a formalization of what the Linux project said before: all AI is fine, but a human is ultimately responsible.
" AI agents cannot use the legally binding “Signed-off-by” tag, requiring instead a new “Assisted-by” tag for transparency"
The only mention of copilot was this:
“developers using Copilot or ChatGPT can’t genuinely guarantee the provenance of what they are submitting”
This remains a problem that the new guidelines don’t resolve, because even using AI as a tool and having a human review it still means the code the LLM output could have come from non-GPL sources.
Yeah, that’s also my question. Partially because I am a former-lawyer-turned-software-developer… but, yeah. How are the kernel maintainers supposed to evaluate whether a particular PR contains non-GPL code?
Granted, this was potentially an issue before LLMs too, but nowhere near the scale it will be now.
(In the interests of full disclosure, my legal career had nothing to do with IP law or software licensing - I did public interest law).
They don’t, just like they don’t with human-submitted stuff. The point of the Signed-off-by is that the author attests they have the rights to submit the code.
Which I’m guessing they cannot attest, if LLMs truly have the 2-10% plagiarism rate that multiple studies seem to claim. It’s an absurd rule, if you ask me. (Not that I would know, I’m not a lawyer.)
Where are you seeing the 2-10% figure?
In my experience code generation is most affected by the local context (i.e. the codebase you are working on). On top of that a lot of code is purely mechanical - code generally has to have a degree of novelty to be protected by copyright.
Imagine how broken it would be otherwise. The first person to write a while loop in any given language would be the owner of it. Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.
Anyone else using the same concept would have to write an increasingly convoluted while loop with extra steps.
Sounds like an origin story for recursion.
If it’s flagged as “assisted by <LLM>” then it’s easy to identify where that code came from. If a commercial LLM is trained on proprietary code, that’s on the AI company, not on the developer who used the LLM to write code. Unless they can somehow prove that the developer had access to said proprietary code and was able to personally exploit it.
If AI companies are claiming “fair use,” and it holds up in court, then there’s no way in hell open-source developers should be held accountable when closed-source snippets magically appear in AI-assisted code.
Granted, I am not a lawyer, and this is not legal advice. I think it’s better to avoid using AI-written code in general. At most use it to generate boilerplate, and maybe add a layer to security audits (not as a replacement for what’s already being done).
But if an LLM regurgitates closed-source code from its training data, I just can’t see any way how that would be the developer’s fault…
Pretty convenient.
This is how copyleft code gets laundered into closed source programs.
All part of the plan.
How would they launder it? Just declare it their own property because a few lines of code look similar? When there’s no established connection between the developers and anyone who has access to the closed-source code?
That makes no sense. Please tell me that wouldn’t hold up in court.
I believe what they’re referring to is the training of models on open source code, which is then used to generate closed source code.
The break in connection you mention makes it not legally infringement, but now code derived from open source is closed source. Because of the untested nature of the situation, it’s unclear how it would unfold, likely hinging on how the request was formed.
We have similar precedent with reverse engineering, but the non-sentient tool doing it makes it complicated.
That makes sense. I see the problem with that, and I don’t have a good solution for it. It is a divergence of topic though, as we were discussing open-source programmers using LLMs which are potentially trained on closed-source code.
LLMs trained on open-source code is worth its own discussion, but I don’t see how it fits in this thread. The post isn’t about closed-source programmers using LLMs.
Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.
Still, training LLMs on open-source code is a questionable practice for that reason, particularly when it comes to training commercial models on GPL code. But it’s probably hard to prove what code was used in their datasets, since it’s closed-source.
I don’t really see it as a divergence from the topic, since it’s the other side of a developer not being responsible for the code the LLM produces, like you were saying.
In any case, it’s not like conversations can’t drift to adjacent topics.

Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.
Yes, but that’s the point of laundering something. Before, if you put FOSS code in your commercial product, a human could be deposed in the lawsuit and make it public, and then there would be consequences. Now you can openly do so and point at the LLM.
People don’t launder money so they can spend it, they launder money so they can spend it openly.
Regardless, it wasn’t even my comment, I just understood what they were saying and I’ve already replied way out of proportion to how invested I am in the topic.
Please tell me that wouldn’t hold up in court.
First tell us how much money you have. Then we’ll be able to predict whether the courts will find in your favor or not
Sad but true…
First of all, who is going to discover the closed-source use of GPL code and bring a lawsuit anyway?
Second, the LLM ingests the code and then spits it back out, with maybe a few changes. That is how it benefits from copyleft code while stripping the license.
Maybe a human could do the same thing, but it would take much longer.
Wait, did you just move the goalposts? I thought the issue we were talking about was open-source developers who use LLM-generated code and unwittingly commit changes that contain allegedly closed-source snippets from the LLM’s training data.
Now you want to talk about LLM training data that uses open-source code, and then closed-source developers commit changes that contain snippets of GPL code? That’s fine. It’s a change of topic, but we can talk about that too.
Just don’t expect what I said before about the previous topic of discussion to apply to the new topic. If we’re talking about something different now, I get to say different things. That’s how it works.
I was responding specifically to this part
But if an LLM regurgitates closed-source code from its training data, I just can’t see any way how that would be the developer’s fault…
showing what would happen when the LLM regurgitates open-source code into closed-source projects.
Sorry if you didn’t like that.
Yup.
I would also just point out that this doesn’t change the Linux kernel’s legal exposure to infringing submissions from before the advent of LLMs.
AI is here, another tool to use…the correct way. Very reasonable approach from Torvalds.
I don’t have a problem with LLMs as much as the way people use them. My boss has offloaded all of his thinking to LLMs to the point he can’t fix a sentence in a slide deck without using an LLM.
It’s the people that try to use LLMs for things outside their domain of expertise that really cause the problems.
This is a big point. People need to understand that LLMs are more like a fancy graphing calculator: they are very good and can handle many things, but it’s on you to understand why the calculation is meaningful. At a certain point no one wants to see your long division or factorials. We want the results, and for students and professionals to focus on the concepts.
I get the metaphor but it’s not a great one for AI in mathematics especially. A statistical word generator is not going to perform reliable math and woe to anyone who acts otherwise.
I would call it an autistic sycophantic savant with brain damage. It’s able to perform apparently miraculous feats of memory and creativity, but then be unable to tell reality from fiction or to tell whether even the simplest response is valid, and it will likely lie about it to make itself seem more competent, to please you.
If you have a use for an assistant like that, then great. But a calculator - simple and cheap and reliable - it definitely is not.
REEEEEE!!! Kernel now AI SLOP like LUTRIS!!! 11
I guess even smart people can make stupid decisions. Probably financially motivated decisions too.
It’s definitely financially motivated. Linus said himself, in that one LTT video, that AI has been very lucrative for Linux, as it has expanded investment from companies that normally wouldn’t give a fuck (he name-dropped NVidia specifically).
Unlike brilliant people like you, who have created nothing one-millionth as important as Linux
Was that necessary?
Yes. The dude who created one of the most useful projects in software history, in large part through pragmatic decision making, makes a pragmatic decision, and Joe Rando says “Must be in the pockets of big AI!” because he can’t grasp any single aspect of a complex issue. He can’t even hold a tiny number of things in his head; he just vomits crap over the internet. That person needs to spend a lot more time reading and thinking, and less time typing.
You should try taking your own advice, kiddo.
Saying no to code just because it was AI generated is like saying you can’t trust Excel to be your bookkeeper. It’s a tool, and holding the person using the tool at fault is exactly what happened here.
Some good points, but poor comparison. Excel is deterministic, AI is not.
Yes, you can ALWAYS trust Excel, after configuring it correctly ONCE. You can NEVER trust AI to produce the same output given the same inputs. Excel never hallucinates, AI hallucinates all the time.
You can actually set it up to give the same outputs given the same inputs (temperature = 0). The variability is on purpose
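A minimal sketch of that setting with a small local model (assuming the HuggingFace transformers library; do_sample=False means greedy decoding, i.e. the temperature-zero case of always picking the most likely next token):

```python
# Greedy (temperature = 0) decoding: always take the highest-probability
# token, which removes the intentional sampling randomness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The Linux kernel is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```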
You can, and that will produce the same output for the same input as long as there is no variation in floating-point rounding errors. (That holds if the exact same code is running, but an optimization can easily flip a round-up to a round-down, and if two tokens’ probabilities are very close the output will diverge; see the sketch at the end of this comment.)
The point that people (or the LLMs arguing against LLMs) miss is that the world is not deterministic, and humans are not deterministic (at least in any practical sense at the human scale). And if a system is deterministic, you should indeed not use an LLM… Its power is in how it provides answers from messy data… If you need repeatability, write a script / code etc.
(Note: I do think that if the output is for human use, it’s important that a human validates it’s useful… LLMs can help brainstorm and, with some tests, can manage a surprising amount of code, but if you don’t validate and test the code it will be slop, and may work for one test case but not for a generic user.)
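A tiny sketch of the rounding point: floating-point addition is not associative, so if an optimization merely reorders the same sum, the result can differ in the last bit, which is enough to flip a near-tie between two tokens:

```python
# The same three numbers summed in a different order round differently.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```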
You can, and that will produce the same output for the same input as long as there is no variation in floating-point rounding errors. (That holds if the exact same code is running, but an optimization can easily flip a round-up to a round-down, and if two tokens’ probabilities are very close the output will diverge.)
There are more aspects to the randomness such as race conditions and intentionally nondeterministic tiebreaking when tokens have the same probability, apparently.
I actually think LLMs are ill suited for the vast majority of things people are currently using them for, and there are obviously the ethical problems with data centers bringing new fossil fuel power sources online, but the technology is interesting in and of itself
There are more aspects to the randomness such as race conditions and intentionally nondeterministic tiebreaking when tokens have the same probability, apparently.
Yeah, in addition to what the commenter above said about floating points and GPU calculations, LLMs are never fully deterministic.
So now you finally admit that LLMs are not truly deterministic and only near-deterministic.
I’ve told you that from the beginning, but you were too smug to first admit that major LLM providers’ systems are not deterministic, and then too smug to look up what near-deterministic systems are and do some research; you kept barking up the wrong tree.
- Floating point math is deterministic.
- Systems don’t have to be programmed with race conditions. That is not a fundamental aspect of an LLM, but a design decision.
- Systems don’t have to be programmed to tie break with random methods. That is not a fundamental aspect of an LLM, but a design decision.
This is not hard stuff to understand, if you understand computing.
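To make the tiebreak point concrete: nothing about an LLM forces a random tiebreak. numpy’s argmax, for example, is documented to return the first occurrence of the maximum, a perfectly deterministic design choice:

```python
import numpy as np

# Two tokens tie for the highest probability; np.argmax deterministically
# returns the index of the first occurrence of the maximum.
probs = np.array([0.2, 0.4, 0.4, 0.0])
print(np.argmax(probs))  # always 1, never 2
```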
Not true. While setting temperature to zero will drastically reduce variation, it is still only a near-deterministic and not fully deterministic system.
You also have to run the model with the input to determine what the output will be, no way to determine it BEFORE running. With a deterministic system, if you know the code you can predict the output with 100% accuracy without ever running it.
You also have to run the model with the input to determine what the output will be, no way to determine it BEFORE running. With a deterministic system, if you know the code you can predict the output with 100% accuracy without ever running it.
This is not the definition of determinism. You are adding qualifications.
I did look it up, and I see now there are other factors that aren’t under your control if you’re using a remote system, so I’ll amend my statement to say that you can have deterministic inference systems, but the big ones most people use cannot be configured by the user to be deterministic.
Deterministic systems are always predictable, even if you never ran the system. Can you determine the output of an LLM with zero temperature without ever having ran it?
And even disregarding the above, no, they are still NOT deterministic systems, and can still give different results, even if unlikely. The variation is NOT absolute zero when the temperature is set to zero.
Deterministic systems are always predictable, even if you never ran the system. Can you determine the output of an LLM with zero temperature without ever having ran it?
You don’t have to understand a deterministic system for it to be deterministic. You are making that up.
And even disregarding the above, no, they are still NOT deterministic systems
I conceded that setting temperature to 0 for an arbitrary system (including all the remote ones most people are using) does not mean it is deterministic after reading about other factors that influence inference in these systems. That does not mean there are not deterministic implementations of LLM inference, and repeating yourself with NO additional information and using CAPS does NOT make you more CORRECT lol.
Seems like a reasonable approach. Make people be accountable for the code they submit, no matter the tools used.
If the accountability cannot be practically fulfilled, the reasonable policy becomes a ban.
What good is it to say “oh yeah you can submit LLM code, if you agree to be sued for it later instead of us”? I’m not a lawyer and this isn’t legal advice, but sometimes I feel like that’s what the Linux Foundation policy says.
But this was already the case. When someone submitted code to Linux they always had to assume responsibility for the legality of the submitted code, that’s one of the points of mandatory Signed-off-by.
But now, even the person submitting the license-breaching content may be unaware that they are doing that, so the problem is surely worse now that contributors can easily unwittingly be on the wrong side of the law.
That’s their problem. If they are using an LLM and cannot verify the output, they shouldn’t be using an LLM.
Problem is that broadly most GenAI users don’t take that risk seriously. So far no one can point to a court case where a rights holder successfully sued someone over LLM infringement.
The closest is Getty and their case, with very blatantly obvious infringement. They lost in the UK, so that’s not a good sign.
Most GenAI users do not submit code to the Linux kernel project.
No, it’s not a reasonable approach. Making people be the authors of the code they submit is reasonable, because then it can be released under the GPL. AI-generated code is public domain.
I suppose there should be no code generators, assemblers, compilers, linkers, or LSPs then either? Just etching 1s and 0s?
Copilot? You mean the AI with terms of service that are in bold and explicit: “for entertainment purposes only”?
Which is why it’s in the title and not the article? EntertainBait?
Just legal stuff. Making a huge deal of it is dumb
I disagree.
Legal stuff would be “use at your own risk” or “answers may not be correct”.
This is really strong language.
I suppose GitHub Copilot is meant, which is a different thing.
Different how? Isn’t GitHub owned by Microsoft?
Different in that it’s not an AI model, it’s just a tool you can use to run AI models like Claude.
see my reply here
There are like 70 copilots
The hell. How can they expect people to understand? They plan to sell 100 things under the same name and pass them off as one big AI, when it’s a hundred different, unrelated things?
They’ve never been good at naming things, but they now seem to be going out of their way to try to be the worst with the names of their software. For instance, they named the successor to the already generically named “remote desktop protocol” “windows app”.
This one is funny. Go google windows app commands. They just fucked sysadmins
Ok, so there are 70-81 copilots, and GitHub is one of them.
Why is GitHub Copilot a different thing in the context of the reply that was being responded to?
Copilot is the harness, Claude and GPT are the models
Copilot is by far the worst harness of all the major players
I agree. If AI becomes outlawed, it will simply be used without other people knowing about it.
This approach, at least, means that people will label AI-generated code as such.
Ah, the solution that recognizes there’s no way to eliminate AI from the supply chain after it’s already been introduced.
I don’t understand the full picture here, but the person who is submitting AI slop will be held accountable. Never a company.
So if a company is pushing staff to use AI to complete projects faster and their code ends up being AI slop when submitted, only the person working for the company will be held responsible.
I’m not sure what the repercussions are here but hopefully it’s not a large fine. Those fines could add up quick if the person is submitting code all the time and doesn’t know they are messing up.
I’d still be highly sceptical about pull requests with code created by LLMs. Personally, what I’ve noticed is that the author of such a PR doesn’t even read the code, and I have to go through all the slop.