Vim's lead maintainer has fully lost his goddamn mind

SwooshBakery624 [they/them]@programming.dev · 2 months ago


hperrin@lemmy.ca · 2 months ago

I spent literally all day yesterday working on this:

https://sciactive.com/human-contribution-policy/

I’ve started to add it to my projects. Eventually, it will be on all of my projects. I made it so that any project could adopt it, or modify it to their needs. It’s got a thorough and clear definition of what is banned, too, so it should help any argument over pull requests.

Hopefully more projects will outright ban AI generated code (and other AI generated material).

PlutoniumAcid@lemmy.world · 2 months ago

I like this approach, but how can it be enforced? Would you have to read every line and listen to a gut feeling?

hperrin@lemmy.ca · 2 months ago

Basically the best you can do is continue as normal, and if someone submits something that says it is or obviously is AI, point to this policy and reject it. Just having the policy should be a decent deterrent.

hoch@lemmy.world · 2 months ago

It’s okay, we’re just not going to tell you 👍

hperrin@lemmy.ca · 2 months ago

People submitting malicious or deceptive code to open source repositories isn’t a new phenomenon. Just know that if you do it with any name in any way attached to your real name, and anyone finds out, you can kiss your reputation in the software dev community goodbye.

Also, if you don’t admit that it’s AI generated, and it turns out to be copyrighted code, you’ll have a fun time in court trying to defend yourself for copyright infringement by admitting to fraud.

hoch@lemmy.world · 2 months ago

Good luck proving it

Arcadeep@lemmy.world · 2 months ago

You’re a special type of uninformed, aren’t you

Jankatarch@lemmy.world · 2 months ago

Same mindset as “You don’t need a perfect lock to protect your house from thieves, you just need one better than what your neighbors have.”

If a vibecoder sees this they will not bother with obfuscation and simply move onto the next project.

gaiety@lemmy.blahaj.zone · 2 months ago

This is super cool!

Did want to offer one language critique, it’s easy to jump to the word human as the opposite of AI-made, but there are a lot of therians and adjacent entities in the software engineering space. It would be wonderful to find language that is a pro-“human” policy that avoids that word and instead focuses on people of all sorts of identities so as not to be othering.

Sounds strange to some I’m sure, but this has been coming up more and more with coworkers I’ve had across several companies. It’s kind of like moving from “he or she” to “they”. A great example is the writings of beeps, a prominent software engineer on the GOV.UK site and its accessibility: https://beeps.website/about/nonhuman/

Regardless of whether any changes are made, thanks for reading and for your policy writeup, again very cool :D

hperrin@lemmy.ca · edited 2 months ago

I would be fine to include more inclusive language, except that I want to be in line with the wording the US Copyright Office uses, as a major goal of this policy is to ensure that every contribution is copyrightable. They specifically use the word human, and go so far as to say that it is only human authorship that can make something copyrightable.

There was a landmark case where a monkey took a selfie, and the courts decided that the picture could not be copyrighted. In the court’s decision, again, it’s specifically “human” authorship that was the requirement for copyright.

> The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being.
>
> …
>
> Similarly, the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.
>
> - https://www.copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf

In my opinion, “person” would be a better term to use, since the personhood of the author is really what matters, but since this is meant to provide legal protection, I’m pushed toward the term “human”. Also, “person” could be confused with the concept of a “legal person”, which includes corporations. A corporation itself cannot be an author, but it can own copyrights.

Maybe I should add this to a portion near the bottom of the page to provide the reasoning behind sticking to the term, despite the desire to be inclusive.

Bibip@programming.dev · 2 months ago

hi, i have strong feelings about the use of genai but i come at it from a very different direction (story writing). it’s possible for someone to throw together a 300 page story book in an afternoon - in the style of lovecraft if they want, or brandon sanderson, or dan brown (dan brown always sounds the same and so we might not even notice). now, the assumption that i have about said 300 pager is that it will be dogshit, but art is subjective and someone out there has been beside themselves pining for it.

but this has always been true. there have always been people churning out trash hoping to turn a buck. the fact that they can do it faster now doesn’t change that they’re still in the trash market.

so: i keep writing. i know that my projects will be plagiarized by tech companies. i tell myself that my work is “better” than ai slop.

for you, things are different. writing code is a goal-oriented creative endeavor, but the bar for literature is enjoyment, and the bar for code is functionality. with that in mind, i have some questions:

if someone used genai to generate code snippets and they were able to verify the output, what’s the problem? they used an ersatz gnome to save them some typing. if generated code is indistinguishable from human code, how does this policy work?

for code that’s been flagged as ai generated- and let’s assume it’s obvious, they left a bunch of GPT comments all over the place- is the code bad because it’s genai or is it bad because it doesn’t work?

i’m interested to hear your thoughts

hperrin@lemmy.ca · 2 months ago

That’s a very good question, and I appreciate it.

I put a lot of this in the reasoning section of the policy, but basically there are legal, quality, security, and community reasons. Even if the quality and security reasons are solved (as you’re proposing with the “indistinguishable from human code” aspect), there are still legal and community reasons.

Legal

AI generated material is not copyrightable, and therefore licensing restrictions on it cannot be enforced. It’s considered public domain, so putting that code into your code base makes your license much less enforceable.

AI generated material might be too similar to its copyrighted training data, making it actually copyrighted by the original author. We’ve seen OpenAI and Midjourney get sued for regurgitating their training data. It’s not farfetched to think a copyright owner could go after a project for distributing their copyrighted material after an AI regurgitated it.

Community

People have an implicit trust that the maintainers of a project understand the code. When AI generated code is included, that may not be the case, and that implicit trust is broken.

Admittedly, I’ve never seen AI generated code that I couldn’t understand, but it’s reasonable to think that as AI models get bigger and more capable of producing abstract code, their code could become too obscure or abstracted to be sufficiently understood by a project maintainer.

AeonFelis@lemmy.world · 2 months ago

TBH I don’t really mind when LLMs are used for code reviews. My main issue^[1] with coding assistants is that the people using them don’t verify the code they emit thoroughly (that would be too much work. Remember - reading code is harder than writing it) and thus they often push junk into the codebase and blame the AI for the bad quality when it crashes. But with code reviews there is no such risk, because you still have to read and understand the comments and decide on your own how to resolve them.

Some caveats:

It must be disclosed that the comment was generated by AI. Disagreeing with a human reviewer (who’s usually a maintainer) and disagreeing with an LLM are very different beasts.
If the submitter disagrees with an AI comment, and the reviewer agrees with the model’s initial criticism - the reviewer^[2] needs to defend it themselves, not delegate the argument back to the LLM.

[1] Quality issue - I’m not talking about the ethical issues here.
[2] Regular Open Source etiquette applies, of course. The reviewer is always allowed to reject the PR and ask the submitter to kindly fuck off.

hayvan@piefed.world · 2 months ago

The devs do have my sympathy, they dedicate their time and energy to these projects and start burning out.
The solution obviously shouldn’t be drowning it in slop. They should just be slowing down. Vim has been an excellent and functional tool for many years now, it doesn’t need more speed.
There are better ways to use LLMs as a productivity tool.

maegul (he/they)@lemmy.ml · 2 months ago

Couldn’t help but notice the casual gendering of Claude to “he” as well.

Someone somewhere made the important observation not long ago that computer assistants tended to be gendered female when more like a secretary (Siri and Alexa) but now that AIs are “intelligent” and powerful … Claude now has to be a male.

Especially weird (and telling?) when it is objectively gender neutral as it’s not human.

xep@discuss.online · 2 months ago

Of all the problems with these things we’re taking issue with the naming?

Ether@aussie.zone · 2 months ago

Oh no! Another issue! I’m a jellyfish and can only respond to a limited number of stimuli at a time because I have no centralised nervous system capable of organising my critiques into diverse and disparate arguments! I can only talk about vanishingly simple problems that are one-dimensional enough for me to tunnel vision on repeating the same talking points, preferably no longer than a dozen syllables total to accommodate not having a long-term memory centre due to my aforementioned lack of a brain 🪼🥺

I am very tired and have gone absolutely overboard on this comment, to the person I’m responding to pls don’t take this personally, more rational, less sleepy me doesn’t want to be a troll. But SERIOUSLY? Your argument isn’t even “this isn’t a problem”, it’s “I can’t see the value in doing a full deconstruction of this novel ethical scenario and just want to be a sheep saying it’s bad for the reason my favourite shepherd says so, not because of healthy discussion of ALL the pros and cons.” Reminds me of those cringe posts from a couple months ago where people were saying “the epstein files are a distraction! don’t forget about my favourite political issue {insert valid issue}”. I’m going to be a hypocrite for a second bc this long arse comment is 1,000,000x worse than yours, but consider why you’re commenting before you hit post next time.

xep@discuss.online · 2 months ago

That’s a lot of words you’re putting in my mouth, and a lot of names you’re calling a stranger on the internet. But you seem like an alright person, so I hope your day gets better.

Ether@aussie.zone · 2 months ago

Thanks, sorry for being an arse. Need to be more careful about what headspace I’m in before writing anything on the internet in the future.

TheTechnician27@lemmy.world · edited 2 months ago

> Couldn’t help but notice the casual gendering of Claude to “he” as well.

“Claude” is a male given name. If you think it’s actually a problem, blame Anthropic for giving their LLM a gendered name. I’ve never gendered AI assistants, but I’m not going to begrudge people who do when it’s in the name (or in the case of old Siri, the voice, which would later be the default rather than only option).

Women named “Claude” exist, but they’re so staggeringly outnumbered by men that most people don’t even know of any, and would immediately read the name as masculine.

amino@lemmy.blahaj.zone · 2 months ago

it’s extremely telling however the shift in marketing. i don’t believe giving the coding plagiarism bot a male name is coincidental. most feminists would probably agree. we’ve known for decades that chatbots were given female names because they’re trying to reenact some tradwife fetish and attract a male audience

TheTechnician27@lemmy.world · edited 2 months ago

> it’s extremely telling however the shift in marketing

And your hypothesis doesn’t fall apart now why, exactly? AI assistants are more secretary-like than they’ve ever been. “Write me an email.” “Proofread my work.” Beyond that, people are using LLMs as substitutes for significant others.

And yet now, Microsoft migrated “Cortana” to “Copilot”, Siri is more gender-neutral than ever, Alexa still exists off massive brand recognition, and other major AI services are called e.g. “ChatGPT”, “Claude”, “DeepSeek”, and “Grok”. Collectively, that’s gender-neutral.

At most, the hypothesis used to be true but isn’t anymore, because you can literally make an LLM act like a tradwife now if you’re so ~~debased~~ inclined, yet the names are broadly neutral. The MIT Press has a good, lengthy article about the history of gender in speech synthesis, as an aside.

unknownuserunknownlocation@kbin.earth · 2 months ago

Let’s not overinterpret things here. Siri and Alexa are both mainly voice assistants, or at least started out as such. Studies have been conducted that show people trust female voices more than male voices. So the choice of female voices was obvious, and having female names is nothing surprising.

Also, Siri, Alexa and Cortana were seen as “intelligent” at the time, as well (or were supposed to be seen, depending on who you ask).

maegul (he/they)@lemmy.ml · 2 months ago

> Also, Siri, Alexa and Cortana were seen as “intelligent” at the time, as well (or were supposed to be seen, depending on who you ask).

Intelligent for the time, sure, but were they ever pitched as doing more than a secretary that never encroaches on or gets involved with your actual job and cognitive skills? Because that’s the divide that’s being enforced: women for the menial dumb tasks and men for the serious, difficult and actually valuable and important stuff.