New development policy: code generated by a large language model or similar technology (e.g. ChatGPT, GitHub Copilot) is presumed to be tainted (i.e. of unclear copyright, not fitting NetBSD's licensing goals) and cannot be committed to NetBSD.

NetBSD.org/developers/commit-g

www.netbsd.org · NetBSD Commit Guidelines

@asmodai There is no way they can verify that tho

@daniel_collin True, but at least having a policy is something that can be fallen back on in dubious cases?


@asmodai Still think it would be hard. Sure, if you could prove "backwards" that some specific input generates exactly the code someone committed, without any changes, then maybe, but usually you don't write code that way.

You implement something and then change it until it does what you want. In general, using LLMs to write algorithms is a bad idea.

But using them to generate boilerplate (i.e. repeating code patterns) and test code is very useful and doesn't affect the "real" code.
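To make "repeating code patterns" concrete, here is a minimal C sketch of the kind of mechanical test code being described; the rgba_pack helper and its expected values are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: packs four 8-bit channels into a 32-bit RGBA value. */
static uint32_t rgba_pack(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

int main(void)
{
    /* Every case follows the same mechanical pattern. */
    assert(rgba_pack(0x00, 0x00, 0x00, 0x00) == 0x00000000u);
    assert(rgba_pack(0xFF, 0x00, 0x00, 0xFF) == 0xFF0000FFu);
    assert(rgba_pack(0x00, 0xFF, 0x00, 0xFF) == 0x00FF00FFu);
    assert(rgba_pack(0x12, 0x34, 0x56, 0x78) == 0x12345678u);
    return 0;
}
```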

@daniel_collin Oh absolutely. I meant more that, in any dispute, they can point to a policy they already have, rather than facing a situation without one.

There are some telltale signs of LLMs in big enough pieces of code. But many smaller cases, especially if lightly touched up, will be very difficult to distinguish from manually written code.

@asmodai @daniel_collin There’s usually no reasonable way of verifying if the submitter actually has the necessary rights to every single piece of manually-written code. But there’s (usually) also a policy stating that they must, and that by submitting the code, they are stating that they do in fact own those rights. It doesn’t completely prevent code of dodgy origin being submitted, but NOT having the policy would surely be much worse.

@pmdj @asmodai @daniel_collin yeah, the point of these policies isn't necessarily to make it so you have zero code of dubious origin in your repo; it's so that when someone says "that code looks like mine!" you can say "but so-and-so said it was theirs!" That maybe won't stand in court, but it will probably survive a first round of DMCA claims/refutations.

@daniel_collin @asmodai IMHO using AI to write tests is utterly backwards.

You write the tests first, to iron out the API from the consumer's perspective. This helps build a clean, inherently helpful API design. Then you implement.
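As an illustration of that workflow, a minimal C sketch with a hypothetical parse_port API: the test is written first, from the consumer's point of view, and the implementation follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Declared first; the test below shapes its signature and error handling. */
bool parse_port(const char *s, unsigned *port_out);

int main(void)
{
    unsigned port;

    /* Happy path: a valid decimal port. */
    assert(parse_port("8080", &port) && port == 8080);

    /* The test decides that errors are reported via the return value. */
    assert(!parse_port("not-a-port", &port));
    assert(!parse_port("70000", &port)); /* out of range */
    return 0;
}

/* Implementation written afterwards, once the test has pinned down the API. */
bool parse_port(const char *s, unsigned *port_out)
{
    char *end;
    unsigned long v = strtoul(s, &end, 10);

    if (end == s || *end != '\0' || v == 0 || v > 65535)
        return false;
    *port_out = (unsigned)v;
    return true;
}
```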

@aaron @asmodai Yes, but you can write an API definition and have the AI generate tests based on that.

Or you may be working in a codebase where there are no tests and you want to add some.

Or you may want to extend existing testing with more tests.

It's not always as clear cut.
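A rough sketch of what that could look like in C, with an entirely hypothetical copy_truncated API: the contract is written by hand, and the tests are the kind one might ask a tool to propose and then review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hand-written contract: returns the number of bytes written to dst
 * (excluding the NUL terminator), truncating if src does not fit.
 * dst_size must be at least 1.
 */
size_t copy_truncated(char *dst, size_t dst_size, const char *src);

/* Tests of the kind one might ask a tool to generate from the contract. */
int main(void)
{
    char buf[4];

    assert(copy_truncated(buf, sizeof buf, "hi") == 2);
    assert(buf[2] == '\0');

    /* Truncation: only three characters plus the terminator fit. */
    assert(copy_truncated(buf, sizeof buf, "hello") == 3);
    assert(buf[3] == '\0');
    return 0;
}

/* Reference implementation, included only so the sketch is self-contained. */
size_t copy_truncated(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);

    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```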

@daniel_collin @asmodai I'm sincerely skeptical that writing an API specification detailed enough to have an AI write a usable test suite would take less time than manually writing the tests yourself.

As for the other scenarios you mention (writing tests for existing code), you run into a more subtle but much worse problem. The AI is likely to write tests that prove the code works as it is written, not as it is intended to work.

The point of tests is to prove that the fiddly edge cases and oddball scenarios don't cause unexpected behavior (aka "bugs" or "vulnerabilities"). Understanding the intent of the code, which may or may not be what the existing code actually does, is simply beyond the abilities of what we currently call AI. There's no cognition in there, only a probability model. It's "spicy autocomplete", as some say.
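A contrived C sketch of that failure mode, with a made-up in_range function: a test derived from the current behavior happily encodes the bug, while a test written from the intent would catch it.

```c
#include <assert.h>
#include <stdbool.h>

/* Intent: true when lo <= x <= hi. Bug: the upper bound is exclusive. */
static bool in_range(int x, int lo, int hi)
{
    return x >= lo && x < hi; /* should be x <= hi */
}

int main(void)
{
    /* A test derived from the current behavior locks the bug in. */
    assert(in_range(5, 0, 10));
    assert(!in_range(10, 0, 10)); /* "passes" only because it mirrors the bug */

    /*
     * A test written from the intent would expose it:
     *     assert(in_range(10, 0, 10));
     */
    return 0;
}
```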

@aaron @asmodai I have found it useful at least, but I guess your mileage may vary.

@aaron @daniel_collin @asmodai You should try it. Either way round it saves a massive amount of time. And because there's far less repetitive donkey work you end up writing more tests of higher quality.

Writing tests is coding-by-LLM's sweet spot.

@daniel_collin @asmodai This is one of the sets of rules that every person with commit access has to follow. Becoming a committer is not easy, it requires joining the Foundation and signing various contracts that place the burden of responsibility on the member. It's a fairly reasonable assumption that we should be able to trust our members, and if not they shouldn't be members.

@netbsd @asmodai Sure, my point is that in the end it will be very hard to prove because it's very nuanced.

@daniel_collin @netbsd @asmodai I still think it's nice to have it in writing should someone get caught and claim that using LLMs was never explicitly forbidden. For instance, if one were to submit code that was a copy and paste of another repo, saying "I just used copilot" wouldn't be a defense.

@daniel_collin @asmodai Pretty much any code that is truly a pattern can and should be factored into a function, macro, or loop, automated, or hidden behind a layer of abstraction, etc.

If you think about code you're writing as boilerplate that could just be automatically typed by a machine, it's not worth writing it.
If you just need any test because it's required, sure, an LLM will make some pointless ones for you.
Actually useful tests? I don't think so, those require thinking, which LLMs can't do.
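A minimal C sketch of the factoring point above, with made-up command names: a block of near-identical registration calls becomes one table plus one loop.

```c
#include <stdio.h>
#include <stddef.h>

static void register_command(const char *name, int id)
{
    /* Stand-in for whatever the repeated boilerplate actually did. */
    printf("registered %s as %d\n", name, id);
}

/* Instead of one hand-written (or generated) call per command... */
static const struct {
    const char *name;
    int id;
} commands[] = {
    { "open",  1 },
    { "close", 2 },
    { "read",  3 },
    { "write", 4 },
};

int main(void)
{
    /* ...the repetition lives in data, and the code is written once. */
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        register_command(commands[i].name, commands[i].id);
    return 0;
}
```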