
[META/Proposal] OpenSearch project should Adopt Apache Foundation's stance on the use of genAI #182

Open
stephen-crawford opened this issue Dec 5, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@stephen-crawford

stephen-crawford commented Dec 5, 2023

TL;DR: GenAI use is already common across many open-source projects and websites. We (OpenSearch) should acknowledge the value of these tools but also enforce responsible use. I propose adopting the Apache Foundation's Generative AI Guidance (https://www.apache.org/legal/generative-tooling.html) or something similar.

Describe the solution you'd like
We should add a new line to the sign-off attesting that you have reported any use of generative AI tooling. To report that use, we can add a new checkbox alongside the several we already have on pull requests, as sketched below.
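For illustration only, the checkbox could look something like this in the existing pull request template (the file location and wording here are hypothetical, not something the project has agreed on):

```markdown
<!-- Hypothetical addition to the pull request template checklist -->
- [ ] I have disclosed any use of generative AI tooling in this
      contribution, per the project's GenAI guidance.
```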

I think this solution gives the project leeway to address generative AI use on a case-by-case basis while also getting ahead of some of the more problematic practices genAI has led to. There has been discussion of examples from various projects where documentation and/or code was run through genAI tooling and PRs were then opened with those changes.

These contributions have been described as "low effort" or "low value" in the context of those changes, which demonstrates the need for some guidelines.

OpenSearch comprises millions of lines of code, and changes that improve readability or efficiency are often useful. However, there should be expectations about how and when genAI use is appropriate. Making that use evident on PRs will make it easier for maintainers to be judicious in reviewing pull requests.

Describe alternatives you've considered
Leaving things as is; banning genAI use; no restrictions whatsoever.

Additional context
I can share example PRs of what I am referring to, but please reach out to me directly on the public Slack, as I would prefer not to publicly use anyone's work as a "bad" example.

@stephen-crawford stephen-crawford added the enhancement and untriaged labels Dec 5, 2023
@stephen-crawford stephen-crawford changed the title [META/Proposal] OpenSearch Project Should Adopt Apache Foundation's stance on the use of GenAI [META/Proposal] OpenSearch project should Adopt Apache Foundation's stance on the use of genAI Dec 5, 2023
@dblock
Member

dblock commented Dec 7, 2023

@scrawfor99

  1. Maybe copy-paste some anonymized/generic screenshots?
  2. I am moving this issue to .github. That's where any project-wide policy would go.

@dblock dblock transferred this issue from opensearch-project/OpenSearch Dec 7, 2023
@dbwiddis
Member

dbwiddis commented Jul 7, 2024

@scrawfor99 Thank you for opening this issue. This is definitely a subject worth discussing.

As for the specific recommendation... like everything with GenAI these days, it's complicated.

With respect to the ASF-specific language, I want to exercise a bit of caution. I respect the ASF immensely and trust their analysis of issues like this. However, their policies tend to err on the side of strictness and, unless we intend to become an ASF project, would impose many restrictions born of caution more than necessity.

Like most SDEs these days, I recognize the benefit that GenAI has brought us. I've learned new things and improved my own code based on those learnings. But it's really hard to put in writing exactly what that means. I've asked a GenAI site to write a test case for some code, reviewed its response, excerpted the relevant parts, and implemented it myself. It's certainly not perfect, but in many cases it does better than I would, and I think a very detailed human review of the AI result is a (very) happy medium. But it's a compromise that defies codification in any written rules.

On the other side of the argument, I have contributed hundreds of answers on Stack Overflow under the CC-BY-SA license, which (among other requirements) requires attribution. Such contributions are forbidden from ASF code for good reason. And yet the same GenAI that has helped me write better test cases will also answer questions in the narrow domain I've contributed to with word-for-word responses that I wrote, without attributing me. GenAI is a legal landmine.

All this is to say that I agree we need a policy. I am not sure I want to adopt ASF-specific language, but I do want to make it clear that authors need to write their own code themselves, even if they learned something by getting an answer from AI.

That's hard to do. I understand why the strictest restrictions are in force when legal requirements are at stake. I just want to hold back from the "MUST" and keep the "SHOULD" here, since this is an impossible-to-really-enforce requirement.
