Does Copilot, the AI solution launched by Microsoft on GitHub for suggesting code snippets (e.g., in Visual Studio Code), violate fair use and the rights of code developers? The nonprofit Free Software Foundation has just raised some questions about the fairness, legitimacy and legality of the AI-driven coding assistant Copilot.
What is GitHub Copilot?
GitHub Copilot is an AI tool developed by GitHub and OpenAI to help Visual Studio Code users auto-complete code. This service uses machine learning to suggest code snippets to developers as they write software.
GitHub Copilot was announced by GitHub on June 29, 2021. The artificial intelligence uses a modified version of GPT-3, a language model originally developed to produce human-like text. In Copilot, however, the model has been trained to produce computer code. GitHub Copilot is trained on public GitHub repositories of any license.
Many open questions
The Wikipedia article on Copilot already indicates that the approach is problematic, because the code snippets suggested in Visual Studio Code could infringe the copyright of other authors. Wikipedia states, "Although most of the code issued by Copilot can be classified as a transformative work, GitHub admits that a small portion of the proposed code is copied verbatim. This has led to concerns that the code issued is not sufficiently transformative to be classified as fair use and could infringe on the copyright of the original owner."
This puts Copilot on legally untested ground, although GitHub states that "training machine learning models on publicly available data is considered fair use in the machine learning community." The Free Software Foundation has now taken up the matter in an article.
The Free Software Foundation (FSF) strongly discourages free software developers from hosting their code on GitHub. But many developers host on GitHub anyway, and even developers who don't do so have their work mirrored on GitHub by others. The FSF writes that it already knows that Copilot in its current form is unacceptable and unfair from its own perspective. This is because the service requires running non-free software such as Visual Studio or parts of Visual Studio Code, which runs counter to the free software approach.
The Free Software Foundation has received numerous inquiries about its position on these issues. From this, it would seem that Copilot's use of freely licensed software has many implications for an incredibly large part of the free software community.
- Developers want to know if training a neural network on their software can really be considered fair use.
- Others interested in using Copilot wonder if the code snippets and other elements copied from GitHub-hosted repositories could lead to copyright infringement.
Even if everything were legally sound, activists wonder whether it is fundamentally unfair for a proprietary software company like Microsoft to build a service such as Copilot on top of free software hosted on GitHub. None of these questions, many of which have legal implications, has yet been tested in court. To get the answers the free software developer community needs and to identify the best ways to protect user freedom in this area, the FSF has launched a call for white papers on Copilot, copyright, machine learning, and free software. White papers can be submitted on the following issues:
- Is Copilot's training on public repositories infringing copyright? Is it fair use?
- How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
- How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
- Is there a way for developers using Copilot to comply with free software licenses like the GPL?
- If Copilot learns from AGPL-covered code, is Copilot infringing the AGPL?
- If Copilot generates code which does give rise to a violation of a free software licensed work, how can this violation be discovered by the copyright holder on the underlying work?
- Is a trained artificial intelligence (AI) / machine learning (ML) model resulting from machine learning a compiled version of the training data, or is it something else, like source code that users can modify by doing further training?
- Is the Copilot trained AI/ML model copyrighted? If so, who holds that copyright?
- Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?
The FSF plans to publish the contributions that help to clarify the problem and to offer a reward of $500. Articles on the topic can be found at MSPU and at InfoWorld. At Google the motto used to be "don't be evil"; at Microsoft it looks to me as if they simply try whatever they can get away with. Or how do you see it? As a developer who uses Visual Studio, you are in any case walking into a legal minefield with Copilot.