The Open Source Initiative has unveiled its first release candidate for an open-source AI definition, aiming to clarify what it means for an AI system to be open source.
Open Source Initiative Releases First Candidate for Open-Source AI Definition
The Open Source Initiative (OSI), known for its stewardship of the open-source definition, has reached a significant milestone in its quest to define open-source artificial intelligence (AI). After two years of extensive work, the OSI has launched its first release candidate, RC1, for the Open Source AI Definition. This development is poised to bring clarity to the often-complex conversations around open-source applications in the AI field.
The OSI’s newly proposed definition rests on four fundamental freedoms that an AI system must provide to be considered open source. These include the freedom to use the system for any purpose without permission, the ability to study how the system operates, the option to modify the system, and the right to distribute copies of the original or the modified system.
A contentious issue throughout the development process has been the handling of training data. While complete access to all training data would be ideal to ensure full transparency and reproducibility of AI systems, the OSI has opted for a compromise. Recognising the legal and practical challenges in sharing full datasets, the current definition requires only “sufficiently detailed information” about the data used for training AI systems. This stance has sparked debate, with some arguing that without full data disclosure, an AI system cannot be truly open source.
The OSI addresses this by categorising data into four types: open, public, obtainable, and unshareable data. Each type comes with different legal sharing requirements. The difficulty arises because laws that permit the use of data for training often restrict its redistribution to protect copyrights or personal privacy.
Alongside training data, the RC1 release mandates that the complete source code used for AI training and operation be made available under OSI-approved licenses. Additionally, this requirement extends to model parameters and weights, which must be shared under the same open conditions.
Stefano Maffulli, OSI’s Executive Director, underscores the importance of this definitive framework in preventing ‘open washing’, a practice in which companies claim openness without adhering to genuine open-source principles. Maffulli said that both open-source advocates and corporations have raised concerns about the new definition. Corporations, in particular, view the demand to disclose training processes and datasets as potentially revealing trade secrets, a concern reminiscent of debates from the 1990s, when major tech companies resisted releasing source code.
The current release candidate introduces two pivotal features. Firstly, it stipulates that open-source AI code must be complete enough for recipients to understand how the machine learning training was done. This requirement is crucial for innovation and for forking AI systems, and it addresses the reluctance of some corporations to share their training and data processing code.
Secondly, RC1 includes provisions allowing creators to impose copyleft terms on AI code, data, and parameters, either separately or as interconnected packages. While no such legal frameworks exist today, the OSI considers them plausible and relevant to future open-source AI definitions.
The OSI warns that its work isn’t complete yet. It aims to release the final 1.0 version at the All Things Open conference slated for 28 October 2024. While no new features are planned, the OSI will focus on refining the existing definition, addressing any significant flaws, and enhancing documentation. Moreover, it has recognised a need to clarify its stance that training data must be shared whenever sharing it is legally permissible.
As the OSI moves towards finalising the Open Source AI Definition, the broader community watches closely, recognising the implications this has for the future of AI development.
Source: Noah Wire Services