The AI industry is trying to change the definition of “Open Source AI”
The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s bad. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since, for a neural network, the training data is the source code (it’s how the model gets programmed), the definition makes no sense.
And it’s confusing; many “open source” AI models, like LLAMA, are open source in name only. But the OSI seems to have been co-opted by industry players who want both corporate secrecy and the “open source” label. (Here is one rebuttal to the definition.)
This is worth fighting for. We need a public AI option, and open source—true open source—is a necessary part of that.
But while open source should mean open source, there are some partially open models that need some sort of definition. There is a huge field of research on privacy-preserving, federated methods of ML model training, and I think that’s a good thing. And OSI has a point here:
Why do you allow the exclusion of some training data?
Because we want Open Source AI to exist also in fields where data cannot be legally shared, for example medical AI. Laws that permit training on data often limit the resharing of that same data to protect copyright or other interests. Privacy rules also give a person the right to control their most sensitive information, like decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are not compatible with later-developed frameworks for rights exclusivity and sharing.
How about calling this “open weights” and not open source?
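For readers who haven’t run into the term, the “federated methods” mentioned above work roughly like this: each data holder trains on its own data locally and shares only model weights, which a coordinator averages; the raw data never leaves its owner. Here is a minimal, illustrative sketch of federated averaging in Python. The tiny linear model, the three simulated “hospitals,” and all of the numbers are assumptions made purely for illustration; real systems add secure aggregation, differential privacy, and a lot more machinery.

```python
# Toy sketch of federated averaging (FedAvg): clients train locally,
# the server only ever sees and averages their weights, never the data.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client trains on its own data; only updated weights leave the device."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

# Three hypothetical "hospitals," each holding private data from the same true model.
true_w = np.array([2.0, -3.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

# The server never sees raw data: it only averages the clients' weights.
global_w = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)        # federated averaging step

print("learned weights:", np.round(global_w, 2))  # converges toward [2.0, -3.0]
```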
Posted November 8, 2024 at 7:03 AM