How Open is “Open” AI?
2025-01-08

Why definitions like OSAID hold power

Last month, the Open Source Initiative (OSI) unveiled the first iteration of its Open Source AI Definition (OSAID). It came as a response to many AI systems being developed and marketed as “open” even though they are somewhat obviously not. The OSAID has been endorsed by major players like Mozilla, but those endorsements have arrived alongside wider criticism from the open source community. Some believe that it isn’t the place of the OSI to define “open” AI, and that it would do better to focus on software and stay away from AI. While this could make one consider how we define “software” or “AI” in these contexts, let’s not fall down that rabbit hole for now. What is interesting about this debate is how it shows that defining standards - especially open source standards - grants more power than we realize.

Open Standards - The Rules

It is easy to overlook how many standards (especially open standards) we interact with as we scroll through our digital lives. Take a few minutes to look at the open standards that rule our internet and you will realize that there are thousands. Star and Lampland would point out that these standards are nested inside one another like nesting dolls. For example, in order to send an email, you must use the Simple Mail Transfer Protocol (SMTP) to transmit your message, which uses the American Standard Code for Information Interchange (ASCII) to encode its commands and text, and which also relies on the Internet Protocol (IP) to reach the destination mail server. These are maybe 5% of the standards involved in junk mail winding up in your mailbox, but you get the idea - there are a lot. The purpose of standards is to make systems interoperable, forming the shared set of rules that software ecosystems use to accomplish incredibly complex tasks. They are the rules by which everything else must play, and in the open-source community, standards define how the infrastructure is built and licensed.
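To make the nesting concrete, here is a minimal Python sketch of the layers named in the email example. It is only an illustration: the addresses and the command list are hypothetical, nothing is sent over the network, and the IP layer appears only as a comment.

```python
from email.message import EmailMessage

# Layer 1: the message itself, laid out with standard headers and a body.
# The addresses below are made up for illustration.
msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.net"
msg["Subject"] = "Hello"
msg.set_content("Standards all the way down.")

# Layer 2: the message is serialized to bytes. For plain English text like
# this, every byte falls inside the 7-bit ASCII range.
wire_bytes = msg.as_bytes()
wire_bytes.decode("ascii")  # would raise UnicodeDecodeError if not ASCII

# Layer 3: an SMTP conversation wraps those bytes in a handful of
# plain-text commands (a simplified, hypothetical exchange).
smtp_commands = [
    "EHLO client.example.org",
    "MAIL FROM:<alice@example.org>",
    "RCPT TO:<bob@example.net>",
    "DATA",
]

# Layer 4 (not shown): the operating system carries the whole exchange
# to the mail server's address using IP.
for command in smtp_commands:
    print(command)
print(wire_bytes.decode("ascii"))
```

Each layer here is governed by its own published standard, and none of them is visible to the person who clicks "send".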

Open Source - The Infrastructure

If open standards set the rules of the road for how technology should interact, open source software gives it the infrastructure on which to drive. First, it is useful to distinguish closed software from open software, because the difference matters. Closed software is something you buy - think Microsoft Exchange. You need to buy a license to use it, you can’t see or modify the source code, and you can’t distribute it. This type of software is almost always owned by a business that wants to monetize its product and keep its proprietary code a secret. Open source software is not owned or licensed in this way. It is loosely based on the four freedoms that underpin the OSI’s definition of open source - the freedom to run the program, to study its source code, to modify it, and to distribute it freely. What is incredible about the open source software movement is how ubiquitous it has become. While we can point to end-user apps like Mozilla’s Thunderbird as open source, it is actually the thousands of operating systems, libraries and services that are embedded in almost all technologies (open and closed) that make open source so powerful. This success is attributable to the communities, like the OSI, who have developed and defended definitions of open source. These definitions have produced a robust ecosystem of individuals and corporations maintaining technological infrastructure that is open and usable by everyone. This is all well and good for software, but defining open AI may be a different animal.

Invisible Infrastructure & Invisible Power

Star and Bowker rightly point out that a trick of infrastructure is that it becomes transparent, fading into the background as something we don’t think much about. When we send a message, we do not stop to consider that Gmail is a closed-source client relying on SMTP, ASCII and IP to deliver our text to a recipient’s mailbox. Despite this invisibility, a myriad of ethics and values are baked into each standard and piece of software in that chain. These values, and who embedded them, become invisible when we see them as infrastructure.

So what are we missing with our open AI infrastructure? Open source software, like Thunderbird, is often available under an open source license endorsed by OSI. Software licenses set terms of use that have been defined and shaped (often contentiously) by a community of technology practitioners, and those terms have profound impacts on access. Open AI licenses will probably follow this model, and the OSAID will lay the groundwork for access. In its definition, the OSAID only requires a “complete description” of the data rather than the data itself. This is a small concession, but it will matter as open AI models become licensed, especially as these licenses are validated by organizations like OSI. Whether or not you believe AI to be a revolutionary force (I am not convinced), it is clear that AI investment is commanding lots of money, and with that money comes influence over definitions and their invisible outcomes.

Are We Ready to Define Open AI?

The OSAID raises many questions. If open models are released under a license endorsed by OSI, but don’t have an open data set, are they really open? The OSAID says yes - but is there a consensus? AI models are notoriously opaque and complex - can we really expect that even releasing weights and parameters would make them legible to outside parties so that they can be safely and reliably modified?

I’m not asking these questions rhetorically - I think their answers are complex and require more time for investigation than has been granted thus far. At present we don’t really know how generative AI will work or how it will be used, which is why I agree that the OSAID was rushed. What we do know is that Meta, OpenAI, Mistral and many others are releasing models and calling them open, and they may well try to release them under licenses and terms that OSI approves. If OSI leaves enough room for these massive organizations to claim disingenuous openness, and then use licensed uptake to solidify their power in the market, it will have failed to recognize the power that the OSAID holds.
